2020 11-November

Spreadsheet munging strategies

Samuel Oranyeli has a great blog series on munging spreadsheets. It offers lots of practical advice for anyone who has to wrangle the messy spreadsheet data that comes to us from collaborators.

Build a project portfolio

Eugene Yan writes extensively on the topic of data science careers, and I particularly enjoyed the essay he wrote titled Why Have a Data Science Portfolio and What It Shows.

A tl;dr summary of what he has in there:

  • The "why" is more than "getting a job". The process of building the portfolio matters more.
  • The process involves developing qualities: persistence, continual learning, altruism (to help others).
  • The "what it shows" includes both technical skills and those aforementioned qualities.

And a notable quote:

IMHO, traits and skills are a prerequisite to building a great portfolio. And they reinforce each other.

Also:

A portfolio is just an artifact of our skills, traits, and working process. It’s the destination; it’ll take care of itself if we focus on the journey.

This very much reflects the story behind The Score Takes Care Of Itself, by the legendary NFL coach Bill Walsh.

Apoorva's blog entry on functional programming in R

Apoorva's blog post on functional programming in R touches on some really important pointers on why, as data scientists, we might want to level up our programming skills. Here are some highlights:

However, as you take on increasingly complex projects, you may find yourself thinking about more and more about structuring your project well and writing code that is easy to understand, debug, reuse, and maintain.

And:

By applying some basic concepts of functional programming, we gain:

  • better maintainability of the code base;
  • safer and reliable code;
  • the ability to manage complexity with abstractions that are borderline wizardry.

Data Science Programming Newsletter MOC

With the Data Science Programming newsletter, I'm trying to share ideas on how to make

Key information

Protocol

  1. In the last week of the month, draft the newsletter.
  2. On the first Monday of each month, send out the newsletter.
  3. Cross-post to essays collection.

Newsletters

2020

2021

Git scraping

Simon Willison, one of my heroes for giving us Datasette, writes about git scraping on his blog. The idea is to be able to track public sources of data over time, which sometimes is more interesting than each snapshot individually.

An example of git scraping in action is a tracker for the battleground states in this year's US Presidential elections. The website is hosted on GitHub Pages, while the repo can be found on GitHub.
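
To make the git scraping idea concrete, here is a minimal sketch in Python. It is not Simon's implementation; the function names, the URL argument, and the file layout are all hypothetical, and it assumes `git` is installed and that `repo_dir` is already a git repository. The core trick is to overwrite a tracked file with the latest download and commit only when the content has actually changed, so the commit history becomes the time series.

```python
import subprocess
import urllib.request
from pathlib import Path


def write_if_changed(path: Path, new_content: bytes) -> bool:
    """Write new_content to path only if it differs from the current file.

    Returns True when the file was created or updated, False otherwise.
    """
    if path.exists() and path.read_bytes() == new_content:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(new_content)
    return True


def commit_snapshot(repo_dir: Path, filename: str, message: str) -> None:
    """Stage and commit one file (assumes repo_dir is an existing git repo)."""
    subprocess.run(["git", "add", filename], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)


def scrape_once(url: str, repo_dir: Path, filename: str) -> bool:
    """Fetch url, store it in the repo, and commit only if the data changed.

    The url and filename are placeholders supplied by the caller.
    """
    with urllib.request.urlopen(url) as response:
        content = response.read()
    changed = write_if_changed(repo_dir / filename, content)
    if changed:
        commit_snapshot(repo_dir, filename, f"Update {filename}")
    return changed
```

Running `scrape_once` on a schedule (a cron job or a scheduled CI workflow, for instance) would then build up a history in which `git log -p` on the file shows how the data moved over time.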

The myths and traps of managing up

Comes from LeadDev.

A good overview of the myths about "managing up". The "why" behind the "whats" is to be an individual who helps foster a stable and productive environment for their colleagues/reports.

Quotables:

Forget about the notion that it’s “not on you” to accommodate a supposedly more skillful boss. They have different skills from you. They may even be less skilled than you. Your moral outrage at the situation is not useful. If you have a boss, managing up is part of your job.

The lead who is not great at managing up is also less able to sponsor others, a less useful ally to their own team, and their team has to contend with a harsher broader environment.

There are good case studies in there, and they're quite informative!