Eric J Ma's Website

How I Work

written by Eric J. Ma on 2019-03-20 | tags: data science productivity


I was inspired to write this by Will Wolf’s interview with DeepLearning.AI, in which I found a ton of similarities between how the two of us work. As such, I thought I’d write down what I use at work to get things done.

Tooling

For a data scientist, I think tooling is of very high importance: mastery over our tools keeps us productive. Here’s a sampling of what I use at work:

  • Compute: I have my own MacBook, but I prefer freeloading off my colleague’s workstation, which is connected to our HPC compute cluster, allowing me to do parallelization with Dask!
  • Editors/IDEs: VSCode + Jupyter Lab (JLab). Lots of plugins for VSCode!
  • Terminal: iTerm, with my dotfiles providing a high degree of customization. I also use the VSCode and JLab terminals where convenient.
  • General Purpose: Python, Dask, git
  • ML/Stats: scikit-learn, jax, pymc3
  • Data wrangling: pandas, pyjanitor (a package I wrote to provide convenience APIs for data cleaning)
  • Data visualization: matplotlib, seaborn, holoviews
  • App development: flask

As you probably can see, I’m a very Python-centric person!

Daily/Weekly Routines

Most of my work necessitates long stretches of thinking and hacking time. Without that, I’m unable to get into "the zone" to do anything productive. Hence, I have a habit of packing meetings onto Mondays (a.k.a. "Meeting Mondays"). Backup times for meetings, which I prefer not to use, are 11 am and 1 pm, bookending lunch time so that I don’t end up with a fragmented morning/afternoon. The only exceptions I make are for my two high-priority team meetings, for which I defer to the rest of the team. I’m glad that my managers understand the need for long stretches of hacking time, and have stuck to Monday one-on-one meetings.

Hence, almost every day from Tuesday through Friday, I have long stretches of pre-allocated time for hacking. It’s data science scheduling bliss! It also means I turn down a lot of "can I meet you to chat" invites, unless we can pack them on Monday!

On Fridays, I make a point of trying to work remotely. It helps with sanity, particularly in the winter, when the commute gets harsh and I can’t bike. Fridays are also the days on which I try to do my open source work.

Pair Coding

Pair coding with others on mutual projects has been a very productive endeavor, which I have written about before. Unlike weekly update meetings, I plan for pair coding on an as-needed basis. We have a pre-defined goal for what we want to accomplish, including a conceivably achievable goal and a stretch goal; achieving the easier one keeps us motivated. It follows the "no agenda, no meeting" rule of thumb by which I protect my time.

I found that a good setup is really necessary for pair coding to be successful. A minimum is a dual-monitor setup, with one extra keyboard + mouse for my coding partner.

One thing I didn’t mention in my previous blog post was how knowledge transfer happens. Here’s how I think it works. We have one person in the "driver’s seat", and the other in the observer role. Knowledge transfer generally happens from the more experienced person to the less experienced one, and the driver doesn’t necessarily have to be the more experienced one. For example, when pair coding with my intern, I play the role of observer and may dictate code or outline what needs to be done, but I don’t actively take over on my keyboard unless something comes up that is irrelevant to the coding session’s goals. On the other hand, if there’s a codebase I’ve developed for which I need to play the tour guide role, I will be in the driver’s seat, while the observer helps me catch peripheral errors that I’m making.

Learning New Things

Pair coding has been one way I learn new things. For example, with my colleague Zach as the observer, we hacked together a simple dashboard project using Flask, Holoviews and Panel.

I’m not very mathematically savvy, in that algebra is difficult for me to follow. (I’m mildly algebra-blind, but getting better now.) Ironically, code, which is also algebraic in nature but works with plain English names, works much better for me. Implementing algorithms and statistical methods using jax (for things that involve differentiable computing) and PyMC3 (for all things Bayesian) has been very educational. While implementing, I impose some software abstractions on the math, which forces me to organize my knowledge and in turn helps me learn. Implementing things on the computer is also the perfect way to learn by teaching: the computer is the ultimate dumb student, as it will execute exactly what you tell it, mistakes included!
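
To make the "plain English names" point concrete, here is a minimal sketch of what such an implementation might look like. It is an illustration only, not code from any of my projects: the function names (`predict`, `mean_squared_error`, `gradient_of_mse`, `fit_line`), the toy data, and the learning rate are all made up for this example. It also sticks to the Python standard library and writes the gradient out by hand; with jax, one would typically let `jax.grad` derive the gradient from the loss function instead.

```python
# A hypothetical, dependency-free sketch: fitting a line by gradient descent.
# All names and numbers here are illustrative assumptions.


def predict(slope, intercept, x):
    """The model: a straight line."""
    return slope * x + intercept


def mean_squared_error(slope, intercept, xs, ys):
    """Average squared difference between predictions and observations."""
    residuals = [predict(slope, intercept, x) - y for x, y in zip(xs, ys)]
    return sum(r ** 2 for r in residuals) / len(residuals)


def gradient_of_mse(slope, intercept, xs, ys):
    """Hand-derived partial derivatives of the loss w.r.t. slope and intercept."""
    n = len(xs)
    residuals = [predict(slope, intercept, x) - y for x, y in zip(xs, ys)]
    d_slope = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
    d_intercept = 2 * sum(residuals) / n
    return d_slope, d_intercept


def fit_line(xs, ys, learning_rate=0.05, num_steps=2000):
    """Repeatedly step the parameters downhill along the gradient."""
    slope, intercept = 0.0, 0.0
    for _ in range(num_steps):
        d_slope, d_intercept = gradient_of_mse(slope, intercept, xs, ys)
        slope -= learning_rate * d_slope
        intercept -= learning_rate * d_intercept
    return slope, intercept


# Toy data generated from y = 2x + 1, with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
final_loss = mean_squared_error(slope, intercept, xs, ys)
print(f"slope={slope:.4f}, intercept={intercept:.4f}, loss={final_loss:.2e}")
```

Each function is a small abstraction over one piece of the math, and a sign error in `gradient_of_mse` would show up right away as a loss that grows instead of shrinking: the "dumb student" doing exactly what it was told.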

