You should know your computing stack
Know your stack!
Data scientists should strive to know every last detail of their compute stack.
I believe in this philosophy: to execute on your best work, you need end-to-end control over every layer of your tools.
This is illustrated by the following examples.
Firstly, the story of Jeff Dean and Sanjay Ghemawat: to optimize Google's software early in the company's life, they went all the way down to the hardware and manipulated individual bits. I believe we should aspire to do the same if we want to be maximally effective.
Secondly, another tech giant constantly on my radar: Apple makes some of the world's most beloved (and, to some, most hated) computers. One thing cannot be denied: there is a general pattern of excellence in their execution.1 One of the causal factors is their relentless control over every last detail of how they build computers. I believe the same holds true for our work as data scientists.
Thirdly, I remember digging deep into 64-bit vs. 32-bit computation on GPU vs. CPU to solve a multinomial sampler problem in NumPy, so that I could complete the Bayesian neural network implementation I was writing. Had I not dug deep, I would have been stuck, unable to finish the implementation, and that would have been a pity: my PyData NYC 2017 talk on Bayesian deep learning would never have materialized.
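The original code is long gone, but the flavor of that bug is easy to sketch. Here's a hypothetical reconstruction of how 32-bit probabilities can trip up NumPy's multinomial sampler, and the kind of fix that digging into the stack surfaces:

```python
import numpy as np

# Hypothetical reconstruction: probabilities computed in 32-bit floating point
# (e.g. on a GPU) can look like they sum to 1.0, yet creep above 1.0 once
# NumPy works with them in 64-bit precision.
pvals = np.array([0.4, 0.6, 0.0], dtype=np.float32)
print(pvals.sum())                     # 1.0 in float32 arithmetic
print(pvals.astype(np.float64).sum())  # ~1.00000003 once cast to float64

# Depending on your NumPy version, this can fail with
# "ValueError: sum(pvals[:-1]) > 1.0".
try:
    print(np.random.multinomial(10, pvals))
except ValueError as err:
    print(f"multinomial rejected float32 pvals: {err}")

# One fix: cast to float64 and renormalize before sampling.
pvals64 = pvals.astype(np.float64)
pvals64 /= pvals64.sum()
print(np.random.multinomial(10, pvals64))
```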
The compute stack that we use, from something as low-level as the processor architecture (ARM/x86/x64/RISC) to the web technologies that we touch, affects what we do on a day-to-day basis, and thus what we can deliver. Having the depth of knowledge to navigate every layer of that stack effectively gives us superpowers to deliver the best work that we can.
See this philosophy in action
Here's what I've found works best when you truly know your computing stack:
Environment management that actually works
When you understand your stack from Python interpreter to package manager, you can make informed choices. I've lost count of how many times this has saved me from dependency hell. See how this plays out in Modern environment management with Pixi - instead of wrestling with conda environments, you'll understand why feature-based environments eliminate dependency conflicts.
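Here's roughly what that workflow looks like on the command line; the project, feature, and script names below are placeholders:

```bash
# A minimal sketch of a pixi workflow with a feature-based environment.
pixi init my-analysis && cd my-analysis    # writes a pixi.toml for the project
pixi add python=3.11 pandas                # dependencies of the default environment
pixi add --feature test pytest             # a "test" feature, kept out of the default env

# After declaring an environment that pulls in the feature, e.g.
#   [environments]
#   test = ["test"]
# in pixi.toml, you can run commands inside it:
pixi run --environment test pytest
```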
Editor configuration that saves hours
Knowing your editor deeply means configuring it to eliminate friction. The beauty of this approach is that you stop fighting your tools. Check out the VSCode configuration section where I configure auto-save every 10 seconds ("files.autoSave": "afterDelay"), set up format-on-save, and install the exact extensions that matter for data science work.
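Concretely, those settings live in settings.json and look something like this (autoSaveDelay is in milliseconds, so 10000 gives the 10-second delay):

```jsonc
{
  // Auto-save after a 10-second delay.
  "files.autoSave": "afterDelay",
  "files.autoSaveDelay": 10000,

  // Format on save, so formatting never becomes a separate chore.
  "editor.formatOnSave": true
}
```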
Shell mastery for speed
Understanding your shell environment unlocks serious productivity. You'll love the magic of well-crafted aliases! The Shell aliases section shows how knowing shell internals lets you create shortcuts like alias ls="exa --long" that save keystrokes hundreds of times per day.
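A few aliases in that spirit, as they might appear in your ~/.zshrc or ~/.bashrc (only the ls alias comes from that section; the others are illustrative):

```bash
alias ls="exa --long"     # richer directory listings by default
alias gs="git status"     # the git command I type most often
alias jl="jupyter lab"    # launch notebooks with two keystrokes
```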
Environment variables done right
Stack knowledge means understanding how environment variables flow through your system. What are the advantages of this approach? No more "which environment am I in?" confusion. See Environment variable management where I use direnv to automatically load project-specific configurations without manual setup.
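A project-level .envrc for direnv might look something like this (the variable names are placeholders; run direnv allow once to approve the file):

```bash
# Loaded automatically by direnv when you cd into the project directory.
export PROJECT_NAME="my-analysis"
export DATA_DIR="$PWD/data"

# Pull machine-specific settings from an uncommitted .env file, if present.
dotenv_if_exists .env
```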
The result: You spend time solving problems, not fighting your tools.
-
I say "general" because while Apple has made missteps over the years, the vast majority of their products demonstrate exceptional execution and attention to detail. ↩