Software skills for data scientists

Organize your code with a logical structure that reflects your problem space

Doing so lets you navigate your codebase much more efficiently.

Taking time to organize your code with a logical structure also helps you stay minimalistic and intentional with code. The less code you need to write to solve the same set of problems, the better.
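As a rough illustration of what "structure that reflects your problem space" could look like, here is a minimal sketch. Everything in it is hypothetical: the stage names, the file names mentioned in comments, and the toy data are assumptions made for illustration, not a prescribed layout.

```python
"""Sketch: one section per stage of a hypothetical analysis problem.

In a real project each section would probably be its own file, for example
project/data.py, project/features.py and project/models.py (assumed names).
"""
from statistics import mean


# --- data.py: getting raw records into memory ---
def load_measurements():
    """Return raw (sample_id, value) records; hard-coded here for illustration."""
    return [("a", 1.0), ("b", 3.0), ("c", None), ("d", 5.0)]


# --- features.py: cleaning and transforming ---
def drop_missing(records):
    """Keep only records that have a measured value."""
    return [(sample_id, value) for sample_id, value in records if value is not None]


# --- models.py: summarizing or modeling ---
def average_value(records):
    """Average the measured values across records."""
    return mean(value for _, value in records)


# --- pipeline.py: the only place that knows the order of the stages ---
def run_pipeline():
    """Chain the stages: load, clean, then summarize."""
    return average_value(drop_missing(load_measurements()))
```

When the sections mirror the stages of the problem, a question like "where do missing values get handled?" has one obvious place to look, and each stage stays small.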


This is the landing page for my notes.

This is 100% inspired by Andy Matuschak's famous notes page. I'm not technically skilled enough to replicate the full "Andy Mode", though, so I just did some simple hacks. If you're curious how these notes are compiled, check out the summary in How these notes are made into HTML pages.

This is my "notes garden". I tend to it on a daily basis, and it contains some of my less fully-formed thoughts. Nothing here is intended to be cited, as the link structure evolves over time. The notes are best viewed on a desktop/laptop computer, because of the use of hovers for previews.

There's no formal "navigation" or "search" for these pages. To go somewhere, click on any of the "high-level" notes below, and enjoy.

  1. Notes on statistics
  2. Notes on differential computing
  3. The State of Data Science
  4. Network science
  5. Scholarly readings
  6. Software skills for data scientists
  7. The Data Science Programming Newsletter MOC
  8. Life and computer hacks
  9. Reading Bazaar
  10. Blog drafts
  11. Conference Proposals

Criteria for good enough tests

As much as possible:

  • Use pytest to make your testing life easier.
  • Generalize your tests using hypothesis, but in a pinch, use hard-coded examples and annotate where to generalize later.
  • Each test should run in under 500 ms, but this is not a hard rule.
  • Cover as many lines of code as you can, but don't treat coverage alone as a reason to be satisfied with a test.
  • Use your tests to redesign code so that it is easier to use and, hence, easier to test.
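A minimal sketch of the "hard-coded examples, annotated for later generalization" approach from the list above. The function `normalize` is a hypothetical function under test, invented for illustration. The test functions use bare `assert` statements and `test_` prefixes, so pytest should collect them as-is; with pytest imported, `pytest.raises` would be the more idiomatic way to write the error check.

```python
def normalize(values):
    """Scale numbers so that they sum to 1 (hypothetical function under test)."""
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [v / total for v in values]


def test_normalize_sums_to_one():
    # Hard-coded examples for now.
    # TODO: generalize with hypothesis, e.g. over lists of positive floats.
    for values in ([1, 1], [2, 6], [0.5, 1.5, 2.0]):
        result = normalize(values)
        assert abs(sum(result) - 1.0) < 1e-9
        assert len(result) == len(values)


def test_normalize_rejects_zero_total():
    # Exercises the error branch and checks its behavior,
    # rather than merely touching the line for coverage.
    try:
        normalize([0, 0])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Both tests are pure in-memory computations, so they should finish well within the 500 ms guideline.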