Eric's Notes

2020 10-October

Content to feature:

Effective testing for machine learning systems

Recently at work, I've been building some bespoke machine learning models (autoregressive hidden Markov models and graph neural networks) for scientific problems that we encounter. Because we aren't using standard reference libraries, we have to write the model code from scratch. Since it's software, it needs tests, and Jeremy Jordan has a great blog post on how to effectively test ML systems. Definitely worth a read, in my opinion.

Software engineering fundamentals for Data Scientists

In his Medium article, Gonzalo Ferreiro Volpi shares some fundamental software skills for data scientists. For those of you who want to invest in levelling up your code-writing skills to reap multiplicative dividends in time saved, frustrations avoided, and happiness, come check it out.

Reflecting on a year of making machine learning actually useful

In her blog post, Shreya Shankar has some extremely valuable insights into the practice of making ML useful in the real world, which I absolutely agree with. One in particular is this quote:

Outside of ML classes and research, I learned that often the most reliable way to get performance improvements is to find another piece of data which gives insight into a completely new aspect of the problem, rather than to add a tweak to the loss. Whenever model performance is bad, we (scientists and practitioners) shouldn’t only resort to investigating model architecture and parameters. We should also be thinking about “culprits” of bad performance in the data.

With that little teaser, I hope this gives you enough impetus to read it. :)

The Multiplicative Power of Masks

This article is topical and relevant, and I appreciated the illustrations in it. It's also a blog post that highlights a really powerful model -- where powerful doesn't mean millions of parameters, but rather conceptually simple, easy to communicate, broadly applicable, and intensely relevant for the times. Aatish Bhatia has done a tremendously wonderful job with this explanation. It's a technical masterpiece.

From my collection:
- Some colleagues had questions about environment variables, so I decided to resurface an old post on the topic and spruce it up with more information on my essays collection.
- I moved data across work sites securely and as fast as commercial tools, using nothing but free and open source tooling. Come read how.
- I also recently figured out how to directly open a Jupyter notebook in a Binder session. The hack is super cool.

Finally, some more humour from the ever-on-fire Kareem Carr :).

I have a deep learning joke but it has a lot of layers to it. https://t.co/puRs6lqCUY
— 🔥Kareem Carr🔥 (@kareem_carr) July 24, 2020
