Eric's Notes

2021 04-April_social-media

Hello fellow datanistas!

First off, if you are wondering what happened to the March edition, I was a bit overloaded at work in the lead-up to parental leave, so I intentionally took some time off from all things data-related. To make up for it, there'll be two editions of the newsletter this month, this one being a special edition on Awesome Social Media Posts! (It was what I had planned for March, and I'm still excited to share it with you all.)

When to use what model?

Isabella Ghement has a great tweet that lists the factors that influence the choice of a statistical model. One thing I learned from it is that practical matters, such as an individual's skill level, are real constraints on whether a model can be used or not. Models are tools, and require skill to wield!

Sponsor the people who make your tools

Samuel Colvin, who makes the awesome tool Pydantic, sponsored Ned Batchelder, who makes coverage.py, a tool for testing code. Financial support, even if it's just the price of a cup of coffee a month (or a latte, if you're feeling fancy), can make maintaining these tools financially viable for some of your favourite tools' maintainers!

Data curation is a worthwhile infrastructural investment

The Protein Data Bank was instrumental in efforts to build vaccines and treatments against COVID-19. The fact that over 1000 such structures have now been deposited highlights for me how focused data curation over a long period of time, targeting one data modality, can be a worthwhile investment that pays dividends many times over.

How good are machine learning paper publication practices?

There's no doubt right now that machine learning, as a discipline, has intersected with many other disciplines. How do people outside machine learning perceive the field? David Ha (@hardmaru) tweeted a Reddit thread that spells out some views.

Are two brains better than one in pair programming?

Those who have worked with me know that I like to work in pairs, solving problems together. My sense is that it makes for more robust projects; creativity is also sharpened by having pairs work together. Does this hold all the time? Jacqueline Smith shares her take on her blog.

Why I'm lukewarm on Graph Neural Networks

In this post shared by Andrew Fairless on LinkedIn, Matt Ranger talks about why the research on graph neural networks appears to be "more of the same" from the academy. It's a simultaneously entertaining and sobering read :).

No COVID-19 models are clinic ready!

On Twitter, Eric Topol shared a link to a publication in Nature Machine Intelligence in which the authors found that none of the published models for using chest radiographs and CT scans to predict COVID-19 progression were ready for the clinic. Why? I won't spill the beans here; check out the paper linked in the tweet!

Berkson's Paradox

Also known as "how observational biases give rise to spurious correlations". Lionel Page tweeted about it, and there's a whole thread! Mathematician Hannah Fry explains further, with more examples of Berkson's paradox, in her Numberphile video (linked in the tweet).
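
To make the idea concrete, here is a minimal simulation sketch. It is my own illustration rather than anything taken from the thread or the video. The assumptions: two independent, standard-normal traits (hypothetically named `trait_a` and `trait_b`), and an observer who only ever sees individuals whose combined score clears a made-up `threshold`. The function names are hypothetical too.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient, written out to stay stdlib-only."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_berkson(n=20000, threshold=1.0, seed=0):
    """Return (correlation in the full population, correlation in the selected subset)."""
    rng = random.Random(seed)
    # Two traits drawn independently: no real relationship between them.
    trait_a = [rng.gauss(0, 1) for _ in range(n)]
    trait_b = [rng.gauss(0, 1) for _ in range(n)]
    # Observational bias (assumed rule): we only observe cases where a + b > threshold.
    selected = [(a, b) for a, b in zip(trait_a, trait_b) if a + b > threshold]
    r_population = pearson(trait_a, trait_b)
    r_selected = pearson([a for a, _ in selected], [b for _, b in selected])
    return r_population, r_selected


if __name__ == "__main__":
    r_population, r_selected = simulate_berkson()
    print(f"full population: r = {r_population:+.3f}")
    print(f"selected subset: r = {r_selected:+.3f}")
```

In the full population the correlation should sit close to zero, while in the selected subset it should turn clearly negative, even though the two traits were generated independently. The selection rule alone manufactures the correlation.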

That ends this special social media edition of the Data Science Programming Newsletter. At the end of the month, we'll resume regular, ahem, programming.
