2021 04-April_social-media
Hello fellow datanistas!
First off, if you are wondering what happened to the March edition: I was a bit overloaded at work in the lead-up to parental leave, so I intentionally took some time off from all things data-related. To make up for it, there'll be two editions of the newsletter this month, this one being a special edition on Awesome Social Media Posts! (It was what I had planned for March, and I'm still excited to share it with you all.)
Isabelle Ghement has a great tweet that lists out the factors that influence the choice of a statistical model. One thing I learned there is that practical matters, such as an individual's skill level, are real constraints on whether a model can be used or not. Models are tools, and require skill to wield!
Samuel Colvin, who makes the awesome tool Pydantic, sponsored Ned Batchelder, who makes coverage.py, a tool for measuring how much of your code is exercised by tests. Financial support, even if it's just the price of a cup of coffee a month (or a latte, if you're feeling fancy), can make maintenance financially viable for the maintainers of some of your favourite tools!
The Protein Data Bank was instrumental in efforts to build vaccines and treatments against COVID-19. The fact that over 1000 COVID-19-related structures have now been deposited highlights for me how focused data curation, sustained over a long period of time and targeting one data modality, can be a worthwhile investment that pays dividends many times over.
There's no doubt right now that machine learning, as a discipline, has intersected with many other disciplines. How do people outside of machine learning perceive ML? David Ha (@hardmaru) tweeted a Reddit thread that spells out some views.
Those who have worked with me know that I like to work in pairs, solving problems together. My sense is that it makes for more robust projects; creativity is also sharpened by having pairs work together. Does this hold all the time? Jacqueline Smith shares her take on her blog.
In this post shared by Andrew Fairless on LinkedIn, Matt Ranger talks about why the research on graph neural networks appears to be "more of the same" from the academy. It's a simultaneously entertaining and sobering read :).
On Twitter, Eric Topol shared a link to a publication in Nature Machine Intelligence in which the authors found that none of the published models for using chest radiographs and CT scans to predict COVID-19 progression were ready for the clinic. Why? I won't spill the beans here; check out the paper linked in the tweet!
Berkson's paradox is also known as "how observational biases give rise to spurious correlations". Tweeted out by Lionel Page, there's a whole thread! Mathematician Hannah Fry explains further with more examples of the paradox in her Numberphile video (linked in the tweet).
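For a feel of how Berkson's paradox arises, here's a minimal simulation sketch. It is my own illustration, not taken from the thread or the video. The two traits, the selection rule (`trait_a + trait_b > threshold`), and all the names and numbers are assumptions made up for the example. The two traits are generated independently, yet once we only look at the "selected" subset, they appear negatively correlated.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def simulate(n=20_000, threshold=1.0, seed=42):
    """Return (correlation in everyone, correlation in the selected subset).

    Hypothetical setup: two independent standard-normal traits; an
    individual is only observed if the sum of the traits exceeds `threshold`.
    """
    rng = random.Random(seed)
    trait_a = [rng.gauss(0, 1) for _ in range(n)]
    trait_b = [rng.gauss(0, 1) for _ in range(n)]

    # The observational bias: we only get to see individuals who pass the filter.
    selected = [(a, b) for a, b in zip(trait_a, trait_b) if a + b > threshold]

    r_all = pearson(trait_a, trait_b)
    r_selected = pearson([a for a, _ in selected], [b for _, b in selected])
    return r_all, r_selected


r_all, r_selected = simulate()
print(f"correlation, whole population: {r_all:+.3f}")
print(f"correlation, selected subset:  {r_selected:+.3f}")
```

In the full population the correlation should hover around zero, while in the selected subset it comes out clearly negative: among those who passed the filter, a low value on one trait has to be compensated by a high value on the other.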
That ends this special social media edition of the Data Science Programming Newsletter. At the end of the month, we'll resume regular, ahem, programming.