Eric J Ma's Website

PyData NYC 2017

written by Eric J. Ma on 2017-10-10 | tags: pydata conferences python


I'm seriously looking forward to PyData NYC this year: there's a great lineup of talks that I can't wait to hear! The theme for my set of must-see talks this year is "Bayesian machine learning", and there's much for me to learn!

The first is by my fellow Boston Bayesian Colin Carroll with his talk titled Two views on regression with PyMC3 and scikit-learn. Colin is a mathematician at heart, even though he does software engineering for a living now, and I can't wait to hear about regularization strategies!

The second is by Nicole Carlson, with her talk titled Turning PyMC3 into scikit-learn. Nicole's talk is of interest to me because I've implemented models in PyMC3 before, and now would like to know how to make them reusable!

The third talk is by Chaya Stern, with her talk titled Bayesian inference in computational chemistry. Super relevant to my work at Novartis!

The fourth is by my fellow Boston Pythonista Joe Jevnik, who will be speaking on the first day about his journey into deep learning on some really cool time-series data. He works at Quantopian, BUT the spoiler here is that his talk is NOT about financial data! (I've heard his talk outline already.)

The fifth is a tutorial by Jacob Schreiber, titled pomegranate: fast and flexible probabilistic modeling in python. pomegranate's API is modeled after scikit-learn's API. With the API being the user-facing interface, and scikit-learn being the de facto go-to library for machine learning, I'd be interested to see how much more pomegranate adds to the ecosystem, particularly w.r.t. Bayesian models.

There is a swathe of other good talks that I'm expecting to be able to catch online later on. Matt Rocklin, who is the lead developer of Dask, has done a ton of work on speeding Python up through parallelism. His talk will be on the use of Cython & Dask to speed up GeoPandas.

Also, Thomas Caswell, one of the matplotlib lead devs who helped guide my first foray into open source contributions, is giving a tutorial on developing interactive figures in matplotlib. Highly recommended if you're into the visualization world!

Finally, the always-interesting, always-entertaining en zyme will be speaking on an interesting topic.

Looking forward to being at the conference, and meeting old and new friends there!

