Eric J Ma's Website

What's the best way to learn Bayesian statistics?

written by Eric J. Ma on 2020-06-15 | tags: bayesian statistics bayesian data science statistics inference

I've been reflecting on the way I learned statistics, and I think I learned it in a flawed fashion.

Traditionally, statistics is taught as a set of hypothesis tests for inferring whether there's a difference between groups, or as a way to learn the parameters of some curve.

Learning statistics this way leads to a ton of confusion, because we're taught the shortcut to the answer rather than the first-principles way of thinking about a problem. We end up with the "standard t-test" and a pile of confusing names for regression modelling, all masquerading as canned procedures that can be used on any problem. (OK, that's a bit of a stretch, but please do tell me you were at least tempted to use the t-test in a situation where you just had to crank out an analysis...)

After seeing a tweet from Michael Betancourt, I realized that the only reason Markov models and their variants clicked for me was that I had thought through the data generating process. Likewise, the only reason hierarchical models clicked for me was that I stepped through the data generating process on a real problem and linked its steps to statistical parameters. Without thinking through the data generating process, none of those models made any sense.

In some sense, thinking through the data generating process is an extremely natural thing to do. It's like telling a story about how our data came into being, and we know that telling stories is exactly what humans are great at. Storytelling helps us reason about the world. There should be no reason why we don't use statistical storytelling to reason about our problems.

Worrying first about the data generating process and then about the inferential procedure makes statistical inference less of a black box and more of a natural conclusion of statistical storytelling. We become less concerned with whether something is "significant", and instead more concerned with whether we "got the model right".

To put this into concrete action, I've been working on an alternative introduction to probabilistic programming and Bayesian inference that is lighter on math than most introductions, involves a lot of verbal storytelling, and goes heavier than most introductions in its use of programming. Here we practice the skill of hypothesizing a data generating story and translating that into the language of probability distributions, which can then be translated into SciPy stats Python code. Stay tuned!
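To give a flavour of what that translation can look like, here is a minimal sketch. The story, the numbers, and names such as `joint_logp` are made up for illustration and are not taken from the introduction itself. The story: a coin has some unknown bias, we flip it 20 times, and we count the heads. Written as distributions, the bias comes from a Beta distribution and the head count from a Binomial distribution.

```python
import numpy as np
from scipy import stats

# Fixed seed so the story plays out the same way each run (arbitrary choice).
rng = np.random.default_rng(42)

# Story, step 1: the coin has an unknown bias p. Before seeing any data,
# we believe it is probably near fair: p ~ Beta(2, 2). (Assumed prior.)
prior = stats.beta(a=2, b=2)
true_p = prior.rvs(random_state=rng)

# Story, step 2: we flip the coin n_flips times and count the heads:
# heads ~ Binomial(n_flips, p).
n_flips = 20
heads = stats.binom(n=n_flips, p=true_p).rvs(random_state=rng)


def joint_logp(p, heads, n_flips):
    """Log probability of the whole story: prior on p plus likelihood of the data."""
    return prior.logpdf(p) + stats.binom(n=n_flips, p=p).logpmf(heads)


# Inference falls out of the story: score every candidate bias on a grid,
# then normalize the scores into an approximate posterior over p.
grid = np.linspace(0.01, 0.99, 99)
log_post = joint_logp(grid, heads, n_flips)
posterior = np.exp(log_post - log_post.max())
posterior /= posterior.sum()

print(f"observed {heads} heads in {n_flips} flips")
print(f"most plausible bias on the grid: {grid[np.argmax(posterior)]:.2f}")
```

The grid approximation is only one way to do the inference step, chosen here because it keeps everything inside SciPy and NumPy. The point is the order of operations: the data generating story comes first, and the inferential procedure is whatever lets us score that story against the data.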
