ADVI: Scalable Bayesian Inference

written by Eric J. Ma on 2019-01-21

bayesian variational inference data science


You never know when scalability becomes an issue. Indeed, scalability necessitates a whole different world of tooling.

While at work, I've been playing with a model - a Bayesian hierarchical 4-parameter dose response model, to be specific. With this model, the overall goal (without going into proprietary specifics) was parameter learning - what's the 50th percentile concentration, what's the max, what's the minimum, etc.; what was also important was quantifying the uncertainty surrounding these parameters.

Prototype Phase

Originally, when I prototyped the model, I used just a few thousand samples, which was trivial to fit with NUTS. I also got the model specification (both the group-level and population-level priors) done using those same few thousand.

At some point, I was qualitatively and quantitatively comfortable with the model specification. Qualitatively - the model structure reflected prior biochemical knowledge. Quantitatively, I saw good convergence when examining the sampling traces, as well as the shrinkage phenomena.

Scaling Up

Once I reached that point, I decided to scale up to the entire dataset: 400K+ samples, 3000+ groups.

Fitting this model with NUTS with the full dataset would have taken a week, with no stopping time guarantees at the end of an overnight run - when I left the day before, I was still hoping for 5 days. However, switching over to ADVI (automatic differentiation variational inference) was a key enabler for this model: I was able to finish fitting the model with ADVI in just 2.5 hours, with similar uncertainty estimates (it'll never end up being identical, given random sampling).


I used to not appreciate that ADVI could be useful for simpler models; in the past, I used to think that ADVI was mainly useful in Bayesian neural network applications - in other words, with large parameter and large data models.

With this example, I'm definitely more informed about what "scale" can mean: both in terms of number of parameters in a model, and in terms of number of samples that the model is fitted on. In this particular example, the model is simple, but the number of samples is so large that ADVI becomes a feasible alternative to NUTS MCMC sampling.

Did you enjoy this blog post? Let's discuss more!

Conda hacks for data science efficiency

written by Eric J. Ma on 2018-12-25

data science conda hacks

The conda package manager has, over the years, become an integral part of my workflow. I use it to manage project environments, and have built a bunch of very simple hacks around it that you can adopt too. I'd like to share them with you, alongside the rationale for using them.

Hack #1: Set up your .condarc

Why? It will save you a few keystrokes each time you want to do something with conda. For example, in my .condarc, I have the following:

# Set the channels that the `conda install` command will 
# automatically search through.
  - defaults
  - conda-forge
  - ericmjl

# Always accept installation. Is convenient, but always 
# double-check!
always_yes: true

For more information on how to configure your .condarc, check the online documentation!

Hack #2: Use one environment spec file per project

This assumes that you have the habit of putting all files related to one project inside one folder, using subdirectories for finer-grained organization.

Why? It will ensure that you have one version-controlled, authoritative specification for the packages that are associated with the project. This is good for (1) reproducibility, as you can send it to a colleague and have them reproduce the environment, and (2) will enable Hack #3, which I will showcase afterwards.

# file name: environment.yml

# Give your project an informative name
name: project-name

# Specify the conda channels that you wish to grab packages from, in order of priority.
- defaults
- conda-forge
- ericmjl

# Specify the packages that you would like to install inside your environment. Version numbers are allowed, and conda will automatically use its dependency solver to ensure that all packages work with one another.
- python=3.7
- conda
- jupyterlab
- scipy
- numpy
- pandas
- pyjanitor
- pandas
# There are some packages which are not conda-installable. You can put the pip dependencies here instead.
- pip:
    - tqdm  # for example only, tqdm is actually available by conda.

A hack that I have related to this is that I use TextExpander shortcut to populate a starting environment spec file.

Additionally, if I want to install a new package, rather than simply typing conda install <packagename>, I will add the package to the environment spec file, and then type conda env update -f environment.yml, as more often than not, my default is to continue using the package I added.

For more details on what the environment spec file is all about, read the online docs!

Hack 3: use conda-auto-env

Written by Christine Doig, conda-auto-env is a bash hack that enables you to automatically activate an environment once you enter into a project directory, as long as an environment.yml file already exists in the directory. If the environment does not already exist, then conda-auto-env will automatically create one based on the environment.yml file in your project directory.

Why? If you have many projects that you are working on, then it will greatly reduce the amount of effort used to remember which project to activate.

conda-auto-env looks like this:

# File: .conda-auto-env

function conda_auto_env() {
  if [ -e "environment.yml" ]; then
    # echo "environment.yml file found"
    ENV=$(head -n 1 environment.yml | cut -f2 -d ' ')
    # Check if you are already in the environment
    if [[ $PATH != *$ENV* ]]; then
      # Check if the environment exists
      conda activate $ENV
      if [ $? -eq 0 ]; then
        # Create the environment and activate
        echo "Conda env '$ENV' doesn't exist."
        conda env create -q
        conda activate $ENV

export PROMPT_COMMAND=conda_auto_env

To use it, you have two options. You can either copy/paste the whole original script into your .bashrc, or you can put it in a file called .conda-auto-env, and source it from your .bashrc. I recommend the latter, as it makes managing your .bashrc easier:

# File: .bashrc
source /path/to/.conda-auto-env

Hack 4: hijack bash aliases for conda commands

I use aliases to save myself a few keystrokes whenever I'm at the terminal. This is a generalizable bash hack, but here it is as applied to conda commands.

Anyways, these are the commands that I use most often, which I have found it useful to alias:

# File: .aliases
alias ceu="conda env update"
alias cl="conda list"
alias ci="conda install"
alias cr="conda remove"

Make sure your aliases don't clash with existing commands that you use!

Then, source .aliases in your . bashrc:

# File: .bashrc
source /path/to/.aliases

Now, all of your defined aliases will be available in your bash shell.

The idea/pattern, as I mentioned earlier, is generalizable beyond just bash commands. (I have ls aliased for exa, and l aliased for ls - the epitome of laziness!)


I hope you found these bash and conda hacks to be useful. Hopefully they will help you become more productive and efficient!

Did you enjoy this blog post? Let's discuss more!

Gaussian Process Notes

written by Eric J. Ma on 2018-12-16

data science bayesian

I first learned GPs about two years back, and have been fascinated by the idea. I learned it through a video by David MacKay, and managed to grok it enough that I could put it to use in simple settings. That was reflected in my Flu Forecaster project, in which my GPs were trained only on individual latent spaces.

Recently, though, I decided to seriously sit down and try to grok the math behind GPs (and other machine learning models). To do so, I worked through Nando de Freitas' YouTube videos on GPs. (Super thankful that he has opted to put these videos up online!)

The product of this learning is two-fold. Firstly, I have added a GP notebook to my Bayesian analysis recipes repository.

Secondly, I have also put together some hand-written notes on GPs. (For those who are curious, I first hand-wrote them on paper, then copied them into my iPad mini using a Wacom stylus. We don't have the budget at the moment for an iPad Pro!) They can be downloaded here.

Some lessons learned:

Did you enjoy this blog post? Let's discuss more!

Mathematical Intuition

written by Eric J. Ma on 2018-12-09

deep learning bayesian math data science

Last week, I picked up Jeremy Kun's book, "A Programmer's Introduction to Mathematics". In it, I finally found an explanation for my frustrations when reading math papers:

What programmers would consider “sloppy” notation is one symptom of the problem, but there there are other expectations on the reader that, for better or worse, decelerate the pace of reading. Unfortunately I have no solution here. Part of the power and expressiveness of mathematics is the ability for its practitioners to overload, redefine, and omit in a suggestive manner. Mathematicians also have thousands of years of “legacy” math that require backward compatibility. Enforcing a single specification for all of mathematics—a suggestion I frequently hear from software engineers—would be horrendously counterproductive.

Reading just that paragraph explained, in such a lucid manner, how my frustrations reading mathematically-oriented papers, stemmed from mismatched expectations. I come into a paper thinking like a software engineer. Descriptive variable names (as encouraged by Python), which are standardized as well, with structured abstractions providing a hierarchy of logic between chunks of code... No, mathematicians are more like Shakespeare - or perhaps linguists - in that they will take a symbol and imbibe it with a subtly new meaning or interpretation inspired by a new field. That "L" you see in one field of math doesn't always exactly mean the same thing in another field.

Biology vs. Math?

The contrast is stark when compared against reading a biology paper. With a biology paper, if you know the key wet-bench experiment types (and there's not that many), you can essentially get the gist of a paper by reading the abstract and dissecting the figures, which, granted, are described and labelled with field-specific jargon, but are at least descriptive names. With a math-oriented paper, the equations are the star, and one has to really grok each element of the equations to know what they mean. It means taking the time to dissect each equation and ask what each symbol is, what each group of symbols means, and how those underlying ideas connect with one another and with other ideas. It's not unlike a biology paper, but requiring a different kind of patience, one that I wasn't trained in.

Learning to Learn by Teaching

As Jeremy Kun wrote in his book, programmers do have some sort of a leg-up when it comes to reading and understanding math. It's a bit more than what Kun wrote, I think - yes, many programming ideas have deep mathematical connections. But I think there's more.

One thing we know from research into how people learn is that teaching someone something is an incredible way to learn that something. From my prior experience, the less background a student has in a material, the more demands are placed on the teacher's understanding of the material, as we work out how the multiple representations in our head to try to communicate it to them.

As it turns out, we programmers have the ultimate dumb "student" available at our fingertips: Our computers! By implementing mathematical ideas in code, we are essentially "teaching" the computer to do something mathematical. Computers are not smart; they are programmed to do exactly what we input to them. If we get an idea wrong, our implementation of the math will likely be wrong. That fundamental law of computing shows up again: Garbage in, garbage out.

Hierarchy of Ideas

More than just that, when we programmers implement a mathematical idea in code, we can start putting our "good software engineering" ideas into place! It helps the math become stickier when we can see, through code, the hierarchy of concepts that are involved.

An example, for me, comes from the deep learning world. I had an attempt dissecting two math-y deep learning papers last week. Skimming through the papers didn't do much good for my understanding of the paper. Neither did trying to read the paper like I do a biology paper. Sure, I could perhaps just read the ideas that the authors were describing in prose, but I had no intuition on which to base a proper critique of the idea's usefulness. It took implementing those papers in Python code, writing tests for them, and using abstractions that I had previously written, to come to a place where I felt like the ideas in the paper were a flexibly wieldable tool in my toolkit.

Reinventing the wheel, such that we can learn the wheel, can in fact help us decompose the wheel so that we can do other new things with it. Human creativity is such a wonderful thing!

Did you enjoy this blog post? Let's discuss more!

Solving Problems Actionably

written by Eric J. Ma on 2018-11-13

data science insight data science

There's a quote by John Tukey that has been a recurrent theme at work.

It's better to solve the right problem approximately than to solve the wrong problem exactly.

Continuing on the theme of quoting two Georges:

All models are wrong, but some are more useful than others.

H/T Allen Downey for pointing out that our minds think alike.

I have been working on a modelling effort for colleagues at work. There were two curves involved, and the second depended on the first one.

In both cases, I started with a simple model, and made judgment calls along the way as to whether to continue improving the model, or to stop there because the current iteration of the model was sufficient enough to act on. With first curve, the first model was actionable for me. With the second curve, the first model I wrote clearly wasn't good enough to be actionable, so I spent lots more rounds of iteration on it.

But wait, how does one determine "actionability"?


For myself, it has generally meant that I'm confident enough in the results to take the next modelling step. My second curves depended on the first curves, and after double-checking multiple ways, I thought the first curve fits, though not perfect, were good enough when applied across a large number of samples that I could instead move on to the second curves.

For others, particularly at my workplace, it generally means a scientist can make a decision about what next experiment to run.

Insight's MVP Influence

Going through Insight Data Science drilled into us an instinct for developing an MVP for our problem before going on to perfect it. I think that general model works well. My project's final modelling results will be the result of chains of modelling assumptions at every step. Documenting those steps clearly, and then being willing to revisit those assumptions, is going always a good thing.

Did you enjoy this blog post? Let's discuss more!