Eric J Ma's Website

Using Bokeh in FluForecaster

written by Eric J. Ma on 2017-06-30 | tags: bokeh data science dashboarding


Motivation

As a Health Data Fellow at Insight, we spend about 3 weeks executing on a data project, which we demo at companies that are hiring. I built FluForecaster, which was a project aimed at forecasting influenza sequence evolution using deep learning.

My choice of project was strategically aligned with my goals on a few levels. Firstly, I wanted to make sure my project showcased deep learning, as it's currently one of the hottest skills to have. Secondly, I had components of the code base written in separate Jupyter notebooks prior to Insight, meaning, I could execute on it quickly within the three weeks we had. Thirdly, I had intended to join Insight primarily with the goal of networking with the Insight community, and that basically meant 'being a blessing' to others on their journey too - if I could execute fast and well on my own stuff, then there'd be time to be a team player with other Fellows in the session, and help them get their projects across the finish line.

Each of us had to demo a "final product". Initially, I was thinking about a "forecasting dashboard", but one of our program directors, Ivan, suggested that I include more background information. As such, I decided to make the dashboard an interactive blog post instead. Thus, with FluForecaster being a web-first project, I finally had a project in which I could use Bokeh as part of the front-end.

Applying Bokeh

Bokeh was used mainly for displaying three data panels in the browser. Firstly, I wanted to show how flu vaccine efficacy rarely crossed the 60% threshold over the years. Secondly, I wanted to show a breakdown of the number of sequences collected per year (as used in my dataset). Thirdly, I wanted to show a visual display of influenza evolution.

For yearly vaccine effectiveness, it was essentially a line and scatter chart, with the Y-axis constrained between 0 and 100%. I added a hover tooltip to enable my readers to see the exact value of vaccine effectiveness as measured by the US CDC.

To show the number of sequences per year in the dataset, the same kind of chart was deployed.

Bokeh magic became really evident later when I wanted to show sequence evolution in 3 dimensions. Because 3D charts are generally a poor choice for a flat screen, I opted to show pairs of dimensions at a time. A nice side-effect of this is that because my ColumnDataSource was shared amongst each of the three pairs of coordinates, panning and selection was automatically linked for free.

Usage Pros and Cons

Bokeh's API is very powerful, in that it supplies many plotting primitive objects (glyphs, particularly), and that makes it a big plus for users who are experienced with the library, who are creating complex interactive charts.

Most of my fellow Fellows at Insight ended up using the bokeh.plotting interface, and I did too. I think the bokeh.plotting interface provides the best balance between ease-of-use and flexibility. If you take a look at the code here, you'll notice that there's often a bit of boilerplate that gets repeated with variation, such as in the configuration of custom hover tools. I think this is the tradeoff we make for configurability... or I might just be not writing code most efficiently. :)

There were times where I was tempted to just use the bkcharts' declarative interface instead. It's a lot more easy to use. However, I did have some time on hand, and wanted to get familiar with the bokeh.plotting interface, because there's a possibility that I might want to make wrappers for other visualizations that can lend themselves to a declarative API.

Embedding Visualizations

I built my interactive blog post using a combination of Flask, hand-crafted HTML, Bootstrap CSS & JS, and Bokeh - which took care of the bulk of visuals. I drew static figures using Illustrator.

Embedding the necessary Bokeh components wasn't difficult. Very good documentation is available on the Bokeh docs. The key insight that I had learned was that I could have the components passed into my Flask app functions' return statements, and embed them using Jinja2 templating syntax. An example can be found here. Basically, components returns a div and a js object, which are essentially just strings. To embed them in the templates, we use the syntax {{ div|safe }} and {{ js|safe }}. That |safe is very important: it tells the Jinja2 templating engine that it's safe to render those pieces of Javascript and HTML.

Conclusion

Through the project, I became a lot more familiar with the Bokeh plotting library. Now I feel a bit torn! I've contributed to both the Bokeh and matplotlib projects, and I love them both! I've also come to deeply respect the lead developers of both projects, having interacted with them many times. If I were to make a comment on "where to use what" based on my experience, it'd probably still be the conservative view of "matplotlib for papers, bokeh for the web"... but I'm sure that will be outdated soon. Who knows how the Python plotting landscape will evolve - it's exciting times ahead, and at least for now, I'm happy for the experience driving a dataviz project with Bokeh!


I send out a newsletter with tips and tools for data scientists. Come check it out at Substack.

If you would like to sponsor the coffee that goes into making my posts, please consider GitHub Sponsors!

Finally, I do free 30-minute GenAI strategy calls for organizations who are seeking guidance on how to best leverage this technology. Consider booking a call on Calendly if you're interested!