A Review of the Python Data Science Dashboarding Landscape in 2019

written by Eric J. Ma on 2019-12-15

dashboarding python data science data visualization software development

This blog post is also available on my collection of essays.

Introduction

As Pythonista data scientists, we are spoiled for choice when it comes to developing front-ends for our data apps. We used to have to fiddle with HTML in Flask (or Plotly's Dash), but now, there are tools in which "someone wrote the HTML/JS so I didn't have to".

Let me give a quick tour of the landscape of tools as I've experienced it in 2019.

Beginnings: Voila

Previously, I had test-driven Voila. The key advantage I saw back then was this: once I had the makings of a UI in a Jupyter notebook and just needed a way to serve it up without my end-users running a Jupyter server themselves, Voila solved that use case. By taking advantage of the existing ipywidgets ecosystem and adding a way to run a notebook and serve its HTML output, Voila handled that part of the dashboarding story quite nicely. In many respects, I regard Voila as the first proper dashboarding tool for Pythonistas.

That said, development in a Jupyter notebook didn't necessarily foster best practices (such as refactoring and testing code). When my first project at work ended, and I didn't have a need for further dashboarding, I didn't touch Voila for a long time.

Another player: Panel

Later, Panel showed up. Panel's development model allowed a more modular app setup, including importing plotting functions, defined in .py files, that each returned an individual plot. Panel also allowed me to prototype in a notebook and see the output live before moving the dashboard code into a source .py file.
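
To make that concrete, here is a minimal sketch of the kind of setup I mean (the file names, the histogram function, and the data are hypothetical, not from the actual project): plotting code lives in an importable .py file, and the app file composes it with widgets.

# plots.py: plain plotting functions that return figure objects
import matplotlib.pyplot as plt

def histogram(data, bins=20):
    fig, ax = plt.subplots()
    ax.hist(data, bins=bins)
    return fig

# app.py: compose the imported plotting function with widgets
import numpy as np
import panel as pn
from plots import histogram

pn.extension()
data = np.random.normal(size=1000)

def view(bins=20):
    return histogram(data, bins=bins)

# pn.interact builds a slider from the (start, end) tuple
app = pn.interact(view, bins=(5, 50))
app.servable()  # `panel serve app.py` serves this layout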

At work, we based a one-stop-shop dashboard for a project on Panel, and in my personal life I built a minimal Panel app that I deployed to Heroku. Panel was definitely developed with both notebook and source-file use cases in mind, and this shows through in its development model.

That said, Panel apps could be slow to load, and without a "spinner" solution in place (i.e. something to show the user that the app is "doing something" in the background), they sometimes felt slow even when the slowness wasn't really Panel's fault. (My colleagues and I pulled out all the tricks in our bag to speed things up.)

Additionally, any errors that show up don't get surfaced to the app's UI, where developer eyeballs actually are; instead, they get buried in the browser's JavaScript console or in the Python terminal where the app is being served. Once deployed, this makes it difficult to see where errors show up and to debug them.

Enter Streamlit

Now, Streamlit comes along, and some of its initial demos are pretty rad. In order to test-drive it, I put together this little tutorial on the Beta probability distribution for my colleagues.

Streamlit definitely solves some of the pain points that I've observed with Panel and Voila.

The most important one that I see is that errors are captured by Streamlit and bubbled up to the UI, where our eyeballs are going to be when developing the app. For me, this is a very sensible decision to make, for two reasons:

Firstly, it makes debugging interactions that much easier. Instead of needing to have two interfaces open, the error message shows up right where the interaction fails, in the same browser window as the UI elements.

Secondly, it makes it possible for us to use the error messages as a UI "hack" to inform users where their inputs (e.g. free text) might be invalid, thereby giving them informative error messages. (Try it out in the Beta distribution app: it'll give you an error message right below if you try to type something that can't be converted into a float!)

The other key thing that Streamlit provides as a UI nicety is the ability to signal to end-users that a computation is happening. Streamlit does this in three ways, two of which always come for free. Firstly, if something is "running", the "Running" spinner in the top right-hand corner of the page will animate. Secondly, anything that is re-rendering is automatically greyed out. Finally, we can use a special context manager to provide a custom message on the front-end:

import time
import streamlit as st

with st.spinner("Message goes here..."):
    time.sleep(5)  # stand-in for the actual long-running computation

So all-in-all, Streamlit seems to have a solution of some kind for the friction points that I have observed with Panel and Voila.

Besides that, Streamlit, I think, uses a procedural paradigm, rather than a callback paradigm, for app construction. We just have to think of the app as a linear sequence of actions that happen from top to bottom. State is never really an issue, because every code change and interaction re-runs the source file from top to bottom, from scratch. When building quick apps, this paradigm really simplifies things compared to a callback-based paradigm.
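
As a throwaway illustration (not from any of the apps mentioned above), the whole script below re-runs from top to bottom on every interaction, so there is no callback wiring or state management to reason about:

import numpy as np
import streamlit as st

# Every widget interaction re-runs this file from the top.
n = st.slider("Number of samples", min_value=10, max_value=1000, value=100)
data = np.random.normal(size=n)
st.write(f"Mean of {n} samples: {data.mean():.3f}")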

Finally, Streamlit also provides a convenient way to add text to the UI: any raw string in a .py file that isn't assigned to a variable is automatically parsed as Markdown and rendered as HTML. This opens the door to treating a .py file as a literate programming document, hosted by a Python-based server in the backend. It'd be especially useful in teaching scenarios. (With Pyodide bringing the PyData stack to the browser, I can't wait to see standalone .py files rendered to the DOM!)
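
As a rough sketch of how that reads in practice (hypothetical content, not lifted from the Beta distribution app), bare strings and code interleave in a single .py file:

import numpy as np
import streamlit as st

"""
## The Beta distribution
A bare string literal like this one is picked up by Streamlit's "magic"
and rendered as Markdown, right alongside the output of the code below.
"""

samples = np.random.beta(a=2.0, b=5.0, size=10_000)
st.write("Sample mean:", samples.mean())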

Now, this isn't to say that Streamlit is problem-free. There are still rough edges; the most glaring one in the current release (as of today) is the inability to upload a file and operate on it. This has been fixed in a recent pull request, so I expect it to show up in a new release soon.

The other, not-so-big, problem that I see with Streamlit at the moment is the flip side of the procedural paradigm: by always re-running code from top to bottom afresh on every single change, apps that rely on long computations may need a bit more thought to construct, including the use of Streamlit's caching mechanism. Being procedural does make development easier, though, and on balance I would not discount Streamlit's simplicity here.
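
The caching pattern itself is lightweight to use; a hedged sketch (the load_data function and the data.csv path are placeholders, not from a real app) looks something like this:

import pandas as pd
import streamlit as st

@st.cache  # memoizes on the function's inputs, so re-runs skip the slow work
def load_data(path):
    return pd.read_csv(path)

df = load_data("data.csv")  # only hits the disk on the first run
st.write(df.head())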

Where does Streamlit fit?

As I see it, Streamlit's devs are laser-focused on enabling developers to get to a reasonably good-looking app prototype very quickly. In my experience, developing the Beta distribution app took about 3 hours, 2.5 of which were spent composing prose. So effectively, I spent only half an hour writing code, with a live, auto-reloading preview greatly simplifying the development process. (I conservatively estimate that this is about 1.5 times as fast as I would have been using Panel.)

Given Streamlit, I would use it to develop two classes of apps: (1) very tightly-focused utility apps that do one lightweight thing well, and (2) bespoke, single-document literate programming education material.

I would be quite hesitant to build more complex things; then again, for me, that statement would be true with whatever tool I used. In any case, I think bringing UNIX-like thinking to the web is probably a good idea: we make little utilities/functional tools that can pipe standard data formats from one to another.

Common pain points across all three dashboarding tools

A design pattern I have wanted is the ability to serve a fleet of small, individual utilities from the same codebase, each backed by its own server process, but all packaged within the same container. The only way I can think of doing this at the moment is to build a custom Flask-based gateway that redirects properly to each utility's process. That said, I think this is probably out of scope for the individual dashboarding projects.
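
To be concrete about what I mean, a bare-bones sketch of such a gateway might look like the following (the utility names and ports are made up):

from flask import Flask, redirect

app = Flask(__name__)

# Hypothetical mapping from utility name to the port its server process listens on.
UTILITIES = {"beta-distribution": 8501, "sequence-explorer": 8502}

@app.route("/<name>")
def gateway(name):
    port = UTILITIES.get(name)
    if port is None:
        return f"No utility named {name}.", 404
    # Hand the browser off to the utility's own server process.
    return redirect(f"http://localhost:{port}/")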

How do we go forward?

The ecosystem is ever-evolving, and rather than being left confused by the multitude of options available to us, I find myself actually very encouraged by the development that has been happening. There are competing ideas and friendly competition between the developers, but they are also listening to each other and to their users, and converging on similar things in the end.

That said, I think it would be premature to go "all-in" on a single solution at this moment. For the individual data scientist, my advice is to be able to build something using each of the dashboarding frameworks; my personal recommendation is to know how to use all three of Voila, Panel, and Streamlit.

These recommendations stem mainly from being able to style and lay out content without needing much knowledge of HTML. In terms of roughly when to use what, my prior experience has been that Voila and Streamlit are pretty good for quicker prototypes, while Panel has been good for more complex ones, though in all cases we have to worry about speed impacting user experience.

From my experience at work, being able to quickly hash out key visual elements in a front-end prototype gives us the ability to better communicate with UI/UX designers and developers on what we're trying to accomplish. Knowing how to build front-ends ourselves lowers the communication and engineering barrier when taking a project to production. It's a worthwhile skill to have; be sure to have it in your toolbox!



Principled Git-based Workflow in Collaborative Data Science Projects

written by Eric J. Ma on 2019-11-09

data science git workflow

Having worked with GitFlow on a data science project and coming to a few epiphanies with it, I decided to share some of my thoughts in an essay.

One of my thoughts here is that most data scientists who resist GitFlow (and, more generally, resist being more intentional about what gets worked on) do so not because it's a bad idea, but because there's a lack of incentives to adopt it. In the essay, I try to address this concern.

And because GitFlow does require knowledge of Git, it can trigger an "Oh no, one more thing to learn!" response. These things do take time to learn, yes, but I also see that time as an investment with a future payoff.

Apart from that, I hope you enjoy the essay; writing it was also a great opportunity for me to pick up more advanced features of pymdownx, a package that extends Markdown syntax with other really cool features.



Reimplementing and Testing Deep Learning Models

written by Eric J. Ma on 2019-10-31

data science deep learning testing pair programming code review

Note: this blog post is cross-posted on my personal essay collection on the practice of data science.

At work, most deep learners I have encountered have a tendency to take deep learning models and treat them as black boxes that we should be able to wrangle. While I see this as a pragmatic first step to testing and proving out the value of a newly-developed deep learning model, I think that stopping there and not investing the time into understanding the nitty-gritty of the model leaves us in a poor position to know that model's (1) applicability domain (i.e. where the model should be used), (2) computational and statistical performance limitations, and (3) possible engineering barriers to getting the model performant in a "production" setting.

As such, with deep learning models, I'm actually a fan of investing the time to re-implement the model in a tensor framework that we all know and love, NumPy (and by extension, JAX).

Benefits of re-implementing deep learning models

Re-implementing a model from a deep learning framework into NumPy code has some real benefits that repay the time invested.

Developing familiarity with deep learning frameworks

Firstly, doing so forces us to learn the translation/mapping from deep learning tensor libraries to NumPy. One of the issues I have had with deep learning libraries (PyTorch and Tensorflow being the main culprits here) is that their APIs copy something like 90% of the NumPy API without making it easy to find the design considerations behind the places where they deviate. (By contrast, CuPy has an explicit API policy that is well-documented and front-and-center in the docs, while JAX strives to replicate the NumPy API.)

My gripes with tensor library APIs aside, though, translating a model by hand from one API to another forces growth in familiarity with both APIs, much as translating between two languages forces growth in familiarity with both languages.

Developing a mechanistic understanding of the model

It is one thing to describe a deep neural network as being "like the brain cell connections". It is another thing to know that the math operations underneath the hood are nothing more than dot products (or tensor operations, more generally). Re-implementing a deep learning model requires combing over every line of code, which forces us to identify each math operation used. No longer can we hide behind an unhelpfully vague abstraction.
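
To make that concrete, here is a deliberately simplified sketch of a fully-connected layer in NumPy; stripped of framework machinery, it is a dot product, a bias shift, and an elementwise nonlinearity:

import numpy as np

def dense(params, x):
    # A fully-connected layer: dot product, bias shift, elementwise nonlinearity.
    W, b = params
    return np.tanh(np.dot(x, W) + b)

# Example: 32 samples with 10 features each, mapped to 4 outputs.
W = np.random.normal(size=(10, 4))
b = np.zeros(4)
out = dense((W, b), np.random.normal(size=(32, 10)))  # out.shape == (32, 4)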

Developing an ability to test and sanity-check the model

If we follow the workflow that I will describe below for reimplementing the model (or, as the reader should now see, translating the model between APIs), we will develop confidence in the correctness of the model. This is because the workflow I am going to propose involves proper, basic software engineering practice: writing documentation for the model, testing it, and modularizing it into its logical components. Doing each of these requires a mechanistic understanding of how the model works, and hence is a useful way of building both intuition about the model and confidence in its correctness.

Reimplementing models is not a waste of time

On the contrary, it is a highly beneficial practice for gaining a deeper understanding of the inner workings of a deep neural network. The only price we pay is in person-hours, and under the assumption that the model is of strong commercial interest, that price is an investment, not a waste.

A proposed workflow for reimplementing deep learning models

I will now propose a workflow for re-implementing deep learning models.

Identify a coding partner

Pair programming is a productive way of teaching and learning. Hence, I would start by identifying a coding partner who has the requisite skillset and shared incentive to go deep on the model.

Doing so helps in a few ways.

Firstly, we have real-time peer review on our code, making it easier for us to catch mistakes that show up.

Secondly, working together at the same time means that both my colleague and I will learn something about the neural network that we are re-implementing.

Pick out the "forward" step of the model

The "forward" pass of the model is where the structure of the model is defined: basically the mathematical operations that transform the input data into the output observations.

A few keywords to look out for are the forward() and __call__() methods.

import torch.nn as nn

class MyModel(nn.Module):
    # ...
    def forward(self, X):
        # The model's structure lives here: the math that maps X to the output.
        something = ...
        return something

For models that involve an autoencoder, somewhat more seasoned programmers might create methods called encoder() and decoder(), which themselves reference other models that have a forward() or __call__() defined.

class AutoEncoder(nn.Module):
    # ...
    def forward(self, X):
        something = self.encoder(X)
        output = self.decoder(something)
        return output

Re-implementing the forward() part of the model is usually a good way of building a map of the equations that are being used to transform the input data into the output data.

Inspect the shapes of the weights

While the equations give the model its structure, the weights and biases, i.e. the parameters, are the parts of the model that get optimized. (In Bayesian statistics, we would usually presume a model structure, i.e. the set of equations used alongside the priors, and fit the model parameters.)

Because much of deep learning hinges on linear algebra, and because most of the transformations that happen involve transforming the input space into the output space, getting the shapes of the parameters is very important.

In a re-implementation exercise with my intern, where we re-implemented a specially designed recurrent neural network layer in JAX, we did a manual sanity check through our implementation to identify what the shapes would need to be for the inputs and outputs.
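
The bookkeeping for a plain dense layer gives a flavour of that exercise (this is a generic sketch, not the RNN layer we actually implemented):

import numpy as np

batch_size, input_dims, output_dims = 32, 10, 4

x = np.random.normal(size=(batch_size, input_dims))   # (32, 10)
W = np.random.normal(size=(input_dims, output_dims))  # (10, 4)
b = np.zeros(output_dims)                             # (4,)

# (32, 10) @ (10, 4) -> (32, 4); the bias broadcasts along the batch axis.
out = x @ W + b
assert out.shape == (batch_size, output_dims)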

Write tests for the neural network components

Once we have the neural network model and its components implemented, writing tests for those components is a wonderful way of making sure that (1) the implementation is correct, to the best of our knowledge, and that (2) we can catch when the implementation might have been broken inadvertently.

The shape test (as described above) is one way of doing this.

import numpy as np

input_dims, output_dims, batch_size = 10, 4, 32  # example dimensions

def test_layer_shapes():
    weights = np.random.normal(size=(input_dims, output_dims))
    data = np.random.normal(size=(batch_size, input_dims))
    output = nn_layer(weights, data)  # nn_layer: the re-implemented layer under test
    assert output.shape[1] == output_dims

If there are special elementwise transforms performed on the data, such as a ReLU or exponential transform, we can test that the numerical properties of the output are correct:

def test_layer_relu_nonnegativity():
    weights = np.random.normal(size=(input_dims, output_dims))
    data = np.random.normal(size=(batch_size, input_dims))

    output = nn_layer(weights, data, nonlinearity="relu")
    # ReLU zeroes out negative values, so every output element should be non-negative.
    assert np.all(output >= 0)

Write tests for the entire training loop

Once the model has been re-implemented in its entirety, prepare a small set of training data, pass it through the model, and attempt to train the model for a few epochs.

If the model, as implemented, is doing what we think it should, then after a dozen epochs or so the training loss should go down. We can then test that the training loss at the end is less than the training loss at the beginning. A falling loss is necessary but not sufficient evidence that the model is implemented correctly; however, if the loss does not go down, then we will definitely know that a problem exists somewhere in the code, and can begin to debug.

An example with pseudocode below might look like the following:

from data import dummy_graph_data
from model import gnn_model
from params import make_gnn_params
from losses import mse_loss
from jax import grad
from jax.experimental.optimizers import adam

def test_gnn_training():
    # Prepare a small dummy training set and initial parameters.
    x, y = dummy_graph_data(*args, **kwargs)
    params = make_gnn_params(*args, **kwargs)

    dloss = grad(mse_loss)  # gradient of the loss w.r.t. params (the first argument)
    init, update, get_params = adam(step_size=0.005)
    start_loss = mse_loss(params, gnn_model, x, y)

    state = init(params)
    for i in range(10):
        g = dloss(params, gnn_model, x, y)
        state = update(i, g, state)
        params = get_params(state)

    end_loss = mse_loss(params, gnn_model, x, y)

    # A few steps of training on even a tiny dataset should reduce the loss.
    assert end_loss < start_loss

A side benefit of this is that if you commit to only judiciously changing the tests, you will end up with a stable and copy/paste-able training loop that you know you can trust on new learning tasks, and hence only need to worry about swapping out the data.

Build little tools for yourself that automate repetitive (boring) things

You may notice that in the integration test above, we relied on a number of other functions that make testing much easier, such as dummy data generators and parameter initializers.

These are tools that make the parts of the training process modular and easy to compose. I strongly recommend writing them, and also backing them with more tests (since we will end up relying on them anyway).
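
As a hedged sketch (these are generic stand-ins, not the graph-specific helpers from our project), such helpers can be as small as:

import numpy as np

def dummy_regression_data(n_samples=32, n_features=10, seed=0):
    # A small, reproducible dataset for smoke-testing a training loop.
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_samples, n_features)), rng.normal(size=(n_samples, 1))

def make_dense_params(n_features=10, n_outputs=1, seed=0):
    # Initialize weights and biases for a single dense layer.
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_features, n_outputs)), np.zeros(n_outputs)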

Now run your deep learning experiments

Once we have the model re-implemented and tested, the groundwork is present for us to conduct extensive experiments with the confidence that we know how to catch bugs in the model in a fairly automated fashion.

Concluding words

Re-implementing deep learning models can be a very fun and rewarding exercise, because it serves as an excellent tool to check our understanding of the models that we work with.

Without the right safeguards in place, though, it can also very quickly metamorphose into a nightmare rabbithole of debugging. Placing basic safeguards in place when re-implementing models helps us avoid as many of these rabbitholes as possible.



Code review in data science

written by Eric J. Ma on 2019-10-30

essays data science workflow good practices

This blog post is cross-posted in my essays collection.

The practice of code review is extremely beneficial to the practice of software engineering. I believe it has its place in data science as well.

What code review is

Code review is the process by which a contributor's newly committed code is reviewed by one or more teammates. During the review process, the teammates are tasked with evaluating the new contributions and the effect they have on the codebase.

If you've been through the practice of scientific research before, it is essentially identical to peer review, except that code is the thing being reviewed.

What code review isn't

Code review is not the time for a senior person to slam the contributions of a junior person, nor vice versa.

Why data scientists should do code review

The first reason is to ensure that project knowledge is shared amongst teammates. By doing this, we ensure that in case the original code creator needs to be offline for whatever reason, others on the team cover for that person and pick up the analysis. When N people review the code, N+1 people know what went on. (It does not necessarily have to be N == number of people on the team.)

In the context of notebooks, this is even more important. An analysis is complex, and involves multiple modelling decisions and assumptions. Raising questions about those decisions, and pointing out where the assumptions should be documented (particularly in the notebook), is a good way of ensuring that N+1 people know the implicit assumptions that go into the model.

The second reason is that even so-called "senior" data scientists are humans, and will make mistakes. With my interns and less-experienced colleagues, I will invite them to constructively raise queries about my code where it looks confusing to them. Sometimes, their lack of experience gives me an opportunity to explain and share design considerations during the code review process, but at other times, they are correct, and I have made a mistake in my code that should be rectified.

What code review can be

Code review can become a very productive time of learning for all parties. What it takes is the willingness to listen to the critique provided, and the willingness to raise issues on the codebase in a constructive fashion.

How code review happens

Code review happens usually in the context of a pull request to merge contributed code into the master branch. The major version control system hosting platforms (GitHub, BitBucket, GitLab) all provide an interface to show the "diff" (i.e. newly contributed or deleted code) and comment directly on the code, in context.

As such, code review can happen entirely asynchronously, across time zones, and without needing much in-person interaction.

Of course, being able to sync up either via a video call, or by meeting up in person, has numerous advantages by allowing non-verbal communication to take place. This helps with building trust between teammates, and hence doing even "virtual" in-person reviews can be a way of being inclusive towards remote colleagues.

Parting words

If your firm is set up to use a version control system, then you probably have the facilities to do code review available. I hope this essay encourages you to give it a try.



“AI will not solve medicine”

written by Eric J. Ma on 2019-10-29

data science drug development artificial intelligence medicine

Those who think “AI will solve medicine” are delusional. I say this as a practitioner of machine learning in drug discovery and development.

First things first, “AI” is an overused term. We should stop using it, especially in medicinal research.

Now, my thoughts are more sanguine. The real value proposition of machine learning models in drug development is to navigate chemical, sequence, pathway, and knowledge space faster and smarter than we might otherwise do so without machine learning methods. It’s a modeling tool, and nothing more than that. It’s a tool for helping the human collective make better decisions than without it, but it’s also a double-edged sword. We can use the tool and then constrain our thinking because we have that tool, because we want to continue using that tool. Or we can use the tool to our advantage and liberate our mind to think of other things.

This thought was sparked off by an email that I was on at work. A molecule was approved for continued investigation (not even “go for safety trials”!), and 63 people were on that email. Imagine the number of people who are involved in getting a molecule past all research-phase checkpoints and all 3 clinical trial checkpoints. Hint: Many people are involved.

As I combed through the names on that email, the machine learners were vastly outnumbered by the colleagues who toil daily at the bench, wrangling with even more uncertainty than we face at our computers. We machine learners work in service of them, delivering insights and prioritized directions, just as they toil to generate the data that our data-hungry models need. It's a symbiotic relationship.

What do all of those 63 people work on?

Some make the molecules. Others design the assays to test the molecules in. Yet others design the assays to find the target to then develop the assay for. It’s many layers of human creativity in the loop. I can’t automate the entirety of their work with my software tools, but I can augment them. I mean, yeah, I can find a new potential target, but ultimately it's a molecular biologist who develops the assay, especially if that assay has never existed before.

There are others who professionally manage the progress of the project. There’s sufficient complexity at the bench and in the silicon chips that we can’t each keep track of the big picture. Someone has to do that, and keep everybody focused.

And then there’s the handful of us who deal with numbers and mainly just numbers. Yes, it’s a handful. I counted them on my fingers. We do have an outsized impact compared to our numbers, but that’s because we can get computers to do our repetitive work for us. At the bench, robots are harder to work with. Having been at the bench before and failing badly at it, I can very much empathize with how tedious the work is. It’s expensive to collect that data, so the onus is on us computation types to get help navigate “data space” more smartly.
