written by Eric J. Ma on 2021-07-10 | tags: binder tutorial jupyter data science til
When I make tutorial material, I often choose to write a custom library in support of the tutorial. For example, in the JAX-centric tutorial repository that I made (`dl-workshop`), all exercise answers are written as Python functions that we can call in the notebook (and inspect at one's own pace). Doing so helps me ensure that I have a single source of truth for the answers and can write tests against them (if I so desire). By organizing the answers into submodules, I also gain a mapping from answers to tutorial sections, which is incredibly handy for navigating the library.
When building a Binder container from a configuration file, it's sometimes difficult to include the custom source library in the Binder build. I could, in theory, use the `pip` section of an `environment.yml` file to install the custom source as follows:
```yaml
name: my_project
channels:
  - conda-forge
dependencies:
  - python=3.9
  - ...
  - pip:
    - ...
    - -e src/.
```
However, there are cases where we might not want the current version of the custom library. In those cases, we may instead prefer a canonical version that we can reference (like the one that lives on the `main` branch). Hard-coding `-e src/.` in `environment.yml` thus takes away flexibility that we might otherwise need.
Suppose this level of flexibility is needed. In that case, we need to match it with the appropriate composition of tooling. Writing a full `Dockerfile` is one way out, but thankfully the Binder team made a fit-for-purpose abstraction: the `postBuild` script. Essentially, `postBuild` is nothing more than a shell script that gets executed at the end of the image build, after the environment has been installed. We can use it to install our custom source library for use with Binder. First, create a file named `postBuild` in the root of our repository. It shouldn't have any file extension, such as `.sh`. Then give it the following contents:
```bash
#!/bin/bash
set -e  # exit on the first failing command, so errors don't fail silently
echo "Using pip at: $(which pip)"  # if the wrong `pip` is picked up, this helps us debug
pip install -e src/  # install the custom library
```
As a side note, Binder also provides a `start` script. This is a way of running code before the user session starts (e.g. setting environment variables that we don't want stored in the container image). I have yet to find a use for this myself, but I'm sure the good folks on the Binder team have excellent reasons for providing it.
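For reference, here is a minimal sketch of what such a `start` script could look like. It assumes the file is named `start` and sits next to `postBuild`. `TUTORIAL_MODE` is a hypothetical variable used purely for illustration. The one firm requirement, per the repo2docker documentation, is that the script end with `exec "$@"` (or an equivalent) so that Binder's own launch command still runs.

```bash
#!/bin/bash
# start: a minimal sketch of a Binder start script.
# Variables exported here exist only in the running session;
# they are not baked into the built image.
export TUTORIAL_MODE="binder"  # hypothetical variable, for illustration only

# Hand control over to the command the container was asked to run
# (typically the notebook server). The last line must be exec "$@".
exec "$@"
```

Because the script acts as the container's entrypoint, `exec "$@"` replaces the shell with whatever command was passed in. That command then inherits `TUTORIAL_MODE` from the environment.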
I really do have to hand it to the Binder team. They've done a fantastic job here in architecting the package to satisfy workflow needs. Kudos!