written by Eric J. Ma on 2021-07-10 | tags: binder tutorial jupyter data science til
I figured out how to make my custom source code libraries installable in a Binder container - and do it in a way that preserves flexibility while still being easy to use.
When I make tutorial material, I often choose to write a custom library in support of the tutorial. For example, in this JAX-centric tutorial repository that I made (dl-workshop), all exercise answers are written as Python functions that we can call in the notebook (and inspect at our own pace). Doing so helps me ensure that I have a single source of truth for the answers and can write tests against them (if I so desire). By organizing the code answers into submodules, I also gain a mapping from answers to tutorial sections, which is incredibly handy for navigating the library.
When building a Binder container using a configuration file, it's sometimes difficult to include the custom source library in the Binder build. I could, in theory, use the pip section of an environment.yml file to install the custom source as follows:
name: my_project
channels:
  - conda-forge
dependencies:
  - python=3.9
  - ...
  - pip:
    - ...
    - -e src/.
However, there are cases where we might not necessarily want the current version of the custom library. In those cases, we may instead prefer a canonical version that we can reference (like the one that lives on the master/main branch). Sticking the -e src/. in environment.yml thus restricts the flexibility that we might otherwise need.
Suppose this level of flexibility is needed. In that case, we need to match the flexibility with the appropriate composition of tooling. Writing a Dockerfile could be the way out. Thankfully, the Binder team made a fit-for-purpose abstraction called the postBuild script. Essentially, postBuild is nothing more than a shell script that gets executed right after the Docker container is built. We can use it to install our custom source library for use with Binder:
1. Create a file named postBuild in the root of our repository. It shouldn't have any file extension, such as .sh or .zsh.
2. Assuming the custom library lives in src/, give the file the following contents:

    #!/bin/bash
    set -e  # don't allow errors to fail silently
    echo `which pip`  # in case the wrong `pip` is used, this will let us debug
    pip install -e src/  # install the custom library
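To make the flexibility argument concrete, here is a minimal sketch of a postBuild that can install either the checked-out source or a canonical version from a remote branch. The names INSTALL_FROM, DRY_RUN, CANONICAL_URL and pip_install_args are hypothetical, invented for this sketch and not Binder features, and the repository URL is a placeholder. The sketch also assumes the package's setup files live under src/ in that repository.

```shell
#!/bin/bash
set -e  # don't allow errors to fail silently

# Hypothetical setting for this sketch: "local" installs the checked-out
# source; "canonical" installs from the main branch of a remote repository.
INSTALL_FROM="${INSTALL_FROM:-local}"

# Placeholder URL (an assumption): replace with the real repository before use.
CANONICAL_URL="git+https://github.com/example-user/example-repo.git@main#subdirectory=src"

# Print the arguments that should be passed to `pip install`.
pip_install_args() {
    case "$1" in
        local)     echo "-e src/" ;;
        canonical) echo "$CANONICAL_URL" ;;
        *)         echo "unknown install source: $1" >&2; return 1 ;;
    esac
}

# With `set -e`, an unknown source aborts the script at this assignment.
args="$(pip_install_args "$INSTALL_FROM")"

# DRY_RUN=1 (the default in this sketch) only prints the command.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "pip install $args"
else
    # Unquoted on purpose: "-e src/" must split into two arguments.
    pip install $args
fi
```

Since a Binder build does not give us a convenient place to pass these variables in, in practice one would probably edit the defaults at the top of the file (for example, setting DRY_RUN to 0) rather than rely on the environment.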
As a side note, Binder also supports a start script. This is a way of running code before the user session starts (e.g. setting environment variables that we don't want to be stored in a container). I have yet to find a use for this myself, but I'm sure the good folks on the Binder team have excellent reasons for providing it.
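For illustration, a start script might look something like the sketch below. The variable name TUTORIAL_DATA_DIR is hypothetical, chosen only for this example. The final exec line matters: as far as I understand the Binder documentation, the script has to hand control back to the command Binder was about to run, otherwise the session won't launch.

```shell
#!/bin/bash
# Hypothetical variable for illustration: set when the session starts,
# rather than baked into the container image.
export TUTORIAL_DATA_DIR="$HOME/data"

# Hand control over to whatever command Binder passes in
# (for example, the notebook server).
exec "$@"
```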
I really do have to give it to the Binder team. They've done a fantastic job here in architecting the package to satisfy workflow needs. Kudos!
@article{
ericmjl-2021-how-binder,
author = {Eric J. Ma},
title = {How to enable custom source package installation in Binder},
year = {2021},
month = {07},
day = {10},
howpublished = {\url{https://ericmjl.github.io}},
journal = {Eric J. Ma's Blog},
url = {https://ericmjl.github.io/blog/2021/7/10/how-to-enable-custom-source-package-installation-in-binder},
}