written by Eric J. Ma on 2020-07-11 | tags: jupyter jupyter notebook notebook data science
If you're one of those programmers for whom a notebook interface helps with prototyping and scripting, it can be handy to treat notebooks as scripts and execute them programmatically. There are multiple ways to accomplish this.
nbconvert to Python script
One way is to use nbconvert
to convert your notebook
into a Python script on the fly,
and then execute the Python script itself:
jupyter nbconvert --to python my_notebook.ipynb
python my_notebook.py
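If you would rather drive both steps from Python (say, from a build script), the two commands above can be sketched roughly as follows. The helper names conversion_commands and convert_and_run are hypothetical, not part of nbconvert, and the sketch assumes that nbconvert writes the .py file next to the notebook and that the jupyter executable is on your PATH.

```python
import subprocess
import sys
from pathlib import Path


def conversion_commands(notebook):
    """Return the two commands: convert the notebook, then run the script."""
    notebook = Path(notebook)
    # Assumption: nbconvert writes the script alongside the notebook,
    # with the same stem and a .py suffix.
    script = notebook.with_suffix(".py")
    return [
        ["jupyter", "nbconvert", "--to", "python", str(notebook)],
        [sys.executable, str(script)],
    ]


def convert_and_run(notebook, run=subprocess.run):
    """Run both steps in order; check=True raises on a non-zero exit code."""
    for command in conversion_commands(notebook):
        run(command, check=True)
```

Calling convert_and_run("my_notebook.ipynb") should then behave like typing the two commands by hand. The run argument exists only so that the command runner can be swapped out, for example when testing.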
The other way also uses nbconvert,
but slightly differently in that
we take advantage of nbconvert's execution capabilities:
jupyter nbconvert --to notebook --execute my_notebook.ipynb
By default, there is a timeout of 30 seconds for each cell, so if you need a longer timeout than the default, you can specify one:
jupyter nbconvert --ExecutePreprocessor.timeout=1200 --to notebook --execute my_notebook.ipynb
1200 is in seconds, so that will give you a 20-minute execution timeout.
With that said, anything executed programmatically
should ideally run quickly.
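To avoid retyping the ExecutePreprocessor flag and doing the minutes-to-seconds arithmetic by hand, here is a minimal sketch that builds the same command from Python. The names execute_command and execute_notebook are hypothetical helpers, not part of nbconvert; only the command-line flags themselves come from the command shown above.

```python
import subprocess


def execute_command(notebook, timeout_minutes=20):
    """Build the nbconvert command that executes a notebook with a timeout."""
    # The ExecutePreprocessor.timeout option is expressed in seconds.
    timeout_seconds = int(timeout_minutes * 60)
    return [
        "jupyter", "nbconvert",
        f"--ExecutePreprocessor.timeout={timeout_seconds}",
        "--to", "notebook",
        "--execute", str(notebook),
    ]


def execute_notebook(notebook, timeout_minutes=20, run=subprocess.run):
    """Execute the notebook; check=True raises on a non-zero exit code."""
    run(execute_command(notebook, timeout_minutes), check=True)
```

With the default of 20 minutes, execute_command("my_notebook.ipynb") produces the same arguments as the command above, including --ExecutePreprocessor.timeout=1200.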
papermill
Papermill takes the notebook execution paradigm one level up, in that it allows you to parameterize your notebooks. I haven't used it myself, so I won't provide a code sample, but I'll link it here nonetheless.
There's really not much of a difference between the first two methods, so I'd encourage you to test-drive both and see which one you're more comfortable with.
Personally, I would choose the first option.
Even though it "feels" a bit more roundabout,
it helps me because I don't have to remember
the ExecutePreprocessor syntax
(which can feel a bit clunky typing at the command line).
That said, if you wrap everything inside a nice Makefile,
this minor practical difference goes away.
As for papermill,
I see its utility for teams who don't necessarily yet have the software chops
to make well-structured Python packages and scripts.
It's a good hack en route to good practices,
but at the end of the day,
I'd choose "principled workflow over hacks" whenever practically possible.
In the long run, it makes a ton of sense for data science teams
to equip themselves with software skills!
Yes, indeed!
For the Network Analysis Made Simple tutorial series
that my co-author Mridul and I teach,
I recently spent a bit of time figuring out
how to convert our collection of Jupyter notebooks and markdown files,
which get rendered on our official tutorial website
as a mkdocs site
with a beautiful and functional mkdocs-material theme,
into leanpub-flavoured markdown
that we then publish as a book for offline viewing.
In doing so, we could preserve a single source of truth for the content,
and automatically publish to two locations simultaneously.
I needed the interactivity of a Jupyter notebook to prototype what I wanted to get done. I also wanted to use the notebook as the "gold standard" source of truth, rather than a Python script, as this would facilitate modifying and debugging later (i.e. I could execute up to a certain point and go deep). Thus, in the Travis CI script, we first convert the notebook to a Python script, then execute it, and also build the website:
script:
  # Build LeanPub files
  - jupyter nbconvert --to python scripts/bookbuilder/markua.ipynb
  - python scripts/bookbuilder/markua.py
  # Build official website
  - mkdocs build
Finally, we get Travis to publish everything to the leanpub branch,
which we never edit.
# Publish the LeanPub files
- provider: pages
  skip_cleanup: true
  github_token: $GITHUB_TOKEN  # Set in the settings page of your repository, as a secure variable
  keep_history: false
  on:
    branch: master
  target_branch: leanpub
@article{
ericmjl-2020-jupyter-scripts,
author = {Eric J. Ma},
title = {Jupyter notebooks as scripts},
year = {2020},
month = {07},
day = {11},
howpublished = {\url{https://ericmjl.github.io}},
journal = {Eric J. Ma's Blog},
url = {https://ericmjl.github.io/blog/2020/7/11/jupyter-notebooks-as-scripts},
}