Use notebooks effectively
Notebooks are powerful tools for data science work, but they need to be used thoughtfully. Here's how to use them effectively in your workflow.
Choose Marimo over Jupyter for reactivity
Marimo notebooks should be your first choice for data science work. Unlike traditional Jupyter notebooks, Marimo provides true reactivity: when you change a cell, all dependent cells automatically re-run. This eliminates the manual "run all cells" workflow, rules out the stale hidden state that comes from out-of-order execution, and makes your analysis much more interactive and efficient.
Marimo notebooks are saved as simple Python files, making them much easier to version control than Jupyter notebooks. This solves many of the git-related headaches that plague Jupyter users.
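To see why this works, here is roughly what a Marimo notebook looks like on disk (the exact scaffolding varies by version). Each cell is a function whose parameters are the variables it reads and whose return values are the variables it defines - that's the dependency graph Marimo uses for reactivity:

```python
import marimo

app = marimo.App()


@app.cell
def _():
    base_rate = 0.05  # edit this value...
    return (base_rate,)


@app.cell
def _(base_rate):
    # ...and marimo re-runs this cell automatically, because it
    # declares base_rate as an input.
    adjusted_rate = base_rate * 1.2
    adjusted_rate
    return (adjusted_rate,)


if __name__ == "__main__":
    app.run()
```

Because this is plain Python, a git diff of a Marimo notebook reads like a diff of any other source file.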
Notebooks as prototyping tools, not production code
Notebooks should primarily serve as your prototyping and exploration environment. While it's tempting to put notebooks directly into production, resist this urge unless you have:
- A properly tested source code repository accompanying the notebook
- A specific need for the notebook format (e.g., as a template for generating reports)
In most cases, if you're considering putting a notebook in production, you're better off converting it into proper Python scripts.
Data access best practices
One of the biggest pitfalls in notebook development is relying on local file systems and hardcoded paths. This creates immediate portability issues - your colleague won't be able to run your notebook because they don't have the same file structure as you.
Instead:
- Always pull data directly from a data catalog or centralized source of truth
- Use configuration files or environment variables for any path-dependent operations (see the sketch after this list)
- Document any data dependencies clearly at the start of the notebook
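A minimal sketch of the idea, assuming pandas and a hypothetical `DATA_DIR` environment variable and filename:

```python
import os
from pathlib import Path

import pandas as pd

# Resolve the data location from the environment instead of hardcoding a path
# that only exists on your machine. DATA_DIR and the filename are hypothetical.
data_dir = Path(os.environ["DATA_DIR"])
df = pd.read_parquet(data_dir / "transactions.parquet")

# Better still, query your team's source of truth directly, e.g. a warehouse
# table, so every collaborator reads exactly the same data:
# df = pd.read_sql("SELECT * FROM analytics.transactions", warehouse_engine)
```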
Scratch pad vs. report-style notebooks
There are two main ways to use notebooks:
Scratch pad notebooks
- Use these for initial exploration
- Generate plots and analyze data freely
- Test different approaches and hypotheses
It's fine to keep these messy if you don't want to clean them up, but nonetheless dictate copious notes into them - they're supposed to be an aid to your thinking process!
Report-style notebooks
Once you've done your exploration in a scratch pad notebook, you can create a polished report-style notebook. Here's an effective workflow:
- Use your scratch pad notebook to generate all necessary plots and analysis
- Dictate your findings and insights, explaining how the plots support your conclusions
- Use AI tools to help polish your narrative (a sketch follows below):
    - Feed the AI your plots, code, and dictated thoughts
    - Ask it to draft polished markdown cells
    - Review and refine the AI-generated content
This two-stage approach keeps your exploration phase separate from your presentation phase, resulting in cleaner, more professional final notebooks.
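A hedged sketch of that polish step, assuming the OpenAI Python SDK and an `OPENAI_API_KEY` in the environment; the model name and placeholder strings are illustrative, and any LLM client (or a chat UI) works just as well:

```python
from openai import OpenAI

client = OpenAI()

plot_code = "..."       # the code that generated your plots
dictated_notes = "..."  # your raw, dictated observations

prompt = (
    "Here is the analysis code behind my plots:\n\n"
    f"{plot_code}\n\n"
    "And my unpolished, dictated notes on what the plots show:\n\n"
    f"{dictated_notes}\n\n"
    "Draft polished markdown cells narrating these findings "
    "for a report-style notebook."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical model choice
    messages=[{"role": "user", "content": prompt}],
)
# Review and refine before pasting into the report notebook.
print(response.choices[0].message.content)
```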
Refactor with the help of AI
Sooner or later, you will find yourself copying and pasting code. This is a good sign that you should refactor! To supercharge your productivity, pass the notebook into an AI tool and ask it to propose a refactoring of any duplicated code - this is one of those situations where AI genuinely helps.
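As a concrete example of the kind of refactor an AI can propose: suppose three cells contain near-identical histogram code that differs only in the column name. The duplication collapses into one helper (the names here are illustrative):

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_distribution(df: pd.DataFrame, column: str, bins: int = 30):
    """Replace three near-identical copy-pasted cells with one helper."""
    fig, ax = plt.subplots()
    ax.hist(df[column].dropna(), bins=bins)
    ax.set_xlabel(column)
    ax.set_ylabel("count")
    return ax


# Each former copy-paste cell becomes a one-liner:
# plot_distribution(df, "age")
# plot_distribution(df, "income")
```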
Publish notebooks and strip outputs before committing
Version control and notebook outputs are a notorious pain point. Never commit notebook outputs to the repository! They bloat the repository, and output cells make git diffs nearly unreadable.
The Marimo advantage: If you're using Marimo notebooks, you don't need to worry about stripping outputs since they're saved as Python files without embedded outputs. This is one of the key advantages of Marimo over Jupyter.
For Jupyter users: If you're still using Jupyter notebooks, consider migrating to Marimo. Marimo notebooks are saved as simple Python files, which solves many of the version control problems that plague Jupyter users. The reactivity feature alone makes it worth the switch.
If you must stick with Jupyter, I recommend using a system that converts notebooks into HTML or markdown and publishes them before you commit, such as stringing together `nbconvert` with `md2cf` as a pre-commit hook that runs before the `nbstripout` pre-commit hook. That way, notebooks queued for committing are auto-published just beforehand, and they land in the repository with only the source code, not the outputs.
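For the curious, here is a minimal Python sketch of what that publish-then-strip pipeline does under the hood, using the public `nbconvert` and `nbformat` APIs; the filename is hypothetical, and in practice the pre-commit hooks run the equivalent commands for you (with `md2cf` then pushing the markdown to Confluence):

```python
import nbformat
from nbconvert import MarkdownExporter

nb = nbformat.read("analysis.ipynb", as_version=4)  # hypothetical notebook

# Step 1: render the notebook (outputs included) to markdown for publishing.
body, _resources = MarkdownExporter().from_notebook_node(nb)
with open("analysis.md", "w") as f:
    f.write(body)

# Step 2: strip outputs so only source code reaches the repository
# (this is what nbstripout automates).
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.outputs = []
        cell.execution_count = None
nbformat.write(nb, "analysis.ipynb")
```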
Jupyter hygiene (if you must use Jupyter)
If you're still using Jupyter notebooks (though Marimo is strongly recommended), here are some additional practices that will save you headaches.
Kernel hygiene is crucial. Make it a habit to:
- Regularly restart your kernel and run all cells (a way to automate this check is sketched after this list)
- Keep track of your environment with a `requirements.txt` or `environment.yml`
- Use virtual environments religiously - one per project
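On the first point, here is a minimal sketch of a "restart kernel and run all cells" check you can run outside the browser (for example, in CI), assuming `nbconvert` is installed and with a hypothetical notebook path:

```python
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

nb = nbformat.read("analysis.ipynb", as_version=4)

# Execute every cell top-to-bottom in a fresh kernel; raises on the first
# error, which is exactly the hidden-state bug this habit is meant to catch.
ep = ExecutePreprocessor(timeout=600, kernel_name="python3")
ep.preprocess(nb, {"metadata": {"path": "."}})
print("Notebook runs cleanly from top to bottom.")
```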
For collaborative work:
- Add clear execution order numbering to your cells
- Include setup instructions in a prominent markdown cell at the top
- Document any non-obvious dependencies or configurations
- Consider using tools like Papermill for parameterizing notebooks (see the sketch after this list)
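Papermill's core API is a single call: it executes a template notebook with new values injected into a cell tagged `parameters`. A minimal sketch, with hypothetical paths and parameter names:

```python
import papermill as pm

pm.execute_notebook(
    "templates/weekly_report.ipynb",            # hypothetical template
    "reports/weekly_report_2025-01-15.ipynb",   # hypothetical output path
    parameters={"region": "EMEA", "min_date": "2025-01-01"},
)
```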
For long-term maintenance:
- Date your exploratory notebooks, and use only ISO8601 format (YYYY-MM-DD) for any dates (a sketch follows at the end of this list).
- Keep a clear separation between throwaway experiments and keeper analysis
- Document your dead ends - they're valuable knowledge. Use dictation + AI transcription to speed up this process.
- Consider archiving old notebooks in a separate directory structure, such as the `notebooks/archive/` directory in your standard project structure.
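If you script your notebook scaffolding, a one-liner keeps the date prefixes ISO8601-compliant; the topic slug here is hypothetical:

```python
from datetime import date

# An ISO8601 (YYYY-MM-DD) prefix sorts chronologically in any file browser.
filename = f"{date.today().isoformat()}-churn-exploration.ipynb"
print(filename)  # e.g. 2025-01-15-churn-exploration.ipynb
```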