Refactor Notebook Code

written by Eric J. Ma on 2018-01-29 | tags: jupyter data science software engineering


In which I argue why we should refactor code out of Jupyter notebooks as soon as possible.

Jupyter notebooks filled with complex analyses can get unwieldy. Refactoring repeated code out into functions placed in modules should be standard practice, but judging from the sampling of Jupyter notebooks I've seen, it isn't yet.

When should code be refactored? As soon as we start copying/pasting it! Making sure I have self-contained functions ensures that lingering state in my notebook doesn't cause unexpected behaviour. (Side note: learning the "functional" programming mindset can be very useful here!)
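As a minimal sketch of what this looks like in practice, suppose a notebook has the same standardization logic pasted into two cells, once for heights and once for weights. The function name `standardize` and the sample data below are hypothetical, chosen only for illustration. Pulling the logic into one self-contained function means every call computes its result from its arguments alone:

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores for a sequence of numbers.

    Hypothetical helper: it depends only on its argument and returns
    a new list, so it neither reads nor modifies notebook state.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    return [(v - mu) / sigma for v in values]


# Illustrative data; before refactoring, the body of `standardize`
# would have been copied and pasted once per variable.
heights = [150.0, 160.0, 170.0]
weights = [50.0, 60.0, 70.0]

heights_z = standardize(heights)
weights_z = standardize(weights)
```

Once a function like this is stable, it can be moved into a module (say, a hypothetical `utils.py` next to the notebook) and imported, which keeps the notebook focused on the analysis itself.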

But won't this slow down my pace? Isn't it faster to just copy and paste the code, and tweak what I need? Yes, but you trade a small speed hit for a massive bump in rigour. Just today, I saw the effects of "lingering state" in my notebooks: my plots displayed different things before and after refactoring. That is not a good sign for any analysis.
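To make "lingering state" concrete, here is a small sketch, not the actual code from my notebook. All names (`threshold`, `filter_notebook_style`, `filter_above`) are hypothetical. The first function reads a global variable, the way notebook cells often do, so its output depends on which cells happened to run before it. The second takes everything it needs as arguments:

```python
# A global defined in some earlier notebook cell.
threshold = 0.5


def filter_notebook_style(values):
    # Reads the global `threshold` at call time, so the result
    # depends on whatever the notebook last assigned to it.
    return [v for v in values if v > threshold]


def filter_above(values, threshold):
    # Self-contained: the result depends only on the arguments.
    return [v for v in values if v > threshold]


data = [0.2, 0.6, 0.9]

before = filter_notebook_style(data)

# A later cell quietly reassigns the global...
threshold = 0.8

# ...and re-running the same call now gives a different answer.
after = filter_notebook_style(data)

# The refactored version states its threshold explicitly at the call site.
explicit = filter_above(data, threshold=0.5)
```

Here `before` and `after` differ even though the call is identical, which is the same kind of surprise as a plot changing between runs. The explicit version makes the dependency visible where the function is called.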

In short, refactor your code.


Cite this blog post:
@article{
    ericmjl-2018-refactor-code,
    author = {Eric J. Ma},
    title = {Refactor Notebook Code},
    year = {2018},
    month = {01},
    day = {29},
    howpublished = {\url{https://ericmjl.github.io}},
    journal = {Eric J. Ma's Blog},
    url = {https://ericmjl.github.io/blog/2018/1/29/refactor-notebook-code},
}
  
