Use relative paths to project roots

Inside Jupyter notebooks, I commonly see data read in from the filesystem using relative paths that look like this:

import pandas as pd
df = pd.read_csv("../../data/something.csv")

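This is troublesome because the path is resolved relative to the notebook's location: if the notebook moves to a different directory, the same relative path no longer points at the data file.

We can get around this by using pyprojroot, whose here() function returns the project root as a pathlib.Path, so that paths are anchored to the project rather than to the notebook.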

import pandas as pd
from pyprojroot import here

df = pd.read_csv(here() / "data/something.csv")

Now the path no longer depends on where the notebook lives. We need to update the path in our notebooks only if the data file itself moves.
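To make the idea of a project-root-relative path concrete, here is a minimal sketch of how a root could be located by walking up the directory tree until a marker file is found. The function name find_project_root and the MARKERS tuple are hypothetical and chosen for illustration; this is not pyprojroot's actual implementation, and its real search criteria may differ.

```python
from pathlib import Path
import tempfile

# Hypothetical marker names, assumed for illustration only.
MARKERS = (".git", ".here", "pyproject.toml")


def find_project_root(start: Path, markers=MARKERS) -> Path:
    """Walk upward from `start` and return the first directory containing a marker."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    raise FileNotFoundError(f"No project root found at or above {start}")


# Demonstrate on a throwaway directory tree that mimics a project layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".here").touch()  # mark the project root
    nested = root / "notebooks" / "exploration"
    nested.mkdir(parents=True)

    # Starting from a deeply nested notebook directory, we still find the root.
    found = find_project_root(nested)
    data_path = found / "data" / "something.csv"
    same_root = found == root
```

However deep the notebook directory is nested, the resulting data_path is built from the same root, which is why moving the notebook does not change it.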

If multiple notebooks use the same file, it is prudent to refactor the file path itself into a variable that gets imported. That way, you have a single source of truth for the path to the file of interest:

# this is a custom source file, like "custom.py"
from pyprojroot import here

something_path = here() / "data/something.csv"

And then in our notebooks:

import pandas as pd
from custom import something_path

df = pd.read_csv(something_path)

Now, if the file path changes, we update one location and the code should work across all notebooks; if the notebook file path changes, we need not do anything to guarantee that the data path is correct.

How to customize your matplotlib plots project-wide

According to the matplotlib docs, we can define our plot styles in a stylesheet file. The file can be named anything we want, such as matplotlibrc or something.mplstyle. We can then reference it anywhere we want, such as:

import matplotlib.pyplot as plt
plt.style.use('/path/to/stylesheet')

As a matter of good practice, paths relative to the project root are better than paths relative to the current file, because they define a stable path to a location regardless of where the calling code lives (see: Use relative paths to project roots). We can do this using:

import matplotlib.pyplot as plt
from pyprojroot import here
plt.style.use(here() / 'relative/path/to/stylesheet')
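For reference, here is a minimal sketch of what such a stylesheet might contain and how applying it changes matplotlib's settings. The file name project.mplstyle and the specific values are assumptions chosen for illustration; the file is written to a temporary directory so the example is self-contained, whereas in a real project it would live at a fixed location under the project root.

```python
from pathlib import Path
import tempfile

import matplotlib
import matplotlib.style

# Hypothetical stylesheet contents: one "key: value" rcParams entry per line.
STYLE_TEXT = """\
axes.grid: True
lines.linewidth: 2.5
font.size: 12
"""

with tempfile.TemporaryDirectory() as tmp:
    # "project.mplstyle" is an illustrative name, not a required one.
    style_path = Path(tmp) / "project.mplstyle"
    style_path.write_text(STYLE_TEXT)

    # Applying the stylesheet updates matplotlib's global rcParams.
    matplotlib.style.use(str(style_path))

    grid_enabled = matplotlib.rcParams["axes.grid"]
    line_width = matplotlib.rcParams["lines.linewidth"]
    font_size = matplotlib.rcParams["font.size"]
```

Every plot created after the call picks up these settings, so a single stylesheet at the project root keeps figures consistent across notebooks.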

Data scientists should learn how to write good code

Data scientists most commonly write and develop custom code, because it is the most flexible way to build the abstractions a project needs. Because we write custom code, we also need tools that help us maintain code quality.