Place custom source code inside a lightweight package

## Why write a package for your custom source code

Have you encountered the situation where you create a new notebook, and then promptly copy code verbatim from another notebook with zero modifications?

As you as you did that, you created two sources of truth for that one function.

Now... if you intended to modify the function and test the effect of the modification on the rest of the code, then you still could have done better.

A custom source package that is installed into the conda environment that you have set up will help you refactor code out of the notebook, and hence help you define one source of truth for the entire function, which you can then import anywhere.

## How to create a custom source package for a project

Firstly, I'm assuming you are following the ideas laid out in Set up your project with a sane directory structure. Specifically, you have a src/ directory under the project root. Here, I'm going to give you a summary of the official Python packaging tutorial.

In your project src/ directory, ensure you have a few files:

|- src/
|- setup.py
|- source_package/  # rename this to the same name as the conda environment
|- data/         # for all data-related functions
|- schemas.py # this is for pandera schemas
|- __init__.py   # this is necessary
|- paths.py      # this is for path definitionsme
|- utils.py      # utiity functions that you might need
|- ...
|- tests/
|- test_utils.py # tests for utility functions
|- ...


If you're wondering about why we name the source package the same name as our conda environment, it's for consistency purposes. (see: Sanely name things consistently)

If you're wondering about the purpose of paths.py, read this page: Use pyprojroot to define relative paths to the project root

setup.py should look like this:

import setuptools

with open("README.md", "r", encoding="utf-8") as fh:

setuptools.setup(
name="source_package", # Replace with your environment name
version="0.1",         # Replace with anything that you need
packages=setuptools.find_packages(),
)


Now, you activate the environment dedicated to your project (see: Create one conda environment per project) and install the custom source package:

conda activate project_environment
cd src
pip install -e .


This will install the source package in development mode. As you continue to add more code into the custom source package, they will be instantly available to you project-wide.

Now, in your projects, you can import anything from the custom source package.

Note: If you've read the official Python documentation on packages, you might see that src/ has nothing special in its name. (Indeed, one of my reviewers, Arkadij Kummer, pointed this out to me.) Having tried to organize a few ways, I think having src/ is better for DS projects than having the setup.py file and source_package/ directory in the top-level project directory. Those two are better isolated from the rest of the project and we can keep the setup.py in src/ too, thus eliminating clutter from the top-level directory.

## How often should the package be updated?

As often as you need it!

Also, I would encourage you to avoid releasing the package standalone until you know that it ought to be used as a standalone Python package. Otherwise, you might prematurely bring upon yourself a maintenance burden!