Use pixi for maximally ergonomic and reproducible environments
In the first edition of this book, I recommended one conda environment per project. Since then, the tooling has evolved, and I now recommend one `pixi` configuration per project, where a single `pixi` configuration can support multiple environments within one project. More on that distinction later in this chapter.
Having used it for a while now, I recommend managing your projects using `pixi`, an environment management multi-tool that lets you manage your project's software environments reproducibly and ergonomically.
For the hybrid data scientist and tool developer persona, I wrote a blog post detailing what I learned when switching from `conda` to `pixi`. I won't repeat it in its entirety here, but will instead reproduce (🥁) the essentials, amounting to a cheat sheet of sorts.
The cheat sheet of `pixi` commands
Install/update pixi
```bash
curl -fsSL https://pixi.sh/install.sh | bash                # install pixi
echo 'eval "$(pixi completion --shell zsh)"' >> ~/.zshrc    # enable zsh auto-completion
echo 'eval "$(pixi completion --shell bash)"' >> ~/.bashrc  # enable bash auto-completion
pixi self-update                                            # update pixi to the latest version
```
The commands should be pretty self-explanatory!
Initialize a project
```bash
pyds project init  # scaffolds a project with cookiecutter and pixi
```
When starting a new project, I use `pyds-cli` to scaffold out the environment. Under the hood, the `pyds project init` command uses `cookiecutter` to scaffold a new project and then calls on `pixi` to install the environment.
```bash
pixi init --format pyproject  # quick initialization for prototyping
```
Alternatively, if you only need to quickly spin up a scratch environment for prototyping, you can just run `pixi init` and start from there.
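For reference, the scaffolded `pyproject.toml` looks roughly like the sketch below. The exact fields depend on your platform and `pixi` version, and the project name here is just a placeholder:

```toml
[project]
name = "scratch"
version = "0.1.0"
requires-python = ">=3.11"

[tool.pixi.project]
channels = ["conda-forge"]
platforms = ["linux-64"]

[tool.pixi.tasks]
```

From here, `pixi add` (below) fills in dependencies as you need them.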
Add in a new dependency
```bash
pixi add "package name"
pixi add --pypi "pypi package name"
pixi add -f "feature name" "package name"
```
To add a new dependency, use `pixi add` to record it in the environment specification. If a package needs to be pulled from PyPI, add the `--pypi` flag. If the package should be added to just one feature (such as a `cuda`-specific feature), use the `-f` flag and specify the feature name, e.g. `cuda`. (More on features below.)
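As a concrete illustration (these package and feature names are just examples, not prescriptions):

```bash
pixi add pandas               # conda package from your configured channels
pixi add --pypi rich          # package pulled from PyPI instead
pixi add -f cuda pytorch-gpu  # added only to the cuda feature's dependencies
```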
Conda-like shell activation
```bash
pixi shell
pixi shell -e "environment-name"
```
Sometimes you need a shell that has access to the Python/IPython interpreter associated with your environment. You can make this happen by running `pixi shell`, optionally specifying an environment with `-e env-name`.
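For example, assuming a `tests` environment like the one defined later in this chapter:

```bash
pixi shell -e tests  # spawn a sub-shell with the tests environment activated
python -m pytest     # this interpreter (and pytest) come from that environment
exit                 # leave the sub-shell to deactivate
```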
Run tasks and programs
```bash
pixi run task-name              # as specified in your `pyproject.toml`
pixi run -e docs quarto preview # run `quarto preview` within the docs environment
pixi run python                 # run Python within your project's default environment
```
Because `pixi` lets you replace Makefiles with tasks defined in `pyproject.toml`, you can run tasks through aliases, and you can also run ad hoc commands (such as `quarto preview`) within any user-defined pixi environment for the project.
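To make this concrete, here is a sketch of what task definitions might look like in `pyproject.toml`; the task names and commands are illustrative, not part of any template:

```toml
[tool.pixi.tasks]
# `pixi run test` now behaves like a Makefile target
test = "pytest -v"
lint = "pre-commit run --all-files"

# Tasks can also be scoped to a feature, so they are only
# available in environments that include that feature.
[tool.pixi.feature.docs.tasks]
preview = "quarto preview"
```

With these in place, `pixi run test` replaces `make test`, and `pixi run -e docs preview` runs the docs preview inside the docs environment.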
Composable multi-environment projects
Pixi's biggest strength lies in its composable approach to environment management. Rather than specifying each environment's dependencies from scratch, pixi lets you define reusable "features" that can be combined into different environments. Here's how it works in practice:
Features as building blocks
In your `pyproject.toml`, you define features as reusable components:
```toml
[tool.pixi.feature.tests.dependencies]
pytest = "*"
pytest-cov = "*"
hypothesis = "*"

[tool.pixi.feature.docs.dependencies]
mkdocs = "*"
mkdocs-material = "*"
mknotebooks = "*"

[tool.pixi.feature.notebook.dependencies]
ipykernel = "*"
ipython = "*"
jupyter = "*"
pixi-kernel = "*"

[tool.pixi.feature.devtools.dependencies]
pre-commit = "*"
```
Composing environments
You can then compose these features into different environments based on your needs:
```toml
[tool.pixi.environments]
default = { features = ["tests", "devtools", "notebook", "setup"] }
docs = { features = ["docs"] }
tests = { features = ["tests", "setup"] }
cuda = { features = ["tests", "devtools", "notebook", "setup", "cuda"] }
```

(The `setup` and `cuda` features referenced here would be defined in the same way as the features above.)
Advantages of this approach
What are the advantages of this approach? Here are a few that I think are worth highlighting:
- Clarity of purpose: Each feature clearly defines the dependencies for a specific purpose (testing, documentation, development), making the project structure more understandable.
- Minimal environments: You can create lean environments that include only what's needed. For example, CI/CD pipelines can use the `tests` environment without documentation or notebook dependencies, reducing the size of the environment and speeding up builds (see the sketch after this list).
- Flexibility with hardware: You can easily add features that require specific hardware, such as CUDA GPUs or Apple Silicon Macs.
- Reproducibility: Each environment is explicitly defined through its component features, making it easier to reproduce environments across different machines.
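For instance, a continuous integration job only needs two commands to exercise the lean `tests` environment (a sketch; adapt to your CI system of choice):

```bash
pixi install -e tests     # solve and install only the tests environment
pixi run -e tests pytest  # run the test suite inside it
```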
Example use cases
Let's look at some practical examples of how to use these different environments in your daily workflow.
For running tests, you can use the minimal test environment which only includes testing dependencies:
```bash
pixi run -e tests pytest
```
For building documentation, you can use the docs environment which only includes documentation dependencies:
```bash
pixi run -e docs mkdocs build
```
For developing with all tools available, you can use the default environment which includes all development tools:
```bash
pixi shell  # uses the default environment
```
And finally, for running CUDA-enabled notebooks, you can use the CUDA environment which includes CUDA support:
```bash
pixi shell -e cuda
```