Use pixi for maximally ergonomic and reproducible environments
In the first edition of this book, I recommended one conda environment per project. Since then, the tooling has evolved, and I now recommend one `pixi` configuration per project, where a single `pixi` configuration can support multiple environments within one project. More on that distinction later in this chapter.
Having used it for a while now, I recommend managing your projects using `pixi`, an environment management multi-tool that lets you manage your project's software environments reproducibly and ergonomically.
For the hybrid data scientist and tool developer persona, I wrote a blog post detailing what I learned when switching from `conda` to `pixi`. I won't repeat it in its entirety here, but will instead reproduce (🥁) the essentials, amounting to a cheat sheet of sorts.
The cheat sheet of `pixi` commands
Install/update pixi
```bash
curl -fsSL https://pixi.sh/install.sh | bash                # install pixi
echo 'eval "$(pixi completion --shell zsh)"' >> ~/.zshrc    # enable zsh auto-completion
echo 'eval "$(pixi completion --shell bash)"' >> ~/.bashrc  # enable bash auto-completion
pixi self-update                                            # update pixi to the latest version
```
The commands should be pretty self-explanatory!
Initialize a project
```bash
pyds project init  # scaffolds a project with cookiecutter and pixi
```
When starting a new project, I use `pyds-cli` to scaffold out the environment. Under the hood, the `pyds project init` command uses `cookiecutter` to scaffold a new project and then calls on `pixi` to install the environment.
```bash
pixi init --format pyproject  # quick initialization for prototyping
```
Alternatively, if you only need to quickly spin up a scratch environment for prototyping, you can just run `pixi init` and start from there.
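For reference, the scaffolded `pyproject.toml` looks roughly like the sketch below. The exact fields depend on your platform and `pixi` version, and the project name here is just a placeholder:

```toml
[project]
name = "scratch"
version = "0.1.0"
requires-python = ">=3.11"

[tool.pixi.project]
channels = ["conda-forge"]
platforms = ["linux-64"]

[tool.pixi.tasks]
```

From here, `pixi add` (below) fills in dependencies as you need them.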
Add in a new dependency
```bash
pixi add "package name"
pixi add --pypi "pypi package name"
pixi add -f "feature name" "package name"
```
To add a new dependency, use `pixi add` to record it in the environment specification. If a package needs to be pulled from PyPI, add the `--pypi` flag. If the package should be added to just one feature (such as a `cuda`-specific feature), use the `-f` flag and specify the feature name, e.g. `cuda`. (More on features below.)
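As a concrete illustration (these package and feature names are just examples, not prescriptions):

```bash
pixi add pandas               # conda package from your configured channels
pixi add --pypi rich          # package pulled from PyPI instead
pixi add -f cuda pytorch-gpu  # added only to the cuda feature's dependencies
```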
Conda-like shell activation
```bash
pixi shell
pixi shell -e "environment-name"
```
Sometimes you need a shell that has access to the Python/IPython interpreter associated with your environment. You can make this happen by running `pixi shell`, optionally specifying an environment with `-e env-name`.
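For example, assuming a `tests` environment like the one defined later in this chapter:

```bash
pixi shell -e tests  # spawn a sub-shell with the tests environment activated
python -m pytest     # this interpreter (and pytest) come from that environment
exit                 # leave the sub-shell to deactivate
```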
Run tasks and programs
```bash
pixi run task-name              # as specified in your `pyproject.toml`
pixi run -e docs quarto preview # run `quarto preview` within the docs environment
pixi run python                 # run Python within your project's default environment
```
Because `pixi` lets you replace Makefiles with tasks defined in `pyproject.toml`, you can run tasks through aliases, and you can also run ad hoc commands (such as `quarto preview`) within any user-defined pixi environment for the project.
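To make this concrete, here is a sketch of what task definitions might look like in `pyproject.toml`; the task names and commands are illustrative, not part of any template:

```toml
[tool.pixi.tasks]
# `pixi run test` now behaves like a Makefile target
test = "pytest -v"
lint = "pre-commit run --all-files"

# Tasks can also be scoped to a feature, so they are only
# available in environments that include that feature.
[tool.pixi.feature.docs.tasks]
preview = "quarto preview"
```

With these in place, `pixi run test` replaces `make test`, and `pixi run -e docs preview` runs the docs preview inside the docs environment.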
Composable multi-environment projects
Pixi's biggest strength lies in its composable approach to environment management. Rather than specifying each environment's dependencies from scratch, pixi lets you define reusable "features" that can be combined into different environments. Here's how it works in practice:
Features as building blocks
In your `pyproject.toml`, you define features as reusable components:
```toml
[tool.pixi.feature.tests.dependencies]
pytest = "*"
pytest-cov = "*"
hypothesis = "*"

[tool.pixi.feature.docs.dependencies]
mkdocs = "*"
mkdocs-material = "*"
mknotebooks = "*"

[tool.pixi.feature.notebook.dependencies]
ipykernel = "*"
ipython = "*"
jupyter = "*"
pixi-kernel = "*"

[tool.pixi.feature.devtools.dependencies]
pre-commit = "*"
```
Composing environments
You can then compose these features into different environments based on your needs:
```toml
[tool.pixi.environments]
default = { features = ["tests", "devtools", "notebook", "setup"] }
docs = { features = ["docs"] }
tests = { features = ["tests", "setup"] }
cuda = { features = ["tests", "devtools", "notebook", "setup", "cuda"] }
```

(The `setup` and `cuda` features referenced here would be defined in the same way as the features above.)
Advantages of this approach
What are the advantages of this approach? Here are a few that I think are worth highlighting:
- Clarity of purpose: Each feature clearly defines the dependencies for a specific purpose (testing, documentation, development), making the project structure more understandable.
- Minimal environments: You can create lean environments that include only what's needed. For example, CI/CD pipelines can use the `tests` environment without documentation or notebook dependencies, reducing the size of the environment and speeding up builds (see the sketch after this list).
- Flexibility with hardware: You can easily add features that require specific hardware, such as CUDA GPUs or Apple Silicon Macs.
- Reproducibility: Each environment is explicitly defined through its component features, making it easier to reproduce environments across different machines.
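For instance, a continuous integration job only needs two commands to exercise the lean `tests` environment (a sketch; adapt to your CI system of choice):

```bash
pixi install -e tests     # solve and install only the tests environment
pixi run -e tests pytest  # run the test suite inside it
```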
Example use cases
Let's look at some practical examples of how to use these different environments in your daily workflow.
For running tests, you can use the minimal test environment which only includes testing dependencies:
```bash
pixi run -e tests pytest
```
For building documentation, you can use the docs environment which only includes documentation dependencies:
```bash
pixi run -e docs mkdocs build
```
For developing with all tools available, you can use the default environment which includes all development tools:
```bash
pixi shell  # uses the default environment
```
And finally, for running CUDA-enabled notebooks, you can use the CUDA environment which includes CUDA support:
```bash
pixi shell -e cuda
```