Manage and configure your projects
Once your machine is set up, it's time to learn how to manage and configure your projects. In this section, I will show you sane patterns for managing your projects' codebases with minimal overhead.
Follow the 1:1:1:1... rule
The 1:1:1:1... rule means that each project we work on gets:
- One Git repository
- One pixi configuration
- One custom source package inside the repo
- One documentation source inside the repo, including a well-maintained README file
- One standard set of continuous integration pipelines
- One data catalog that documents the datasets associated with your project and how they are accessed via your source package
- One set of configuration files that standardize development tools and practices across your team
In addition, when we name things, such as environment names, repository names, and more, we choose names that are consistent with one another. I explain why in this chapter. But more generally, conventions serve as cognitive scaffolding: they provide a shared mental model that helps us collaborate more effectively. Adopting the convention of one-to-one mappings helps us manage away many areas of complexity that may arise in a project.
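As a minimal sketch of how these one-to-one mappings and consistent names could be checked, the hypothetical helper below looks for each expected artifact in a repository. The specific paths (`pixi.toml`, `README.md`, a `docs` directory, and a top-level package directory named after the repository with hyphens replaced by underscores) are assumptions chosen for illustration, not conventions stated in this chapter; adjust them to your own layout.

```python
from pathlib import Path


def check_project_layout(repo: Path) -> dict[str, bool]:
    """Report which of the expected one-per-project artifacts exist.

    All paths are illustrative assumptions; the package name is derived
    from the repository name so that the two stay consistent.
    """
    repo_name = repo.resolve().name
    package_name = repo_name.replace("-", "_")  # e.g. my-project -> my_project
    expected = {
        "git repository": repo / ".git",
        "pixi configuration": repo / "pixi.toml",
        "source package": repo / package_name,
        "documentation": repo / "docs",
        "README": repo / "README.md",
    }
    return {label: path.exists() for label, path in expected.items()}


if __name__ == "__main__":
    # Check the current working directory and print one line per artifact.
    for label, present in check_project_layout(Path.cwd()).items():
        print(f"{'ok     ' if present else 'missing'}  {label}")
```

Deriving the package name from the repository name is one way to make the naming convention mechanical rather than something each person has to remember.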
When can we break this rule?
While the 1:1:1:1... rule serves as an excellent default, there are legitimate cases where breaking it makes sense. Here are some guidelines to help you decide when:
Extracting reusable code into a separate package
When you notice that a module or set of functions in your source code could be useful across multiple projects, it may be time to break it out into its own package. Signs that this might be appropriate include:
- The code solves a general problem not specific to your project
- Other teams or projects could benefit from using it
- The code has minimal dependencies on the rest of your project
- The functionality is stable and well-tested
In these cases, work with data scientists who have software development experience to refactor the code into a standalone package that can be installed via pip or conda.
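One of the signs above, minimal dependencies on the rest of your project, can be estimated roughly before refactoring. The sketch below is an illustration rather than a prescribed workflow: it parses a module's source with Python's standard `ast` module and lists the imports that point back into the project's own package. The function name and the package name `my_project` are hypothetical.

```python
import ast


def internal_imports(source: str, package: str) -> set[str]:
    """Return the imports in `source` that refer to `package` itself.

    Relative imports (`from . import x`) always count as internal and are
    reported with their leading dots.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == package or alias.name.startswith(package + "."):
                    found.add(alias.name)
        elif isinstance(node, ast.ImportFrom):
            if node.level > 0:
                found.add("." * node.level + (node.module or ""))
            elif node.module and (
                node.module == package or node.module.startswith(package + ".")
            ):
                found.add(node.module)
    return found


if __name__ == "__main__":
    example = "import numpy as np\nfrom my_project.data import load\n"
    print(internal_imports(example, "my_project"))
```

A module for which this returns an empty set depends only on third-party or standard-library code, which makes it an easier candidate for extraction.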
Splitting into multiple repositories
As projects grow, they sometimes reach a point where maintaining everything in one repository creates more complexity than it solves. Consider splitting the repository when:
- Different parts of the project evolve at very different rates
- Separate teams need to work independently on different components
- The codebase has grown so large that Git operations are becoming slow
- There's a clear logical separation between components with minimal interdependencies
When you do split repositories, apply the same 1:1:1:1... principles to each new repository to maintain consistency and organization.
Combining multiple environments
While we generally recommend one pixi environment per project, there are cases where you might need multiple environments within a project:
- When you need both CPU and GPU versions of packages
- For testing against different versions of key dependencies
- For separating documentation building dependencies from core project dependencies
In these cases, use pixi's feature system to manage multiple environments while still maintaining them all within a single pixi configuration file.
Remember: The goal of the 1:1:1:1... rule is to reduce cognitive overhead and maintain clarity. Only break it when adhering to it would actually increase complexity rather than reduce it. If you know the rules well, you'll know when to break them!