You should always know the source of truth
There should be one, and preferably only one, obvious source of truth for things.
This philosophy is a play on the Zen of Python, in which one line states:
There should be one -- and preferably only one -- obvious way to do it.
Part of how we manage complexity in our projects is by distilling things down to single sources of truth. By defining single sources of truth, we avoid the hairy situation of being unsure which version or copy of a thing we ought to depend on. This has the effect of minimizing confusion, and even potential conflicts, downstream.
As you go through the knowledge base, you will see this philosophy at play in
how we structure and organize our individual projects. In everyday work it also
shows up as two complementary artifacts worth treating carefully: operational
configs that tools read, and AGENTS.md, which captures how coding agents ought
to help.
Repository standards for tools and agents
Operational files are the source of truth for your tools: formatter settings,
lint rules, CI workflows, ignored paths, and environment contracts. Humans and
automation should read one committed file (pyproject.toml,
.pre-commit-config.yaml, and so on) instead of folklore in chat.
AGENTS.md is the source of truth for how an AI coding agent
should behave inside that repo. Instead of restating your coding style, how to
invoke tests, or which commands to prefer in every session, you document them
once; see building repository memory with AGENTS.md.
If it isn't in AGENTS.md, you cannot expect an agent to follow it reliably.
Neither layer replaces the other. Committed configs say what must be true in the codebase; agent memory says how collaborators (including bots) navigate and apply those truths day to day.
See this philosophy in action
Single source of truth isn't abstract; it is visible in every well-organized project I have worked on. The notes below roughly follow how I think about stack construction: clarify where code lives, wire tools and checks so they disagree with nothing upstream, then make data access and environment contracts equally explicit.
Code organization and structure
The source code organization page shows how to evolve from scattered notebooks to maintainable code while keeping clear sources of truth. Instead of having multiple versions of the same analysis scattered across different notebooks, you create a single, authoritative module that others can import and depend on.
Repository structure and version control
A well-structured repository carries the same idea at the filesystem level: code, docs, config, and metadata land in predictable locations. Once that scaffolding is boring, newcomers stop asking which fork of the project layout they are joining.
Configuration consolidation
With structure in place, tool configuration deserves the same uniformity. See how
the configuration files guide keeps settings in predictable files, for example by
consolidating Python tooling in pyproject.toml where that fits your stack. Fewer
surprises when something breaks means less archaeology.
Pre-commit parity between laptop and CI
Treat .pre-commit-config.yaml as the canonical
definition of formatting and sanity checks before a commit lands. Locally you
run hooks with pre-commit; in CI you should run the same hook set from the
committed config (for example via pixi run pre-commit run --all-files, as we
document at the repo level in AGENTS.md), not a bespoke copy of the checks
that silently drifts. That way a green laptop and a green pipeline mean the same
thing.
CI/CD carries automation beyond commits, still pinned to definitions you checked in.
Data access functions
The data catalog page shows how I replace scattered
loading code with centralized functions. One load_customer_data() becomes the
authoritative way to access that dataset across notebooks and scripts. When the
upstream path or format changes, you update one implementation instead of chasing
twenty copy-pasted snippets.
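As a minimal sketch of what such a centralized accessor could look like, assume the customer dataset is a CSV file. The path `data/customers.csv`, the column names, and the return type are all hypothetical choices made for illustration; only the function name `load_customer_data()` comes from the text above.

```python
import csv
from pathlib import Path

# Hypothetical location of the dataset. In a real project this would be
# resolved from the project root or a single config source.
CUSTOMER_DATA_PATH = Path("data") / "customers.csv"


def load_customer_data(path: Path = CUSTOMER_DATA_PATH) -> list[dict[str, str]]:
    """Return customer rows as dictionaries keyed by column name.

    Every notebook and script imports this function, so a change to the
    upstream path or file format is made here and nowhere else.
    """
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Callers then write `rows = load_customer_data()` and never mention the file path themselves, which is what keeps the function the single owner of that knowledge.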
Repeatable paths for data and artifacts
Paths themselves deserve the same discipline as accessors. Prefer resolving the
project root in code (pyprojroot, pathlib) and documenting path-shaped
settings in .env with a checked-in .env.example, as in
environment variables for a project.
Pair that mindset with catalog functions so callers import behavior, not string
literals sprinkled through the tree. On the question of where bytes actually live
versus what gets versioned in git, the choosing data formats page keeps storage
and repo concerns from blurring together.
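The idea of resolving the project root in code can be sketched with the standard library alone. This is not how pyprojroot is implemented; it is a simplified stand-in that assumes the root is the nearest ancestor directory containing a `pyproject.toml`. The names `find_project_root` and `data_dir`, and the `data` folder, are hypothetical.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Walk upward from `start` and return the first directory containing `marker`."""
    current = start.resolve()
    # Check the starting location first, then each ancestor in turn.
    for candidate in (current, *current.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No {marker} found at or above {start}")


def data_dir(start: Path) -> Path:
    """Hypothetical helper: the project's data folder, relative to the root."""
    return find_project_root(start) / "data"
```

A module would typically call `data_dir(Path(__file__))`, so the same code resolves the same folder regardless of which working directory a notebook or script was launched from.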
Environment variable management
Once you stop hard-coding filesystem guesses, secrets and toggles remain the next
easy place for drift.
The environment variables walkthrough keeps one .env
(and a template collaborators can mirror) aligned with loaders such as
python-dotenv, so you stop wondering which value of DATABASE_URL the shell you
spawned actually picked up.
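One way to keep that contract explicit is to read each variable through a single accessor that fails loudly when the value is missing. The sketch below uses only the standard library and assumes the environment has already been populated, for instance by calling python-dotenv's `load_dotenv()` at startup. The helper names `require_env` and `database_url` are hypothetical.

```python
import os
from typing import Mapping


def require_env(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the value of `name`, or raise if it is unset or empty."""
    value = env.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; copy .env.example to .env and fill it in"
        )
    return value


def database_url(env: Mapping[str, str] = os.environ) -> str:
    """Single accessor for DATABASE_URL, so callers never read os.environ directly."""
    return require_env("DATABASE_URL", env)
```

Accepting the mapping as a parameter is a small design choice that lets tests pass a plain dictionary instead of mutating the real process environment.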
Dotfiles as central configuration
Across machines your dotfiles become one more shared truth: baseline shell defaults that follow you wherever you clone. That is not about erasing personality; it is about making repeatable behavior ordinary.
The payoff stays simple: whatever you reach for, whether code, paths, tooling, secrets, automation, shell defaults, or even agent etiquette, the answer already has a single, deliberate owner.