You should always know the source of truth

There should be one, and preferably only one, obvious source of truth for things.

This philosophy is a play on the Zen of Python, in which one line states:

There should be one -- and preferably only one -- obvious way to do it.

Part of how we manage complexity in our projects is by distilling things down to single sources of truth. By defining single sources of truth, we avoid the hairy situation of being unsure which version or copy of a thing we ought to depend on. This has the effect of minimizing confusion, and even potential conflicts, downstream.

As you go through the knowledge base, you will see this philosophy at play in how we structure and organize our individual projects. In everyday work it also shows up as two complementary artifacts worth treating carefully: operational configs that tools read, and AGENTS.md, which captures how coding agents ought to help.

Repository standards for tools and agents

Operational files are the source of truth for your tools: formatter settings, lint rules, CI workflows, ignored paths, and environment contracts. Humans and automation should read one committed file (pyproject.toml, .pre-commit-config.yaml, and so on) instead of folklore in chat.

AGENTS.md is the source of truth for how an AI coding agent should behave inside that repo. Instead of restating coding style, how to invoke tests, or which commands to prefer in every session, you document them once, as described in building repository memory with AGENTS.md. If it isn't in AGENTS.md, you cannot expect an agent to follow it reliably.

Neither layer replaces the other. Committed configs say what must be true in the codebase; agent memory says how collaborators (including bots) navigate and apply those truths day to day.

See this philosophy in action

Single source of truth isn't abstract; it is visible in every well-organized project I have worked on. The notes below roughly follow how I think about stack construction: clarify where code lives, wire tools and checks so they never contradict the committed configuration, then make data access and environment contracts equally explicit.

Code organization and structure

The source code organization page shows how to evolve from scattered notebooks to maintainable code while keeping clear sources of truth. Instead of having multiple versions of the same analysis scattered across different notebooks, you create a single, authoritative module that others can import and depend on.

Repository structure and version control

A well-structured repository carries the same idea at the filesystem level: code, docs, config, and metadata land in predictable locations. Once that scaffolding is boring, newcomers stop asking which fork of the project layout they are joining.

Configuration consolidation

With structure in place, tool configuration deserves the same uniformity. See how the configuration files guide keeps settings in predictable files; for example, consolidating Python tooling via pyproject.toml where that fits your stack. When something breaks, having fewer places to look means less archaeology.

Pre-commit parity between laptop and CI

Treat .pre-commit-config.yaml as the canonical definition of formatting and sanity checks before a commit lands. Locally you run hooks with pre-commit; in CI you should run the same hook set from the committed config (for example via pixi run pre-commit run --all-files, as we document at the repo level in AGENTS.md), not a bespoke copy of the checks that silently drifts. That way a green laptop and a green pipeline mean the same thing. CI/CD carries automation beyond commits, still pinned to definitions you checked in.

Data access functions

The data catalog page shows how I replace scattered loading code with centralized functions. One load_customer_data() becomes the authoritative way to access that dataset across notebooks and scripts. When the upstream path or format changes, you update one implementation instead of chasing twenty copy-pasted snippets.
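As a minimal sketch of such an accessor, assume the dataset is a CSV file. Only the name load_customer_data() comes from the text above; the CUSTOMER_DATA_PATH constant, the data/customers.csv location, and the list-of-dicts return type are hypothetical choices for illustration.

```python
import csv
from pathlib import Path

# Hypothetical location: the single place to edit when the upstream path changes.
CUSTOMER_DATA_PATH = Path("data") / "customers.csv"


def load_customer_data(path: Path = CUSTOMER_DATA_PATH) -> list[dict[str, str]]:
    """Return customer records as a list of dicts, one per CSV row."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Notebooks and scripts would then import load_customer_data rather than repeat the path or the parsing logic, so a format change touches only this function.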

Repeatable paths for data and artifacts

Paths themselves deserve the same discipline as accessors. Prefer resolving the project root in code (pyprojroot, pathlib) and documenting path-shaped settings in .env with a checked-in .env.example, as described in environment variables for a project. Pair that mindset with catalog functions so callers import behavior, not string literals sprinkled through the tree. For where bytes actually live versus what gets versioned in git, the page on choosing data formats keeps storage and repo concerns from blurring together.
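One way to resolve the project root with pathlib alone is to walk upward until a marker file appears. The sketch below assumes pyproject.toml marks the root; the find_project_root helper is hypothetical and is not the pyprojroot API, which offers its own ready-made lookup.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = Path(start).resolve()
    # Check the starting directory first, then each ancestor in turn.
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No {marker} found at or above {start}")
```

A caller might then build paths such as find_project_root(Path.cwd()) / "data" instead of relying on whichever working directory a notebook happened to start in.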

Environment variable management

Once you stop hard-coding filesystem guesses, secrets and toggles are the next easy place for drift. The environment variables walkthrough keeps one .env (and a template collaborators can mirror) aligned with loaders such as python-dotenv, so you stop wondering which value of DATABASE_URL the shell you spawned actually picked up.
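A small accessor can make the environment contract explicit in code as well. This sketch assumes the .env file has already been loaded into the process environment (for instance by python-dotenv's load_dotenv()); the get_setting helper and its error message are hypothetical.

```python
import os


def get_setting(name: str, environ=None) -> str:
    """Return a required setting, failing loudly if it is missing or empty."""
    # Default to the real process environment; a mapping can be passed for tests.
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; copy .env.example to .env and fill it in"
        )
    return value
```

Routing every lookup through one function, for example get_setting("DATABASE_URL"), means a missing value fails at one known place with a pointer back to the template.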

Dotfiles as central configuration

Across machines your dotfiles become one more shared truth: baseline shell defaults that follow you wherever you clone. That is not about erasing personality; it is about making repeatable behavior ordinary.

The payoff stays simple: wherever you reach for an answer (authoritative code, paths, tooling, secrets, automation, shell defaults, even agent etiquette), it already points at a single deliberate owner.