Configuration files guide

If you've ever been confused by the proliferation of configuration files in modern Python projects, you're not alone! I've lost count of how many times I've wondered "which file does what again?" when setting up a new project. This guide will help you understand the purpose of each configuration file and how they work together to make your development workflow smoother.

Why do we even need all these config files?

Here's what I've learned after working with countless data science projects: configuration files are your secret weapon for creating a consistent, professional development environment. They serve to:

  • Standardize your development environment across team members (no more "it works on my machine!")
  • Automate the tedious stuff like code formatting and quality checks
  • Document your project's dependencies and settings for future you
  • Streamline build, test, and deployment processes

The beauty of this approach is that once you set them up correctly, they work silently in the background, letting you focus on the actual data science work.

What are the core configuration files you should know about?

pyproject.toml - Your project's single source of truth

This is my favorite development in Python packaging in recent years! pyproject.toml is the modern standard that can replace multiple legacy config files. Think of it as your project's central nervous system:

[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "your-project"
version = "0.1.0"
description = "Your project description"
authors = [{name = "Your Name", email = "your@email.com"}]
dependencies = [
    "pandas>=1.0",
    "numpy>=1.20",
]

[tool.black]
line-length = 88
target-version = ['py38']

[tool.isort]
profile = "black"
multi_line_output = 3

[tool.pylint.messages_control]
disable = ["missing-docstring", "invalid-name"]

[tool.interrogate]
ignore-init-method = true
ignore-module = true
fail-under = 80

What it does: Centralizes configuration for project metadata, dependencies, and development tools.
Should you commit it? ✅ Always! This is core project infrastructure.
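
Because the [build-system] and [project] tables are standard, ordinary packaging tools can work from this one file. A rough sketch of how it gets used:

# pip reads the build-system and project tables to build and install the package
pip install .

# build a wheel and sdist from the same metadata (requires the separate 'build' package)
python -m build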

Consolidation is beautiful

I love how pyproject.toml lets me configure black, isort, pylint, and pytest all in one place. No more hunting through separate .flake8, .isort.cfg, and other scattered files!
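
pytest is the one tool on that list not shown in the example above; since pytest 6 it reads its settings from a [tool.pytest.ini_options] table, so a minimal sketch looks like this:

[tool.pytest.ini_options]
testpaths = ["tests"]   # only collect tests from the tests/ directory
addopts = "-ra -q"      # summarize skips/failures, keep output quiet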

pixi.toml - Modern environment management

I've found pixi to be a game-changer for data science workflows. It handles both your environment and task automation beautifully:

[project]
name = "your-project"
version = "0.1.0"
description = "Your project description"
channels = ["conda-forge", "bioconda"]
platforms = ["osx-64", "linux-64", "win-64"]

[dependencies]
python = ">=3.8"
pandas = ">=1.0"
numpy = ">=1.20"

[pypi-dependencies]
your-package = ">=0.1.0"

[tasks]
test = "pytest tests/"
lint = "pylint src/"
format = "black src/"

What it does: Environment and dependency management, plus task automation.
Should you commit it? ✅ Absolutely! This is how your team reproduces your environment.

Why I love pixi tasks

Instead of remembering complex command combinations, I can just run pixi run test or pixi run lint. It's like having a personal assistant for your development workflow!
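
Day to day, the commands stay short. A sketch of a typical session, assuming pixi is already installed:

# create or update the environment described in pixi.toml
pixi install

# run the tasks defined in the [tasks] table
pixi run test
pixi run lint
pixi run format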

.gitignore - Keep the noise out

This file prevents you from accidentally committing files that shouldn't be in version control. Here's the template I use for data science projects:

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
ENV/
env.bak/
venv.bak/

# Data files - crucial for data science!
*.csv
*.tsv
*.xlsx
*.json
*.parquet
data/
results/

# IDE
.vscode/
.idea/
*.swp
*.swo

# OS
.DS_Store
Thumbs.db

# Jupyter
.ipynb_checkpoints/

# Distribution
dist/
build/
*.egg-info/

What it does: Prevents sensitive, generated, or personal files from being committed.
Should you commit it? ✅ Yes! This protects your whole team from common mistakes.

Don't commit your data!

I cannot stress this enough: never commit actual data files to git. Your future self (and your colleagues) will thank you when the repository doesn't balloon to gigabytes.
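
If you do want to keep a small README or data dictionary under data/, one common pattern is to ignore the directory's contents rather than the directory itself, because git can't re-include a file whose parent directory is excluded. A sketch:

# ignore everything inside data/ ...
data/*
# ... but keep a README explaining where the real data lives
!data/README.md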

.pre-commit-config.yaml - Your quality assurance robot

This is like having a tireless assistant that checks your work before every commit. I've set this up on every project I work on:

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files

  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
      - id: black

  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort

  - repo: https://github.com/pycqa/flake8
    rev: 4.0.1
    hooks:
      - id: flake8

What it does: Automatically enforces code quality standards before you commit.
Should you commit it? ✅ Always! This ensures consistent code quality across your team.

The magic here is that, once you run pre-commit install in your clone, the hooks run automatically on every commit. No need to remember to format your code or check for issues; it just happens!
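
A minimal setup sketch:

# install the git hook so the checks run on every commit
pre-commit install

# optionally, run every hook against the whole repository once
pre-commit run --all-files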

What about documentation configuration?

mkdocs.yml - Beautiful documentation made easy

If you're writing documentation (and you should be!), MkDocs makes it painless:

site_name: Your Project Documentation
site_description: Project documentation

theme:
  name: material
  palette:
    primary: blue
    accent: light blue

nav:
  - Home: index.md
  - Installation: installation.md
  - Usage: usage.md
  - API Reference: api.md

plugins:
  - search
  - mkdocstrings

markdown_extensions:
  - admonition
  - codehilite
  - pymdownx.superfences

What it does: Configures how your documentation site is generated.
Should you commit it? ✅ Yes! Documentation configuration is part of your project.
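
Previewing and building the site uses the standard MkDocs commands. Note that the material theme and mkdocstrings plugin referenced above live in separate packages (mkdocs-material and mkdocstrings), so a rough sketch of the workflow is:

# install mkdocs plus the theme and plugin used in this config
pip install mkdocs mkdocs-material "mkdocstrings[python]"

# live-reloading preview at http://127.0.0.1:8000
mkdocs serve

# build the static site into the site/ directory
mkdocs build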

How do I handle environment variables?

.env - Your local secrets (never commit!)

DATABASE_URL=sqlite:///./test.db
API_KEY=your-dev-api-key
DEBUG=True

What it does: Stores development environment variables.
Should you commit it? ⛔️ Never! This contains secrets and local configuration.

.env.example - The template approach

DATABASE_URL=sqlite:///./app.db
API_KEY=your-api-key-here
DEBUG=False

What it does: Documents which environment variables are needed.
Should you commit it? ✅ Yes! This helps team members know what to configure.

Template pattern

I always create an .env.example file that shows the structure of required environment variables without exposing actual secrets. New team members can copy it to .env and fill in their own values.
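
Onboarding then becomes a copy-and-edit step (with something like python-dotenv loading the values at runtime, if that's what your project uses):

# copy the template, then edit .env with your real local values
cp .env.example .env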

What's the quick reference for which tools use which files?

Tool           Configuration File          Commit to Git?
black          pyproject.toml              ✅
isort          pyproject.toml              ✅
pylint         pyproject.toml              ✅
pytest         pyproject.toml              ✅
pixi           pixi.toml                   ✅
git            .gitignore                  ✅
pre-commit     .pre-commit-config.yaml     ✅
mkdocs         mkdocs.yml                  ✅
Environment    .env                        ⛔️
Env Template   .env.example                ✅

What are my best practices for configuration files?

1. Embrace consolidation

Use pyproject.toml for as many tools as possible to reduce config file proliferation:

# Everything in one place!
[tool.black]
line-length = 88

[tool.isort]
profile = "black"

[tool.pytest.ini_options]
testpaths = ["tests"]

2. Start with a template

I've learned this the hard way: it's much easier to start with a complete set of configuration files than to add them piecemeal as you remember you need them. Use a cookiecutter template or copy from a well-configured project.

3. Document your choices

Add comments explaining non-obvious settings. Future you will be grateful:

[tool.black]
line-length = 88  # PEP 8 recommends 79, but 88 works better with modern screens
target-version = ['py38']  # Minimum Python version we support

4. Be intentional about version control

Here's my simple rule:

  • Always commit: Files that standardize development and help your team
  • Never commit: Files containing secrets or personal preferences
  • Use templates: For environment variables, create an .example version

5. Validate your configuration

Use pre-commit hooks to catch configuration errors before they cause problems:

- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v4.4.0
  hooks:
    - id: check-yaml
    - id: check-toml

How do I get started with all this?

Here's what I've found works best:

  1. Start with a template: Use cookiecutter or pyds-cli to scaffold a project with all the essential configuration files already set up (a sketch follows this list)
  2. Customize gradually: Don't try to perfect everything at once. Modify the defaults to match your specific project needs as you discover them
  3. Document your changes: When you modify configuration, add a comment explaining why
  4. Automate validation: Set up pre-commit hooks to catch configuration errors early
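
If you go the cookiecutter route, the CLI can generate a project directly from a template repository. The template URL below is a hypothetical placeholder; substitute whichever data science template you actually use:

# install the cookiecutter CLI (pipx keeps it isolated from project environments)
pipx install cookiecutter

# scaffold a new project from a template repository (placeholder URL)
cookiecutter gh:your-org/your-data-science-template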

The goal isn't perfection from day one - it's having a solid foundation that grows with your project. Time will distill the best practices for your specific context.

Remember: these configuration files are there to serve you, not the other way around. Start simple, and add complexity only when you feel the pain of not having it!