# Configuration files guide
If you've ever been confused by the proliferation of configuration files in modern Python projects, you're not alone! I've lost count of how many times I've wondered "which file does what again?" when setting up a new project. This guide will help you understand the purpose of each configuration file and how they work together to make your development workflow smoother.
## Why do we even need all these config files?
Here's what I've learned after working with countless data science projects: configuration files are your secret weapon for creating a consistent, professional development environment. They serve to:
- Standardize your development environment across team members (no more "it works on my machine!")
- Automate the tedious stuff like code formatting and quality checks
- Document your project's dependencies and settings for future you
- Streamline build, test, and deployment processes
The beauty of this approach is that once you set them up correctly, they work silently in the background, letting you focus on the actual data science work.
## What are the core configuration files you should know about?
### `pyproject.toml` - Your project's single source of truth

This is my favorite development in Python packaging in recent years! `pyproject.toml` is the modern standard that can replace multiple legacy config files. Think of it as your project's central nervous system:
```toml
[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "your-project"
version = "0.1.0"
description = "Your project description"
authors = [{name = "Your Name", email = "your@email.com"}]
dependencies = [
    "pandas>=1.0",
    "numpy>=1.20",
]

[tool.black]
line-length = 88
target-version = ['py38']

[tool.isort]
profile = "black"
multi_line_output = 3

[tool.pylint.messages_control]
disable = ["missing-docstring", "invalid-name"]

[tool.interrogate]
ignore-init-method = true
ignore-module = true
fail-under = 80
```
**What it does:** Centralizes configuration for project metadata, dependencies, and development tools.

**Should you commit it?** ✅ Always! This is core project infrastructure.
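Once the `[project]` table is filled in, you can sanity-check it by installing the package locally. A minimal sketch, assuming the setuptools backend shown above and that you run it from the project root:

```bash
# Install the project using the metadata in pyproject.toml (run from the repo root)
pip install .

# Confirm that pip picked up the name, version, and dependencies
pip show your-project
```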
**Consolidation is beautiful:** I love how `pyproject.toml` lets me configure black, isort, pylint, and pytest all in one place. No more hunting through separate `.flake8`, `.isort.cfg`, and other scattered files!
### `pixi.toml` - Modern environment management

I've found `pixi` to be a game-changer for data science workflows. It handles both your environment and task automation beautifully:
```toml
[project]
name = "your-project"
version = "0.1.0"
description = "Your project description"
channels = ["conda-forge", "bioconda"]
platforms = ["osx-64", "linux-64", "win-64"]

[dependencies]
python = ">=3.8"
pandas = ">=1.0"
numpy = ">=1.20"

[pypi-dependencies]
your-package = ">=0.1.0"

[tasks]
test = "pytest tests/"
lint = "pylint src/"
format = "black src/"
```
**What it does:** Environment and dependency management, plus task automation.

**Should you commit it?** ✅ Absolutely! This is how your team reproduces your environment.
**Why I love pixi tasks:** Instead of remembering complex command combinations, I can just run `pixi run test` or `pixi run lint`. It's like having a personal assistant for your development workflow!
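Here's roughly what that looks like day to day, assuming the `[tasks]` table from the example above:

```bash
# Create (or update) the environment described in pixi.toml
pixi install

# Run the tasks defined in the [tasks] table
pixi run test    # -> pytest tests/
pixi run lint    # -> pylint src/
pixi run format  # -> black src/
```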
### `.gitignore` - Keep the noise out

This file prevents you from accidentally committing files that shouldn't be in version control. Here's the template I use for data science projects:
```
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
ENV/
env.bak/
venv.bak/

# Data files - crucial for data science!
*.csv
*.tsv
*.xlsx
*.json
*.parquet
data/
results/

# IDE
.vscode/
.idea/
*.swp
*.swo

# OS
.DS_Store
Thumbs.db

# Jupyter
.ipynb_checkpoints/

# Distribution
dist/
build/
*.egg-info/
```
**What it does:** Prevents sensitive, generated, or personal files from being committed.

**Should you commit it?** ✅ Yes! This protects your whole team from common mistakes.
**Don't commit your data!** I cannot stress this enough: never commit actual data files to git. Your future self (and your colleagues) will thank you when the repository doesn't balloon to gigabytes.
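If you're ever unsure whether a file is covered by the rules above, git can tell you which pattern (if any) matches it. A quick check, using the hypothetical path `data/raw.csv`:

```bash
# Show which .gitignore rule matches the file (no output means it is NOT ignored)
git check-ignore -v data/raw.csv
```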
### `.pre-commit-config.yaml` - Your quality assurance robot

This is like having a tireless assistant that checks your work before every commit. I've set this up on every project I work on:
```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
      - id: black
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
  - repo: https://github.com/pycqa/flake8
    rev: 4.0.1
    hooks:
      - id: flake8
```
**What it does:** Automatically enforces code quality standards before you commit.

**Should you commit it?** ✅ Always! This ensures consistent code quality across your team.
The magic here is that it runs automatically - no need to remember to format your code or check for issues. It just happens!
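Of course, "automatically" requires a one-time setup on each clone. A minimal sketch of the usual workflow, assuming pre-commit is installed via pip:

```bash
# One-time setup per machine / per clone
pip install pre-commit
pre-commit install          # registers the git hook so checks run on every commit

# Optional: run every hook against the whole repository right away
pre-commit run --all-files
```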
## What about documentation configuration?
### `mkdocs.yml` - Beautiful documentation made easy

If you're writing documentation (and you should be!), MkDocs makes it painless:
```yaml
site_name: Your Project Documentation
site_description: Project documentation

theme:
  name: material
  palette:
    primary: blue
    accent: light blue

nav:
  - Home: index.md
  - Installation: installation.md
  - Usage: usage.md
  - API Reference: api.md

plugins:
  - search
  - mkdocstrings

markdown_extensions:
  - admonition
  - codehilite
  - pymdownx.superfences
```
**What it does:** Configures how your documentation site is generated.

**Should you commit it?** ✅ Yes! Documentation configuration is part of your project.
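With that file in place, previewing and building the site is just two commands (assuming `mkdocs`, `mkdocs-material`, and `mkdocstrings` are installed in your environment):

```bash
# Live-reloading preview at http://127.0.0.1:8000
mkdocs serve

# Build the static site into the site/ directory
mkdocs build
```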
## How do I handle environment variables?
### `.env` - Your local secrets (never commit!)
```
DATABASE_URL=sqlite:///./test.db
API_KEY=your-dev-api-key
DEBUG=True
```
**What it does:** Stores development environment variables.

**Should you commit it?** ⛔️ Never! This contains secrets and local configuration.
### `.env.example` - The template approach
```
DATABASE_URL=sqlite:///./app.db
API_KEY=your-api-key-here
DEBUG=False
```
**What it does:** Documents what environment variables are needed.

**Should you commit it?** ✅ Yes! This helps team members know what to configure.
**Template pattern:** I always create an `.env.example` file that shows the structure of required environment variables without exposing actual secrets. New team members can copy it to `.env` and fill in their own values.
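Onboarding then boils down to one copy and a quick edit; a sketch, assuming the `.env.example` shown above:

```bash
# Copy the template and fill in your own values
cp .env.example .env

# Edit .env with your local database URL, API key, etc.
# (and double-check that .env is covered by your .gitignore!)
```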
## What's the quick reference for which tools use which files?
| Tool         | Configuration File        | Commit to Git? |
|--------------|---------------------------|----------------|
| black        | `pyproject.toml`          | ✅             |
| isort        | `pyproject.toml`          | ✅             |
| pylint       | `pyproject.toml`          | ✅             |
| pytest       | `pyproject.toml`          | ✅             |
| pixi         | `pixi.toml`               | ✅             |
| git          | `.gitignore`              | ✅             |
| pre-commit   | `.pre-commit-config.yaml` | ✅             |
| mkdocs       | `mkdocs.yml`              | ✅             |
| Environment  | `.env`                    | ⛔️             |
| Env Template | `.env.example`            | ✅             |
## What are my best practices for configuration files?
### 1. Embrace consolidation

Use `pyproject.toml` for as many tools as possible to reduce config file proliferation:
```toml
# Everything in one place!
[tool.black]
line-length = 88

[tool.isort]
profile = "black"

[tool.pytest.ini_options]
testpaths = ["tests"]
```
### 2. Start with a template
I've learned this the hard way: it's much easier to start with a complete set of configuration files than to add them piecemeal as you remember you need them. Use a cookiecutter template or copy from a well-configured project.
### 3. Document your choices
Add comments explaining non-obvious settings. Future you will be grateful:
```toml
[tool.black]
line-length = 88  # PEP 8 recommends 79, but 88 works better with modern screens
target-version = ['py38']  # Minimum Python version we support
```
### 4. Be intentional about version control
Here's my simple rule:
- Always commit: Files that standardize development and help your team
- Never commit: Files containing secrets or personal preferences
- Use templates: For environment variables, create an `.example` version
### 5. Validate your configuration
Use pre-commit hooks to catch configuration errors before they cause problems:
```yaml
- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v4.4.0
  hooks:
    - id: check-yaml
    - id: check-toml
```
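You can also invoke individual hooks on demand when you only want to validate configuration files, assuming the hooks above are in your `.pre-commit-config.yaml`:

```bash
# Check only the YAML and TOML files in the repository
pre-commit run check-yaml --all-files
pre-commit run check-toml --all-files
```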
## How do I get started with all this?
Here's what I've found works best:
- Start with a template: Use cookiecutter or pyds-cli to scaffold a project with all the essential configuration files already set up
- Customize gradually: Don't try to perfect everything at once. Modify the defaults to match your specific project needs as you discover them
- Document your changes: When you modify configuration, add a comment explaining why
- Automate validation: Set up pre-commit hooks to catch configuration errors early
The goal isn't perfection from day one - it's having a solid foundation that grows with your project. Time will distill the best practices for your specific context.
Remember: these configuration files are there to serve you, not the other way around. Start simple, and add complexity only when you feel the pain of not having it!