Store your project documentation in your project repository
Documentation is a crucial aspect of any software project, serving as a bridge between code and its users, maintainers, and future developers.
Well-written documentation reduces the learning curve for new team members, minimizes the time spent answering repetitive questions, and helps preserve institutional knowledge. It acts as both a technical reference and a historical record, capturing not just how code works but why certain decisions were made.
Without proper documentation, projects can become difficult to maintain, knowledge can be lost when team members leave, and collaboration becomes more challenging.
By investing time in documentation early and maintaining it consistently, we create more sustainable and accessible projects that can evolve and grow with our teams.
Where should you store your documentation?
I advocate for keeping your documentation within the code repository, as the code repository is the single source of truth for your project.
Storing your documentation close to your code reduces the chance that it goes stale, because a change to the code and the matching change to the docs can land in the same commit and be reviewed together. It also enables a whole host of other features and tools that can make your life much easier.
Introduction to the Diataxis framework
The Diataxis framework is a structured approach to documentation that categorizes content into four distinct types: tutorials, how-to guides, explanations, and reference. Each type serves a unique purpose and audience, ensuring that documentation is comprehensive and accessible. This framework helps writers focus on the specific needs of their audience, whether they are beginners looking for step-by-step instructions or experienced users seeking in-depth understanding.
When choosing which type of documentation to write, consider your audience's needs:
- Tutorials: For newcomers learning the basics
- How-to guides: For users trying to solve specific problems
- Explanations: For those seeking deeper understanding
- Reference: For users needing technical details
When to add documentation
The best time to add documentation to your project is while things are fresh in your head, because that is when the information is most likely to be accurate and complete. Documentation should also live close to the code. Keeping it alongside the source code provides a single source of truth, making it easier for developers and stakeholders to find and update information.
Some key moments to write documentation:
- Right after implementing a new feature
- When fixing a complex bug (document the root cause and solution)
- During code review, when explaining design decisions
- While writing tests, as test cases often make excellent documentation examples
Code comments as documentation
Adding plenty of comments to your code is a valuable practice. Comments can serve as a form of documentation that AI tools, such as language models, can use to generate more detailed narrative documentation. This practice not only aids in understanding the code but also facilitates the creation of comprehensive documentation.
Good commenting practices:
- Explain the "why" rather than the "what"
- Document assumptions and edge cases
- Include examples for complex algorithms
- Reference related documentation or tickets
- Add type hints and docstrings for better AI comprehension
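As a minimal sketch of these practices, the function below combines type hints, a Sphinx-format docstring with a doctest example, and a comment that explains the "why" rather than the "what". The function `rolling_mean` is hypothetical and only serves as an illustration; it is not taken from any particular project.

```python
from collections.abc import Sequence


def rolling_mean(values: Sequence[float], window: int) -> list[float]:
    """Compute the trailing rolling mean of a sequence.

    :param values: Observations in chronological order.
    :param window: Number of observations to average; must be >= 1.
    :returns: One mean per complete window, so the result has
        ``len(values) - window + 1`` entries (empty if there are too few values).
    :raises ValueError: If ``window`` is less than 1.

    Example:

    >>> rolling_mean([1.0, 2.0, 3.0, 4.0], window=2)
    [1.5, 2.5, 3.5]
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    # Why: incomplete leading windows are dropped instead of padded, because
    # padded values would look like real measurements to downstream readers.
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

Because the example in the docstring is written in doctest format, running `python -m doctest` on the module that contains it should flag the example if it ever drifts from the implementation.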
AI assistance in documentation
When using AI assistance in coding, leverage configuration files to ensure AI-generated code always comes with proper documentation. This approach ensures consistency and reduces the manual effort of adding documentation after the fact.
AI Configuration Files
Store your AI instructions in configuration files that are part of your standard project structure:
GitHub Copilot Instructions (.github/copilot-instructions.md):
# Coding Standards
- Always add type hints to function parameters
- Write docstrings in Sphinx format for all public functions
- Include examples in docstrings for complex functions
- Document edge cases and error handling
- Create unit tests for new functions in tests/ directory
- Follow functional programming principles where possible
Cursor Rules (.cursorrules):
Docstrings should be in Sphinx format.
Testing framework should be pytest.
Type hints should be required.
Always document the "why" not just the "what" in comments.
Include examples for complex algorithms.
Claude Instructions (CLAUDE.md):
# Claude Configuration for Code Documentation
When writing or modifying code:
1. Always include comprehensive docstrings in Sphinx format
2. Add inline comments explaining complex logic
3. Document assumptions and edge cases
4. Include type hints for all function parameters
5. Write accompanying tests with clear test names
6. Explain the reasoning behind design decisions
The Developer's Role
You as the developer should always review AI-generated documentation for correctness. While AI can help generate initial documentation, it's your responsibility to:
- Verify technical accuracy
- Ensure examples work correctly
- Check that the documentation matches the actual implementation
- Add context that AI might have missed
- Update documentation when requirements change
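Part of checking that documentation matches the implementation can be automated. The sketch below assumes Sphinx-format docstrings (as in the configuration examples above) and reports parameters that appear in a function's signature but have no `:param` field. The helper `undocumented_params` and the sample function `scale` are hypothetical names introduced for illustration; a check like this complements a human review and does not replace it.

```python
import inspect
import re
from collections.abc import Callable

# Matches Sphinx fields such as ":param name:" or ":param int name:".
_PARAM_FIELD = re.compile(r":param\s+(?:\S+\s+)*?(\w+)\s*:")


def undocumented_params(func: Callable[..., object]) -> list[str]:
    """Return signature parameters that have no ``:param`` field in the docstring."""
    documented = set(_PARAM_FIELD.findall(inspect.getdoc(func) or ""))
    return [
        name
        for name in inspect.signature(func).parameters
        if name not in ("self", "cls") and name not in documented
    ]


def scale(values: list[float], factor: float, clip: bool = False) -> list[float]:
    """Multiply each value by a factor.

    :param values: Numbers to scale.
    :param factor: Multiplier applied to every value.
    """
    # The ``clip`` parameter was added later and never documented.
    scaled = [v * factor for v in values]
    return [min(v, 1.0) for v in scaled] if clip else scaled


print(undocumented_params(scale))  # ['clip']
```

This only catches missing parameter entries. Whether each description is actually correct is still something the developer has to judge.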
How Standard Project Structure Supports Documentation
The standard project structure outlined in "Start with a sane repository structure" supports documentation practices by providing a docs/ directory right from the beginning. This structure includes:
project/
├── docs/              # Dedicated documentation directory
│   ├── index.md       # Main documentation entry point
│   ├── api.md         # API reference documentation
│   └── ...
├── README.md          # Project overview and quick start
├── mkdocs.yaml        # Documentation configuration
└── ...
This structure ensures that:
- Documentation has a dedicated, organized space
- Documentation tools are configured from day one
- The project follows documentation best practices from the start
- AI-generated documentation has a clear place to live
Reference documentation
The fourth type of documentation, reference, typically includes API reference guides. There are numerous tools available to assist with creating reference documentation:
Python-specific tools
- Sphinx: The standard for Python projects
- pdoc: Automatic API documentation generator
- mkdocs: A static site generator that's great for building documentation websites. Pairs well with mkdocs-material, which this book is written in.
- doctest: A tool for testing code examples in documentation
This e-book is written in mkdocs using the mkdocs-material theme, which should give you a feel for my preferred documentation tooling.
API Documentation
- Swagger/OpenAPI: For REST API documentation
- GraphQL Schema Documentation: For GraphQL APIs
- TypeDoc: For TypeScript/JavaScript projects
Whatever your organization uses, make sure the relevant configuration files for these tools are included in your team's standard project structure.
Automating documentation with CI/CD
To keep documentation both close to the code and easily accessible, you can use CI/CD pipelines to automatically publish documentation from your repository to a more accessible location, such as Confluence. After all, your documentation may be important for non-developers to understand your project, and it's likely that for cost reasons, they won't have subscription access to your version control system.
Here is an example GitHub Actions workflow for md2cf:
name: Publish Docs to Confluence
on:
  push:
    branches: [main]
    paths:
      - 'docs/**'
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - name: Publish to Confluence
        run: |
          uvx md2cf \
            --space "TEAM" \
            --parent "Project Docs" \
            --update-existing \
            docs/
        env:
          CONFLUENCE_URL: ${{ secrets.CONFLUENCE_URL }}
          CONFLUENCE_TOKEN: ${{ secrets.CONFLUENCE_TOKEN }}
Here are some additional ways you can think about automation:
- Set up pre-commit hooks to check documentation formatting
- Use documentation testing tools to verify code examples (e.g. doctest)
- Configure automated broken link checking using the markdown-link-check GitHub action.
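To give a feel for what such a check does, here is a minimal standard-library sketch of a relative-link checker that could run as a pre-commit hook or a CI step. It is not how markdown-link-check works internally. The function names and the default docs/ location are assumptions for illustration, and the sketch only looks at inline `[text](target)` links that point to files, skipping external URLs and in-page anchors.

```python
import re
from pathlib import Path

# Inline Markdown links and images: [text](target) or [text](target "title").
_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)[^)]*\)")
# Targets with a URL scheme (https:, mailto:, ...) or pure in-page anchors.
_SKIP = re.compile(r"^(?:[a-z][a-z0-9+.-]*:|#)", re.IGNORECASE)


def broken_relative_links(docs_dir: Path) -> list[tuple[Path, str]]:
    """Return (file, target) pairs whose relative link target does not exist."""
    broken = []
    for md_file in sorted(docs_dir.rglob("*.md")):
        for target in _LINK.findall(md_file.read_text(encoding="utf-8")):
            if _SKIP.match(target):
                continue
            # Drop a trailing "#section" so only the file path is checked.
            path_part = target.split("#", 1)[0]
            if not (md_file.parent / path_part).exists():
                broken.append((md_file, target))
    return broken


def main(argv: list[str]) -> int:
    """Print broken links and return a non-zero exit code if any are found."""
    docs_dir = Path(argv[0]) if argv else Path("docs")
    problems = broken_relative_links(docs_dir)
    for md_file, target in problems:
        print(f"{md_file}: broken link -> {target}")
    return 1 if problems else 0
```

A hook or workflow step could call `sys.exit(main(sys.argv[1:]))` so that a broken link fails the commit or the build. Note that this sketch also scans link-like text inside code samples, which a dedicated tool would handle more carefully.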
By following these practices and leveraging the Diataxis framework, you can create documentation that is not only informative but also accessible and easy to maintain. Remember that good documentation evolves with your code and should be treated as a first-class citizen in your data science project development process.