Create configuration files for code checking tools

Why configure code checking tools using configuration files?

Configuration files give you the ability to declare your project's preferred configuration and distribute it to all participants in the project. It smooths out the practice of data science (and software development too), as these configuration represent the declared normative state of a project, answering questions such as:

  • What code style rules ought we adhere to?
  • How much docstring coverage ought to be present?
  • What software tests should be run?

Without these configuration files declaring how code checkers ought to behave, we leave it up to collaborators and contributors to manually configure their local systems, and without sufficient documentation, they may bug you over and over on how things ought to be configured. This increase in friction will inevitably lead to an increase in frustration with the project, and hence a decrease in engagement.

As such, you can think of these configuration files as part of your automation toolkit, thus satisfying the automation philosophy.

What configuration files belong with which code checking tools?

Because configuration files are so crucial to a project, I have collated them together on the Configuration file overview page.

When do I create these configuration files?

As always, just-in-time, at the moment that you need it.

Configuration file overview

What configuration files go with which tools?

The following table should help with disambiguating the question.

| Tool | Configuration File | Version Control? | | ------------- | ----------------------- |:----------------:| | black | pyproject.toml | ✅ | | isort | pyproject.toml | ✅ | | interrogate | pyproject.toml | ✅ | | pylint | pyproject.toml | ✅ | | conda | environment.yml | ✅ | | VSCode | .vscode/settings.json | ⛔️ | | pip | requirements.txt | ✅ | | MkDocs | mkdocs.yml | ✅ | | git | .gitignore | ✅ |

Get prepped per project

Treat your projects as if they were software projects for maximum organizational effectiveness. Why? The biggest reason is that it will nudge us towards getting organized. The "magic" behind well-constructed software projects is that someone sat down and thought clearly about how to organize things. The same principle can be applied to data analysis projects.

Firstly, some overall ideas to ground the specifics:

Some ideas pertaining to Git:

Notes that pertain to organizing files:

Notes that pertain to your compute environment:

And notes that pertain to good coding practices:

Treating projects as if they were software projects, but without software engineering's stricter practices, keeps us primed to think about the generalizability of what we do, but without the over-engineering that might constrain future flexibility.

Adhere to best git practices

Why adhere to best Git practices?

Git is a unique piece of software. It does one and only one thing well: store versions of hand-curated files. Adhering to Git best practices will ensure that you use Git in its intended fashion.

What best practices should we adhere to?

The most significant point to keep in mind: only commit to Git files that you have had to create manually. That usually means version controlling:

  1. Source code. See: Place custom source code inside a lightweight package)
  2. Configuration files. See:
    1. Create runtime environment variable configuration files for each of your projects
    2. Create configuration files for code checking tools
  3. Documentation. See: Write effective documentation for your projects

There are also things you should actively avoid committing.

For specific files, you can set up a .gitignore file. See the page Set up an awesome default gitignore for your projects for more information on preventing yourself from committing them automatically.

For Jupyter notebooks, it is considered good practice to avoid committing notebooks that still have outputs. It is best to clear them out using nbstripout. That can be automated before committing them through the use of pre-commit hooks. (See: Set up pre-commit hooks to automate checks before making Git commits)