Set up pre-commit hooks to automate checks before making git commits

## Why use pre-commit hooks?

One way to prevent yourself from committing code that is not properly checked is to use pre-commit hooks. This is a feature of Git that allows you to automatically run checks before they are committed to the repository history. Because they are automatically run, you set them up once, usually when you first download the repository, and no longer need to think about them again.

## How do I set up pre-commit hooks?

You can install the pre-commit framework, which lets you easily configure pre-commit hooks to run.

The gist of the installation steps are in the bash commands below, but you should read the website for a fuller understanding.

conda install -c conda-forge pre-commit
pre-commit sample-config > .pre-commit-config.yaml


Now, go and edit .pre-commit-config.yaml -- add other pre-commit checks, for example. (See below for an example that you can use.) Then, run:

pre-commit install
pre-commit run --all-files   # run the checks against all of your files


## What pre-commit hooks are good to install?

a.k.a. what would you put in your .pre-commit-config.yaml? Here's a sane collection of starter things that I usually include, taken from my Network Analysis Made Simple repository.

repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v2.3.0
hooks:
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/psf/black
rev: 19.3b0
hooks:
- id: black
- repo: https://github.com/kynan/nbstripout
rev: master
hooks:
- id: nbstripout
files: ".ipynb"


nbstripout is a super important one -- it ensures that all of my notebook outputs are stripped before committing them to the repository! (Otherwise, you'll end up bloating your repository with large notebooks.)

## How does this relate to continuous integration pipeline checks?

(For a refresher, or if you're not sure what CI pipeline checks are, see Build a continuous integration pipeline for your source.)

CI pipeline checks are also a form of automated checks that you can put into your workflow. Ideally, everything that is checked for in your pre-commit hooks should be checked for in your CI pipeline.

So what's the difference, then? Here's my thoughts on this:

In pre-commit hooks, you generally run the lightweight checks: the ones that are annoying to run manually all the time but also execute very quickly. Things like code style checks, for example, or those that ensure there are only single trailing lines in text files.

In the CI system, you run those checks in addition to the longer-running test suite. (see: Write tests that test your custom code). So the CI system behaves as a backup to the pre-commit hooks.