Tests-Enabled Science

written by Eric J. Ma on 2016-03-15

In the software development world, I learned about the importance of writing tests for one’s software. Since then, I have made this a habit in my own work: more recently, I have been writing tests for the software I write to conduct scientific research.

This has got me thinking about why tests are such an effective tool. I think there are at least a few reasons.

Tests are effectively a contract between my current and future selves. My current self writes a contract that any changes made by my future self have to honor. If my future self makes changes that break the contract, the tests will catch them. Of course, this all assumes that the conditions under which the contract was written still hold true. If they change, then the contract can be broken.
Tests force my current self to be explicit about what exactly I am writing. This is much better than slapping together a script in a quick-and-dirty fashion. (Granted, there is a time and place for quick-and-dirty scripts.)
When done with (semi-)automatic testing frameworks, such as py.test, the test suite is automatically run from start to finish. This reduces the cognitive load of running the tests one-by-one, and reinforces the cohesiveness of the code logic.
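
To make the contract idea concrete, here is a minimal sketch of what such tests might look like. The `normalize` function and its three tests are hypothetical, made up purely for illustration; they are not taken from any of my actual research code.

```python
def normalize(values):
    """Scale a list of non-negative numbers so that they sum to 1.

    Hypothetical example function, used only to illustrate the tests below.
    """
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [v / total for v in values]


# Each test below is one clause of the "contract" that future edits
# to normalize() must continue to honor.

def test_normalize_sums_to_one():
    result = normalize([1, 1, 2])
    assert abs(sum(result) - 1.0) < 1e-12


def test_normalize_preserves_proportions():
    assert normalize([1, 1, 2]) == [0.25, 0.25, 0.5]


def test_normalize_rejects_all_zeros():
    # Written with try/except so the file needs no imports.
    try:
        normalize([0, 0])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for all-zero input")
```

Saved as, say, `test_normalize.py`, running `py.test` in that directory should collect and run every function whose name starts with `test_`, so none of them has to be invoked by hand.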

Okay, so what exactly do I mean by tests? Here are a few thoughts.

Data integrity tests. By this, I mean tests that live in the directory where the data reside and that encode known properties of the data. For example:
1. Number of rows.
2. Number of columns.
3. The column names.
4. The hash of each row of data.
5. The hash of each column of data.
Unit tests. By this, I mean tests that ensure that a function does exactly what it’s expected to do. This is the most common notion of a test. It applies to:
1. Code that is written to manipulate data.
2. Code that is part of a software package or utility being developed.
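
As a rough sketch of the data integrity idea, the properties listed above (row count, column count, column names, per-row and per-column hashes) can be recorded once as a "fingerprint" and then checked by a test. Everything here is an assumption for illustration: the tiny inline CSV stands in for a real data file, and the names `load_rows`, `hash_values`, `fingerprint`, and `RECORDED` are hypothetical. Only the Python standard library is used.

```python
import csv
import hashlib
import io

# Hypothetical stand-in for a data file. In practice the text would be
# read from a file sitting next to this test in the data directory.
CSV_TEXT = "sample_id,host,year\nA1,human,2009\nA2,swine,2010\n"


def load_rows(text):
    """Parse CSV text into a header list and a list of data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, list(reader)


def hash_values(values):
    """Return the SHA-256 hex digest of a sequence of string values."""
    # Join on the ASCII unit separator so ("ab", "c") and ("a", "bc") differ.
    joined = "\x1f".join(values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def fingerprint(text):
    """Summarize the known properties of a dataset."""
    header, rows = load_rows(text)
    columns = list(zip(*rows))  # transpose rows into columns
    return {
        "n_rows": len(rows),
        "n_columns": len(header),
        "column_names": header,
        "row_hashes": [hash_values(row) for row in rows],
        "column_hashes": [hash_values(col) for col in columns],
    }


# Assumption: in real use this would be computed once from a trusted copy
# of the data, saved (for example as JSON), and committed alongside it.
RECORDED = fingerprint(CSV_TEXT)


def test_shape_and_column_names():
    header, rows = load_rows(CSV_TEXT)
    assert len(rows) == 2
    assert len(header) == 3
    assert header == ["sample_id", "host", "year"]


def test_data_matches_recorded_fingerprint():
    assert fingerprint(CSV_TEXT) == RECORDED


def test_tampered_data_is_detected():
    # Changing a single cell changes one row hash and one column hash.
    tampered = CSV_TEXT.replace("swine", "avian")
    assert fingerprint(tampered) != RECORDED
```

The shape and column-name checks catch structural changes, while the hashes catch silent edits to individual values. Note that this sketch assumes every row has the same number of fields; ragged rows would need extra handling.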

While it takes time, I think computational scientists should write tests for their code as a matter of routine practice. I’m not sure whether good test writing is enforceable, but it should definitely be done more.

Cite this blog post:

@article{
    ericmjl-2016-tests-enabled-science,
    author = {Eric J. Ma},
    title = {Tests-Enabled Science},
    year = {2016},
    month = {03},
    day = {15},
    howpublished = {\url{https://ericmjl.github.io}},
    journal = {Eric J. Ma's Blog},
    url = {https://ericmjl.github.io/blog/2016/3/15/tests-enabled-science},
}
