Eric J Ma's Website

Research code benefits from unit testing

written by Eric J. Ma on 2023-08-30 | tags: code review, unit tests, research, ml, chemistry, data splitting, property prediction, software testing, code correctness


Today, during a code review with one of our interns, I had an epiphany.

Research code can still benefit from unit tests.

Especially if that research code is going to be used in a head-to-head comparison of methodologies.

Let me explain.

Within the ML for chemistry world, random splits are downright unacceptable when constructing training, validation, and test sets, particularly if one wants to claim that one's model generalizes beyond seen chemistry. (For a comprehensive introduction to how best to split data for molecular property prediction, I refer you to Pat Walters' blog, Practical Cheminformatics.)

Without going into proprietary details, one of our datasets involved replicate measurements for each molecule tested. For reasons beyond my realm of influence, one of our external collaborators built a model that included all replicate measurements, rather than doing a Bayesian estimation of the property for each molecule before predicting it with an ML model. The splitting strategy there was to split on molecule, ensuring that no molecule was represented in both the training and test sets. In order to build a comparator model internally, one of our interns, Matthieu, wrote code to implement that splitting strategy, since our collaborators had not yet shared their splitting code.
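To make the strategy concrete, here is a minimal sketch of molecule-level splitting. This is not the actual internal code, and the function name, data layout, and parameters are all hypothetical:

```python
import random


def split_by_molecule(rows, test_fraction=0.2, seed=0):
    """Split rows of replicate measurements so that no molecule
    appears in both the train and test sets.

    rows: a list of dicts, each carrying a "molecule" ID; replicate
    measurements of the same molecule share the same ID.
    """
    # Split on unique molecule IDs, not on rows, so that all
    # replicates of a molecule land in the same set.
    molecules = sorted({row["molecule"] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(molecules)
    n_test = max(1, int(len(molecules) * test_fraction))
    test_molecules = set(molecules[:n_test])
    train = [row for row in rows if row["molecule"] not in test_molecules]
    test = [row for row in rows if row["molecule"] in test_molecules]
    return train, test
```

If your data live in a DataFrame, scikit-learn's `GroupShuffleSplit` with `groups=` set to the molecule ID column accomplishes the same grouping behaviour.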

During code review, Matthieu, his direct supervisor, my teammate Zeran, and I all came to the realization that, because of the non-standard nature of the splitting strategy, we needed guarantees that the code Matthieu wrote was correct. As Matthieu talked through ways to test the correctness of his code, it dawned on me: as long as he refactored the splitting code into a function, he'd have a target for a unit test!

Here, he could test certain properties of the splitter function. Firstly, given data with replicate measurements for a molecule, he could test that for any random split, no molecule showed up in both the train and test sets. Secondly, he could test that the total number of rows of data, when recombining the train set and the test set, equalled the original number. These were by no means the only two properties of the function that he could test.
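For illustration, those two properties might look like the pytest-style tests below. The splitter, its signature, and the fake data are all hypothetical stand-ins (a minimal splitter is inlined so the example runs on its own), not Matthieu's actual code:

```python
import random


def split_by_molecule(rows, test_fraction=0.2, seed=0):
    # Minimal molecule-level splitter, inlined so the tests run standalone.
    molecules = sorted({row["molecule"] for row in rows})
    random.Random(seed).shuffle(molecules)
    test_molecules = set(molecules[: max(1, int(len(molecules) * test_fraction))])
    train = [row for row in rows if row["molecule"] not in test_molecules]
    test = [row for row in rows if row["molecule"] in test_molecules]
    return train, test


def make_fake_data():
    # Five molecules, four replicate measurements each.
    return [{"molecule": f"M{i % 5}", "value": i} for i in range(20)]


def test_no_molecule_in_both_sets():
    # Property 1: for any random split, no molecule leaks across sets.
    for seed in range(10):
        train, test = split_by_molecule(make_fake_data(), test_fraction=0.4, seed=seed)
        train_mols = {row["molecule"] for row in train}
        test_mols = {row["molecule"] for row in test}
        assert train_mols.isdisjoint(test_mols)


def test_row_count_is_preserved():
    # Property 2: recombining train and test gives back every original row.
    rows = make_fake_data()
    train, test = split_by_molecule(rows, test_fraction=0.4, seed=0)
    assert len(train) + len(test) == len(rows)
```

Running these across many seeds, as the first test does, gives a lightweight property-based flavour; a library like Hypothesis could generate the fake data instead.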

And that brings me to the main thesis of this post: even research code, code that might eventually be thrown away, can still benefit from rigorous testing! Since this code was intended to be used in a bake-off between an internal effort and a collaborator's effort, we needed to be absolutely sure that the code did what we claimed it did, and there was no better way to get that guarantee than with a unit test. That is, after all, the point of software tests: to find ways to prove the correctness of our code.


I send out a newsletter with tips and tools for data scientists. Come check it out at Substack.

If you would like to sponsor the coffee that goes into making my posts, please consider GitHub Sponsors!

Finally, I do free 30-minute GenAI strategy calls for organizations who are seeking guidance on how to best leverage this technology. Consider booking a call on Calendly if you're interested!