Eric J Ma's Website

Always Check Your Data

written by Eric J. Ma on 2017-10-31 | tags: data science data analysis statistics bayesian

True story, just happened today. I was trying to fit a Poisson likelihood to estimate event cycle times (in discrete weeks). For certain columns, everything went perfectly fine. Yet for other columns, I was getting negative-infinity likelihoods, and I banged my head against this problem for over an hour and a half.

As it turned out, the columns that gave me negative-infinity likelihoods at initialization did so because of negative values in the data. Try fitting a Poisson likelihood, which only has support on the non-negative integers, on that!
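To make the failure mode concrete, here is a minimal sketch using only the Python standard library. The helper names (`poisson_logpmf`, `poisson_loglike`), the rate of 4.0, and the toy data are all made up for illustration; this is not the code from the original analysis, just a hand-rolled Poisson log-likelihood showing how a single negative value drags the whole sum to negative infinity.

```python
import math

def poisson_logpmf(k, rate):
    """Log-probability of count k under Poisson(rate); rate must be > 0.

    Values outside the support (negative or non-integer) have
    probability zero, so their log-probability is -inf.
    """
    if k < 0 or k != int(k):
        return -math.inf
    # log(rate^k * exp(-rate) / k!) with lgamma(k + 1) == log(k!)
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def poisson_loglike(data, rate):
    """Sum of per-observation log-probabilities."""
    return sum(poisson_logpmf(k, rate) for k in data)

# Hypothetical cycle times in weeks; the second list hides one bad value.
clean = [3, 5, 4, 6, 2]
dirty = [3, 5, -1, 6, 2]

print(poisson_loglike(clean, 4.0))  # a finite number
print(poisson_loglike(dirty, 4.0))  # -inf
```

One out-of-support observation is enough: adding negative infinity to any finite sum gives negative infinity, so the rest of the column never gets a say.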

This lost hour and a half was a good lesson in data checking/testing: always sanity check the basic stats associated with the data - bounds (min/max), central tendency (mean/median/mode) and spread (variance, interquartile range) - always check!
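A rough sketch of what such a check might look like, again with only the standard library (Python 3.8+ for `statistics.quantiles`). The function names `summarize` and `check_count_data` are hypothetical, and the toy column is invented; in practice the same idea would likely be applied to each column of a data frame before any model fitting.

```python
import statistics

def summarize(values):
    """Basic bounds, central tendency and spread for one column."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance
        "iqr": q3 - q1,
    }

def check_count_data(values):
    """Raise early if values cannot be Poisson counts."""
    bad = [v for v in values if v < 0 or v != int(v)]
    if bad:
        raise ValueError(f"{len(bad)} value(s) outside Poisson support: {bad}")

# Hypothetical column of cycle times with one negative entry.
column = [3, 5, -1, 6, 2]

print(summarize(column)["min"])  # -1, which should raise eyebrows

try:
    check_count_data(column)
except ValueError as err:
    print(err)
```

Looking at the minimum alone would have flagged the problem here; the explicit support check simply turns that glance into something that fails loudly.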
