I once saw a probability estimate with mean 0.8 and variance 0.3. From that point onwards, I knew frequentist estimates could be horribly wrong, and decided to go Bayesian.
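Why that estimate cannot be right: any quantity confined to the interval [0, 1] with mean m has variance at most m(1 - m). The short check below is an illustrative sketch; the function name is made up for this example.

```python
def max_variance_on_unit_interval(mean: float) -> float:
    """Largest variance any distribution on [0, 1] with this mean can have."""
    if not 0.0 <= mean <= 1.0:
        raise ValueError("the mean of a probability must lie in [0, 1]")
    # The bound is attained by putting all mass on the endpoints 0 and 1.
    return mean * (1.0 - mean)


bound = max_variance_on_unit_interval(0.8)
print(f"largest possible variance for mean 0.8: {bound:.2f}")
print("reported variance of 0.3 is feasible:", 0.3 <= bound)
```

A variance of 0.3 is well above that ceiling, so the reported estimate describes no valid distribution over a probability.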

As part of my learning journey, I published a GitHub repository of Bayesian statistical analysis recipes in PyMC3, featuring models and data that I've seen elsewhere. I implemented most of them from scratch to get familiar with both PyMC3 syntax and the logic of Bayesian statistical modelling.

Some models that are implemented here include:

- Binary and multinomial classification
- Neural networks
- Linear regression
- Hierarchical modelling

The GitHub repository can be found here.
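The recipes themselves are written in PyMC3 and live in the repository. For a dependency-free taste of the underlying logic, here is a minimal sketch of the simplest Bayesian model for a binary outcome: a Beta prior updated with observed successes and failures. It is not one of the recipes; the class, the function, and the counts are assumptions made up for illustration, and the closed-form update stands in for the sampling that PyMC3 would do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Beta(alpha, beta) distribution over an unknown probability."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        total = self.alpha + self.beta
        return (self.alpha * self.beta) / (total ** 2 * (total + 1.0))


def update_beta(successes: int, failures: int,
                prior_alpha: float = 1.0, prior_beta: float = 1.0) -> BetaPosterior:
    """Conjugate update: add observed counts to the prior pseudo-counts."""
    return BetaPosterior(prior_alpha + successes, prior_beta + failures)


# Hypothetical data: 8 successes and 2 failures, with a flat Beta(1, 1) prior.
posterior = update_beta(successes=8, failures=2)
print(f"posterior mean: {posterior.mean:.3f}")
print(f"posterior variance: {posterior.variance:.4f}")
```

Because the posterior is a proper distribution on [0, 1], its variance can never exceed mean × (1 − mean).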

As part of my Insight project, I built Flu Forecaster, a project that aims to forecast influenza sequence evolution using deep learning. In it, I used variational autoencoders (VAEs) to translate time-stamped influenza protein sequences into a continuous coordinate space, then used Gaussian process regression to forecast future coordinates that could be translated back into sequence space.
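To make the forecasting step concrete, here is a minimal pure-Python sketch of Gaussian process regression on a single latent coordinate over time. It is not Flu Forecaster's code: the zero-mean prior, the RBF kernel, the hyperparameters, and the toy numbers are all assumptions chosen for illustration.

```python
import math


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two time points."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def solve_cholesky(lower, rhs):
    """Solve (L L^T) x = rhs by forward then backward substitution."""
    n = len(rhs)
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(lower[i][k] * y[k] for k in range(i))) / lower[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(lower[k][i] * x[k] for k in range(i + 1, n))) / lower[i][i]
    return x


def gp_forecast(times, values, query_time, noise_var=1e-6, length_scale=1.0):
    """Posterior mean and variance of a zero-mean GP at query_time."""
    n = len(times)
    gram = [[rbf_kernel(times[i], times[j], length_scale)
             + (noise_var if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    lower = cholesky(gram)
    k_star = [rbf_kernel(t, query_time, length_scale) for t in times]
    mean = sum(k * a for k, a in zip(k_star, solve_cholesky(lower, values)))
    var = rbf_kernel(query_time, query_time, length_scale) - sum(
        k * w for k, w in zip(k_star, solve_cholesky(lower, k_star)))
    return mean, max(var, 0.0)


# Toy data: one latent coordinate observed at four time points (made up).
times = [0.0, 1.0, 2.0, 3.0]
coords = [0.1, 0.4, 0.5, 0.9]
forecast_mean, forecast_var = gp_forecast(times, coords, query_time=4.0)
print(f"forecast at t=4: mean={forecast_mean:.3f}, variance={forecast_var:.3f}")
```

The forecast comes with a variance that grows as the query time moves away from the observed data, which is what makes a Gaussian process attractive for this kind of extrapolation.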

To achieve real-time surveillance, we need machine learning models with high learning capacity that are also highly interpretable. I am currently working on extending neural fingerprints to protein structures using convolutions on graph-structured data. In the process, we are writing a graph convolution implementation as a Python package, as well as a software package for converting protein 3-D structures into their corresponding "protein interaction graph" representations. While these tools are being developed with deep learning in mind, we also anticipate that they will be generally useful.
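As a rough illustration of what converting a 3-D structure into a graph can look like, the sketch below connects residues whose representative coordinates lie within a distance cutoff. This is an assumption for illustration only: the actual package may define edges by specific interaction types rather than distance, and the function name, the 8 Å cutoff, and the toy coordinates are all hypothetical.

```python
import math
from itertools import combinations


def build_contact_graph(residue_coords, cutoff=8.0):
    """Return an adjacency mapping linking residues within `cutoff` angstroms.

    residue_coords maps a residue label to an (x, y, z) coordinate.
    """
    graph = {name: set() for name in residue_coords}
    for (name_a, xyz_a), (name_b, xyz_b) in combinations(residue_coords.items(), 2):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            graph[name_a].add(name_b)
            graph[name_b].add(name_a)
    return graph


# Toy coordinates (made up): three nearby residues and one distant residue.
toy_residues = {
    "ALA1": (0.0, 0.0, 0.0),
    "GLY2": (3.8, 0.0, 0.0),
    "SER3": (7.6, 0.0, 0.0),
    "LYS4": (30.0, 0.0, 0.0),
}
contact_graph = build_contact_graph(toy_residues)
for residue, neighbours in contact_graph.items():
    print(residue, sorted(neighbours))
```

A graph in this form (nodes with neighbour sets) is the kind of input on which graph convolutions operate.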

*Software:*

- Graph Fingerprint on GitHub
- Protein Interaction Network on GitHub
- Protein Convolutional Networks on GitHub

*Senior Collaborators:*

With their segmented genomes, influenza viruses can reassort with other influenza viruses to produce hybrid progeny. Think of it as being like shuffling a red and a blue deck of cards in a box, and picking out each member of the suit at random.
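The card analogy can be sketched in a few lines. Influenza A has eight genome segments; in this toy simulation each segment of the progeny is drawn from one of two parents with equal probability. The uniform choice is a simplification taken from the analogy, not a biological model, and the function name is made up.

```python
import random

# The eight genome segments of influenza A.
SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")


def reassort(parent_a, parent_b, rng):
    """Pick each segment's parent of origin independently and uniformly."""
    return {segment: rng.choice((parent_a, parent_b)) for segment in SEGMENTS}


rng = random.Random(42)  # seeded so the run is repeatable
progeny = reassort("red", "blue", rng)
is_reassortant = len(set(progeny.values())) > 1

print(progeny)
print("hybrid of both parents:", is_reassortant)
```

With two parents and eight segments there are 2^8 = 256 possible segment combinations, of which all but the two pure-parent genotypes are hybrids.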

As part of my thesis work, I developed a phylogenetic heuristic algorithm to identify reassortant influenza viruses. Using this method, my colleagues and I were able to show that reassortment is over-represented (relative to a null model) when crossing between viral hosts; additionally, the more evolutionarily distant two viral hosts were, the more over-represented reassortment was.

This may generalize across domains of life, where reticulate evolution enables organisms to more easily switch between ecological niches.

Zoonotic infections in humans originate in wild animals. To better understand their contact structure, I have been developing an open-source hardware and software kit for monitoring wild animal behaviour using the Raspberry Pi, Python, and 3D printing. The time-lapse cameras, which we call TikiCams (they look like Hawaiian lamps when mounted), are based on off-the-shelf hardware available at computer and hardware stores.

*Videos:*

*Images:*

- Hanging cameras
- Laptop + Pi in shed
- TikiCams in the wild - image 1
- TikiCams in the wild - image 2
- Selfie with the Tikis

The GitHub repository can be found here.