Languages and Tools
Programming languages and tools that I use in my day-to-day work.

I have been programming in Python since 2013, building tools based on machine learning models, developing tutorials and reports in Jupyter notebooks, and developing traditional software tools for high performance computing and data visualization. My HTML and CSS skills are honed on my personal website and resume.


Packages
Specific Python packages that I have proficiency with.

I am a core developer of pyjanitor, PyMC and NetworkX. I have also made open source contributions to Matplotlib and Bokeh. My tutorial collection covers JAX & NumPy, PyMC, and NetworkX for analytics and pytest/pandera for software and data testing.


ML/Stats
Concepts and ideas that I am comfortable addressing questions on.

I have experience developing bespoke Bayesian models, including Bayesian neural networks in PyMC and Gaussian process models with Google's Neural Tangents library. In addition, I have designed efficient end-to-end graph neural network data structures and models trained with JAX, and have architected and built machine learning systems on top of PyTorch. I am never content to rely on others' models without peering into the black box, and strive to have full mastery over every model I wield.


Life Sciences & Chemistry
Domain expertise gained during my education and prior experience.

Drawing on my prior experience as a benchside synthetic biologist, I have contributed principled designs of experiments for multifactorial high-throughput measurements that leverage Bayesian model uncertainties. Additionally, I have built massively parallelized QSAR and protein property prediction models. Finally, I have taken classes on immunology, molecular biology, virology and biochemistry, allowing me to quickly contribute productively to a wide variety of life sciences projects.





Principal Data Scientist (Research)
Moderna Therapeutics
July 2021-Present, Cambridge MA
Accomplishments and Responsibilities:

  • Data Science lead in Moderna's DSAI group supporting Moderna's Research efforts, covering problems in mRNA design, protein engineering, and lipid nanoparticle chemistry.
  • Our team of six supports projects in AI library design, computer vision, and custom software and algorithm development that have saved millions of dollars in annual FTE costs by automating manual research workflows and extracting richer information from existing data.
  • Architected and implemented standardized data science workflows within the team.
  • Co-organized quarterly docathons to foster the habit of writing project documentation.
  • Core maintainer of Moderna's first open source project, SeqLike.


Senior Expert II/Investigator III, Data Science and Statistical Learning
Novartis Institutes for Biomedical Research (NIBR)
August 2020-June 2021, Cambridge MA
Accomplishments and Responsibilities:

  • Investigator in the Informatics Products and Data Sciences (IPDS) team.
  • Spearheaded internal education initiatives to train colleagues on machine learning, statistical methods, and software deployment.
  • Built a generalized autoregressive hidden Markov model package to automatically learn latent motion states of mouse behaviour, with Zach Barry and our intern Nitin Kumar Mittal.
  • Built a graph neural network model to predict protein liabilities with Mei Xiao and Kannan Sankar.
  • Built a Suzuki chemical reaction recommendation and optimization engine with Cihan Soylu, Mike Fortunato, and John Lopez, leveraging internal and external data.
  • Mentored one Master's thesis student, Arkadij Kummer, on probabilistic models for antibody engineering.


Investigator II/Senior Expert I, Data Science and Statistical Learning
Novartis Institutes for Biomedical Research (NIBR)
September 2018-July 2020, Cambridge MA
Accomplishments and Responsibilities:

  • Investigator in the Scientific Data Analysis (SDA) team.
  • Co-organized and developed teaching material for machine learning & deep learning workshops and seminars internally at NIBR with colleagues Sivakumar Gowrisankar, Yuan Wang, Laszlo Urban, and Sean Xiao.
  • Developed, with NIBR colleagues Nikolaus Stiefl and Gregori Gerebtzoff, a massively parallelized engine for training, evaluating, and serving machine learning models on internal assay data, with an emphasis on serving prediction uncertainties.
  • Developed machine learning workflows to accelerate protein engineering as part of the internal Genesis Labs innovation initiative, with Richard Lewis and our intern Arkadij Kummer.
  • Accelerated the recurrent neural network model UniRep using JAX. Software and preprint available.


Investigator I, Data Science & Statistical Learning
Novartis Institutes for Biomedical Research (NIBR)
September 2017-August 2018, Cambridge MA
Accomplishments and Responsibilities:

  • Investigator in the Scientific Data Analysis (SDA) team.
  • Performed internal consulting projects and expanded the SDA Statistical Learning initiative with colleagues.
  • Developed parameterized Bayesian agent-based models of the internal project portfolio for scenario planning purposes, using PyMC and Mesa.
  • Assisted in the analysis of high throughput DROSHA cleavage data.
  • Assisted in mentoring two interns, Stacy Meichle (Computer Aided Drug Discovery with Clayton Springer) and Fritz Lekschas (SDA with Brant Peterson).


Health Data Science Fellow
Insight Data Science
June 2017-August 2017, Boston MA
Accomplishments and Responsibilities:

  • Built Flu Forecaster, a machine learning-powered system that forecasts flu sequences six months out, to better prepare for manufacturing of vaccine strains.
  • Implemented a variational autoencoder (deep learning model) to learn a continuous representation of 14,455 influenza hemagglutinin protein sequences, and trained a Gaussian process model on the continuous representation to predict new flu sequences.
  • Developed an interactive blog post using Flask and Bootstrap, and deployed it to Heroku and GitHub.
  • Led peer workshops on web development, deep learning and code style.


ScD Candidate
Massachusetts Institute of Technology
August 2011-May 2017, Cambridge MA
Accomplishments and Responsibilities:

  • Developed a scalable, network-based phylogenetic heuristic algorithm for detecting reassortant influenza viruses using 18,632 fully sequenced virus genomes that improved our capacity to detect reassortment events by two orders of magnitude.
  • The algorithm was used in a lead-author study (published in PNAS) providing systematic evidence that genome shuffling is important for host switching, and a co-authored study (published in Ecology Letters) that showed that reassortment is a strategy for viral gene persistence in wild animal reservoirs.
  • Contributed reproducible data analysis for colleagues through fluorescent image quantification and genomic analysis in studies that refuted binding properties of novel influenza viruses.
  • Performed Bayesian statistical modelling for colleagues testing the efficacy of phone sterilization tools.
  • Delivered tutorials and talks on Network Analysis and Bayesian statistical methods at annual Python conferences, including PyCon, SciPy, and PyData.





Massachusetts Institute of Technology
Department of Biological Engineering
August 2011-May 2017, Cambridge MA
Doctor of Science (Sc.D.)

I did my doctoral training under Prof. Jonathan Runstadler in computational influenza evolution and ecology.


The University of British Columbia
Integrated Sciences
June 2006-May 2010, Vancouver BC
Bachelor of Science (B.Sc.)

I conducted gut immunology and pathology research in the laboratory of Prof. Brett Finlay under the supervision of Dr. Guntram Grassl and Dr. Hongbing Yu. I also founded the iGEM team at UBC under Prof. Eric Lagally.





Data Science
Manuscripts pertaining to applied data science and visualization.

Statistics
Papers where I contributed statistical analyses or developed statistical methods.

Influenza
Papers pertaining to influenza surveillance, evolution and ecology.