Languages and Tools
Programming languages and tools that I use in my day-to-day.
I have been programming in Python since 2013, building tools based on machine learning models, developing tutorials and reports in Jupyter notebooks, and developing traditional software tools for high performance computing and data visualization. My HTML and CSS skills are honed on my personal website and resume.
Packages
Specific Python packages that I have proficiency with.
I am a core developer of pyjanitor, PyMC and NetworkX. I have also made open source contributions to Matplotlib and Bokeh. My tutorial collection covers JAX & NumPy, PyMC, and NetworkX for analytics and pytest/pandera for software and data testing.
ML/Stats
Concepts and ideas that I am comfortable addressing questions on.
I have experience developing bespoke Bayesian models, including Bayesian neural networks in PyMC and Gaussian process models with Google's Neural Tangents. In addition, I have designed efficient end-to-end graph neural network data structures and models trained with JAX, and have architected and built machine learning systems on top of PyTorch. I am never content to rely on others' models without peering into the black box, and strive to have full mastery over every model I wield.
Life Sciences & Chemistry
Domain expertise gained during my education and prior experience.
Drawing on my prior experience as a benchside synthetic biologist, I have contributed principled design of experiments on multifactorial high throughput measurements, leveraging Bayesian model uncertainties. Additionally, I have built massively parallelized QSAR and protein property prediction models. Finally, I have taken classes on immunology, molecular biology, virology and biochemistry, allowing me to quickly contribute productively on a wide variety of life sciences projects.
Senior Principal Data Scientist (Research)
Moderna Inc., April 2024-Present, Cambridge MA
Accomplishments and Responsibilities:
- Data Science lead in Moderna's Digital for Research organization supporting Moderna's Research efforts, covering problems in mRNA design, protein engineering, and lipid nanoparticle chemistry.
- Our team accelerates science to the speed of thought and quantifies the previously unquantified.
- Our team of six builds API-callable, artificial intelligence-powered software products for use in AI library design, computer vision, and custom software and algorithm development.
- Co-organizer of work-in-progress seminar series to encourage Pixar-like early stage discussion of work and cross-functional collaboration.
- Primary maintainer of internal deployment of AlphaFold.
Principal Data Scientist (Research)
Moderna Inc., July 2021-March 2024, Cambridge MA
Accomplishments and Responsibilities:
- Data Science lead in Moderna's DSAI group supporting Moderna's Research efforts, covering problems in mRNA design, protein engineering, and lipid nanoparticle chemistry.
- Our team of six supported projects in AI library design, computer vision, and custom software and algorithm development that saved millions of dollars in annual FTE costs by automating manual research workflows and enriching the information obtained from existing data.
- Architected and implemented standardized data science workflows for the broader team.
- Co-organized quarterly docathons to foster the habit of writing project documentation.
- Core maintainer of Moderna's first open source project, SeqLike.
Senior Expert II/Investigator III, Data Science and Statistical Learning
Novartis Institutes for Biomedical Research (NIBR), August 2020-June 2021, Cambridge MA
Accomplishments and Responsibilities:
- Investigator in the Informatics Products and Data Sciences (IPDS) team.
- Spearheaded internal education initiatives to train colleagues on machine learning, statistical methods, and software deployment.
- Built a generalized autoregressive hidden Markov model package to automatically learn latent motion states of mouse behaviour, with Zach Barry and our intern Nitin Kumar Mittal.
- Built graph neural network model to predict protein liabilities with Mei Xiao and Kannan Sankar.
- Built Suzuki chemical reaction recommendation and optimization engine with Cihan Soylu, Mike Fortunato, and John Lopez leveraging internal and external data.
- Mentored one Master's thesis student, Arkadij Kummer, on probabilistic models for antibody engineering.
Investigator II/Senior Expert I, Data Science and Statistical Learning
Novartis Institutes for Biomedical Research (NIBR), September 2018-July 2020, Cambridge MA
Accomplishments and Responsibilities:
- Investigator in the Scientific Data Analysis (SDA) team.
- Co-organized and developed teaching material for machine learning & deep learning workshops and seminars internally at NIBR with colleagues Sivakumar Gowrisankar, Yuan Wang, Laszlo Urban, and Sean Xiao.
- Developed a massively parallelized engine for training, evaluating, and serving machine learning models on internal assay data, with an emphasis on serving prediction uncertainties, together with NIBR colleagues Nikolaus Stiefl and Gregori Gerebtzoff.
- Developed machine learning workflows to accelerate protein engineering as part of the internal Genesis Labs innovation initiative, with Richard Lewis and our intern Arkadij Kummer.
- Accelerated the recurrent neural network UniRep by reimplementing it in JAX. Software and preprint available.
Investigator I, Data Science & Statistical Learning
Novartis Institutes for Biomedical Research (NIBR), September 2017-August 2018, Cambridge MA
Accomplishments and Responsibilities:
- Investigator in the Scientific Data Analysis (SDA) team.
- Performed internal consulting projects and expanded the SDA Statistical Learning initiative with colleagues.
- Developed parameterized Bayesian agent-based models of internal project portfolio for scenario planning purposes, using PyMC and Mesa.
- Assisted in the analysis of high throughput DROSHA cleavage data.
- Assisted in mentoring two interns, Stacy Meichle (Computer Aided Drug Discovery with Clayton Springer) and Fritz Lekschas (SDA with Brant Peterson).
Health Data Science Fellow
Insight Data Science, June 2017-August 2017, Boston MA
Accomplishments and Responsibilities:
- Built Flu Forecaster, a machine learning-powered system that forecasts flu sequences six months out, to better prepare for manufacturing of vaccine strains.
- Implemented a variational autoencoder (deep learning model) to learn a continuous representation of 14,455 influenza hemagglutinin protein sequences, and trained a Gaussian process model on the continuous representation to predict new flu sequences.
- Developed an interactive blog post using Flask and Bootstrap, and deployed it to Heroku and GitHub.
- Led peer workshops on web development, deep learning and code style.
ScD Candidate
Massachusetts Institute of Technology, August 2011-May 2017, Cambridge MA
Accomplishments and Responsibilities:
- Developed a scalable, network-based phylogenetic heuristic algorithm for detecting reassortant influenza viruses using 18,632 fully sequenced virus genomes that improved our capacity to detect reassortment events by two orders of magnitude.
- The algorithm was used in a lead-author study (published in PNAS) providing systematic evidence that genome shuffling is important for host switching, and a co-authored study (published in Ecology Letters) that showed that reassortment is a strategy for viral gene persistence in wild animal reservoirs.
- Contributed reproducible data analysis for colleagues through fluorescent image quantification and genomic analysis in studies that refuted binding properties of novel influenza viruses.
- Performed Bayesian statistical modelling for colleagues testing the efficacy of phone sterilization tools.
- Delivered tutorials and talks on Network Analysis and Bayesian statistical methods at annual Python conferences, including PyCon, SciPy, and PyData.
Massachusetts Institute of Technology
Department of Biological Engineering, August 2011-May 2017, Cambridge MA
Doctor of Science (Sc.D.)
I did my doctoral training under Prof. Jonathan Runstadler in computational influenza evolution and ecology.
The University of British Columbia
Integrated Sciences, June 2006-May 2010, Vancouver BC
Bachelor of Science (B.Sc.)
I conducted gut immunology and pathology research in the laboratory of Prof. Brett Finlay under the supervision of Dr. Guntram Grassl and Dr. Hongbing Yu. I also founded the iGEM team at UBC under Prof. Eric Lagally.
Data Science
Manuscripts pertaining to applied data science and visualization.
- mtx-COBRA: Subcellular localization prediction for bacterial proteins, 2024. Computers in Biology and Medicine
- Forecasting SARS-CoV-2 spike protein evolution from small data by deep learning and regression, 2024. Frontiers in Systems Biology
- Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Biomolecular Structures and Interaction Networks, 2022. ICML AI4Science
- Machine-Directed Evolution of an Imine Reductase for Activity and Stereoselectivity, 2021. ACS Catalysis
- JAEGER – Hunting for Antimalarials with Generative Chemistry, 2021. ChemRxiv
- Functional Atlas of Primary miRNA Maturation by the Microprocessor, 2020. Molecular Cell
- Reimplementing UniRep in JAX, 2020. bioRxiv
- pyjanitor: A Cleaner API for Cleaning Data, 2019. Proceedings of the 18th Python in Science Conference (SciPy 2019)
- Peax: Interactive Visual Pattern Search in Sequential Data Using Unsupervised Deep Representation Learning, 2019. Computer Graphics Forum
Statistics
Papers where I contributed statistical analyses or developed statistical methods.
- Principled decision-making workflow with hierarchical Bayesian models of high throughput dose-response measurements, 2021. Entropy
- Long-term colonization dynamics of Enterococcus faecalis in implanted devices in research macaques, 2018. Applied and Environmental Microbiology
- Evaluation of 6 Methods for Aerobic Bacterial Sanitization of Smartphones, 2018. Journal of the American Association for Laboratory Animal Science
Influenza
Papers pertaining to influenza surveillance, evolution and ecology.
- Reassortment of influenza A viruses in wild birds in Alaska before H5 clade 2.3.4.4 outbreaks, 2017. Emerging Infectious Diseases
- A real-time surveillance dashboard for monitoring viral phenotype from sequence, 2016. International Journal of Infectious Diseases
- Evidence of seasonality in a host-pathogen system: Influenza across the annual cycle of wild birds, 2016. Integrative and Comparative Biology
- A point mutation in the polymerase protein PB2 allows a reassortant H9N2 influenza isolate of wild-bird origin to replicate in human cells, 2016. Infection, Genetics and Evolution
- New England harbor seal H3N8 influenza virus retains avian-like receptor specificity, 2016. Scientific Reports
- Ecosystem interactions underlie the spread of avian influenza A viruses with pandemic potential, 2016. PLOS Pathogens
- Transmission of influenza reflects seasonality of wild birds across the annual cycle, 2016. Ecology Letters
- Reticulate evolution is favored in influenza niche switching, 2016. Proceedings of the National Academy of Sciences
- Genetic characterization of a rare H12N3 avian influenza virus isolated from a green-winged teal in Japan, 2015. Virus Genes