Eric J Ma's Website


Good practices for AI-assisted development from a live protein calculator demo

written by Eric J. Ma on 2025-04-19 | tags: standardization ai coding protein mass tools project collaboration development

In this blog post, I share my experience from a live coding demo at BioIT World 2025, where I built a protein mass spectrometry calculator tool. I emphasize the importance of standardization in data science, of starting with a design document, and of using AI assistance for rapid development. Despite a hiccup during the live demo, I showcased the tool's capabilities and highlighted key lessons in AI collaboration, such as the value of context and interactive communication. How can AI tools enhance your development process while maintaining human oversight and creativity?

Read on... (1088 words, approximately 6 minutes reading time)
Wow, Marimo!

written by Eric J. Ma on 2025-04-08 | tags: marimo reactive notebooks uv deployment serverless data science modal

In this blog post, I share my experience with Marimo notebooks, highlighting their fully reactive nature and self-contained environments. I discuss how to run Marimo without installation using uv, and the benefits of AI-assisted coding. I also cover exporting notebooks to Markdown and deploying them as Modal apps. While Marimo's keybindings differ from Jupyter's, its reactive execution and UI builder offer unique advantages. Curious about how Marimo can transform your coding workflow?

Read on... (1707 words, approximately 9 minutes reading time)
From data chaos to statistical clarity: A laboratory transformation story

written by Eric J. Ma on 2025-04-05 | tags: automation bayesian estimation robust estimation r2d2 experiment design

A high-throughput screening lab transforms its data analysis workflow by applying statistical thinking from the start. By combining robust estimation with R2D2 priors, the lab eliminates tedious manual data cleaning, automatically handles outliers, decomposes sources of variation, and objectively measures both the statistical model's performance and the laboratory's quality. This story demonstrates how thoughtful experimental design paired with principled statistical methods can dramatically improve both efficiency and scientific quality. How might statistical thinking transform your experimental workflow?

Read on... (1823 words, approximately 10 minutes reading time)
Bayesian Superiority Estimation with R2D2 Priors: A Practical Guide for Protein Screening

written by Eric J. Ma on 2025-04-03 | tags: bayesian r2d2 variance modelling fluorescence experimental design probability of superiority probabilistic modelling data science statistics

In this blog post, I explore how to tackle experimental noise and candidate ranking in protein screening using Bayesian methods. By employing R2D2 priors, we can decompose variance into interpretable components, helping us separate the true biological signal from experimental artifacts. Additionally, Bayesian superiority calculations allow us to quantify the probability that one protein outperforms another, providing a more robust comparison than traditional methods. These techniques are applicable not only to protein screening but also to drug discovery, materials science, and more. Are you ready to enhance your experimental insights with Bayesian logic?

Read on... (4057 words, approximately 21 minutes reading time)
How to standardize Data Science ways of working to unlock your team's creativity

written by Eric J. Ma on 2025-04-02 | tags: data science leadership workflows best practices software change management innovation ai open source

In this blog post, I share insights from my talk at BioIT World about leading one of Moderna's data science teams. I discuss our mission to make science run at the speed of thought and how we standardize workflows across our data scientists to enhance creativity. Key points include designing delivery models, making best practices easy, and balancing standards with innovation. I also touch on AI-assisted coding and our open-source infrastructure. Our approach aims to liberate scientists to do their best work. Curious about how we achieve this balance and what it means for the future of data science at Moderna?

Read on... (1943 words, approximately 10 minutes reading time)
Why you should take part in the SciPy sprints!

written by Eric J. Ma on 2025-03-17 | tags: open source python sprints matplotlib git networking career community skills development

In this blog post, I share my transformative experience participating in the SciPy sprints, where I made my first open source contribution to Matplotlib. Through this journey, I gained confidence in git, improved my software skills, and learned the importance of effective communication with maintainers. Sprints offer invaluable networking opportunities and skill development, especially for students and data scientists. By contributing, you can make a lasting impact on projects you care about. Are you ready to join a sprint and potentially change the trajectory of your career? Read on to find out more!

Read on... (570 words, approximately 3 minutes reading time)
The art of finesse as a data scientist

written by Eric J. Ma on 2025-03-16 | tags: data science finesse productivity career communication leadership strategy problem solving technical skill professional development stakeholder management project management collaboration adaptability work effectiveness

Finesse in data science is the subtle skill that distinguishes exceptional practitioners from the merely competent. It involves recognizing when you're stuck in technical rabbit holes, creating tangible progress markers for stakeholders, working backwards from meaningful milestones, adapting with purpose when approaches aren't working, creatively overcoming technical roadblocks, and cultivating a network for timely assistance. These skills help data scientists navigate complex challenges while consistently delivering value, balancing persistence with adaptability, technical depth with clear communication, and planning with flexibility. How might developing your finesse transform your effectiveness as a data scientist?

Read on... (1231 words, approximately 7 minutes reading time)
A blueprint for data-driven molecule engineering

written by Eric J. Ma on 2025-03-06 | tags: data science biotech molecule discovery experiment design machine learning protein engineering

In this blog post, I explore how cross-functional teams in biotech can accelerate molecule discovery using a strategic playbook. Through the story of a fictitious biotech, Catalyst Therapeutics, I highlight the importance of robust experimental design, integrating data science with human intuition, and balancing computational methods with practical insights. The team's journey reveals how better experiments lead to better models and ultimately, better molecules. Are you ready to discover how these principles can transform your biotech projects?

Read on... (3149 words, approximately 16 minutes reading time)
How to fix PyPI upload errors related to license metadata

written by Eric J. Ma on 2025-03-01 | tags: python packaging pypi hatchling setuptools build-backend metadata license github actions workflow automation deployment error handling

Encountering a PyPI upload error related to license metadata? The solution is straightforward: switch from setuptools to Hatchling as your build backend. In this post, I walk through how to fix the "license-file introduced in metadata version 2.4" error by updating your pyproject.toml configuration. Along the way, I learned some new things, including the fact that modern build backends like Hatchling provide better support for PEP 621 metadata features than older tools like setuptools.

Read on... (298 words, approximately 2 minutes reading time)
Reliable biological data requires physical quantities, not statistical artifacts

written by Eric J. Ma on 2025-02-23 | tags: machine learning biology measurement data science bayesian statistics metrology reproducibility biophysics protein design protein engineering uncertainty experimental design

When building machine learning models in biology, we often encounter data that's been heavily processed with statistical transformations like p-values and normalizations. This essay argues that this practice fundamentally undermines our ability to build reliable models and maintain interpretable datasets. Through a real-world example of protein binding experiments, it demonstrates why collecting physical quantities (like binding strength in nM) with proper replicates is vastly superior to statistical artifacts, and how Bayesian estimation can help us properly handle experimental variation while maintaining physical units. Are you tired of wrestling with hard-to-interpret biological data and ready to build more reliable experimental pipelines?

Read on... (2450 words, approximately 13 minutes reading time)