Eric J Ma's Website

Why you should take part in the SciPy sprints!

written by Eric J. Ma on 2025-03-17 | tags: open source python sprints matplotlib git networking career community skills development

In this blog post, I share my transformative experience participating in the SciPy sprints, where I made my first open source contribution to Matplotlib. Through this journey, I gained confidence in git, improved my software skills, and learned the importance of effective communication with maintainers. Sprints offer invaluable networking opportunities and skill development, especially for students and data scientists. By contributing, you can make a lasting impact on projects you care about. Are you ready to join a sprint and potentially change the trajectory of your career? Read on to find out more!

Read on... (570 words, approximately 3 minutes reading time)

The art of finesse as a data scientist

written by Eric J. Ma on 2025-03-16 | tags: data science finesse productivity career communication leadership strategy problem solving technical skill professional development stakeholder management project management collaboration adaptability work effectiveness

Finesse in data science is the subtle skill that distinguishes exceptional practitioners from the merely competent. It involves recognizing when you're stuck in technical rabbit holes, creating tangible progress markers for stakeholders, working backwards from meaningful milestones, adapting with purpose when approaches aren't working, creatively overcoming technical roadblocks, and cultivating a network for timely assistance. These skills help data scientists navigate complex challenges while consistently delivering value, balancing persistence with adaptability, technical depth with clear communication, and planning with flexibility. How might developing your finesse transform your effectiveness as a data scientist?

Read on... (1231 words, approximately 7 minutes reading time)

A blueprint for data-driven molecule engineering

written by Eric J. Ma on 2025-03-06 | tags: data science biotech molecule discovery experiment design machine learning protein engineering

In this blog post, I explore how cross-functional teams in biotech can accelerate molecule discovery using a strategic playbook. Through the story of a fictitious biotech, Catalyst Therapeutics, I highlight the importance of robust experimental design, integrating data science with human intuition, and balancing computational methods with practical insights. The team's journey reveals how better experiments lead to better models and ultimately, better molecules. Are you ready to discover how these principles can transform your biotech projects?

Read on... (3149 words, approximately 16 minutes reading time)

How to fix PyPI upload errors related to license metadata

written by Eric J. Ma on 2025-03-01 | tags: python packaging pypi hatchling setuptools build-backend metadata license github actions workflow automation deployment error handling

Encountering a PyPI upload error related to license metadata? The solution is straightforward - switch from setuptools to Hatchling as your build backend. In this post, I walk through how to fix the "license-file introduced in metadata version 2.4" error by updating your pyproject.toml configuration. Along the way, I learned some new things, including the fact that modern build backends like Hatchling provide better support for PEP 621 metadata features compared to older tools like setuptools.

Read on... (298 words, approximately 2 minutes reading time)

Reliable biological data requires physical quantities, not statistical artifacts

written by Eric J. Ma on 2025-02-23 | tags: machine learning biology measurement data science bayesian statistics metrology reproducibility biophysics protein design protein engineering uncertainty experimental design

When building machine learning models in biology, we often encounter data that's been heavily processed with statistical transformations like p-values and normalizations. This essay argues that this practice fundamentally undermines our ability to build reliable models and maintain interpretable datasets. Through a real-world example of protein binding experiments, it demonstrates why collecting physical quantities (like binding strength in nM) with proper replicates is vastly superior to statistical artifacts, and how Bayesian estimation can help us properly handle experimental variation while maintaining physical units. Are you tired of wrestling with hard-to-interpret biological data and ready to build more reliable experimental pipelines?

Read on... (2450 words, approximately 13 minutes reading time)

Let me ship you the Python you need

written by Eric J. Ma on 2025-02-17 | tags: packaging uv tools marimo juv environments

In this blog post, I explore how modern Python tooling is flipping the script on the age-old "which Python should I use?" question. Through my experience with uvx, marimo, and juv, I show how we're moving away from the traditional headache of environment setup and toward a world where tools automatically ship you the exact Python you need. No more environment setup puzzles – just specify your Python version and get straight to work. It's a liberating shift that's changing how I approach one-off Python work, and I think it's pretty exciting!
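
One way a script can state the Python it needs is inline script metadata (PEP 723), which uv reads when a file is launched with `uv run`. The sketch below is an assumption about what that might look like for a one-off script, not the post's own example; the version constraint and the helper function name are hypothetical.

```python
# /// script
# requires-python = ">=3.11"
# dependencies = []
# ///
# The comment block above is PEP 723 inline script metadata. A runner that
# understands it can provision a matching interpreter before the script
# starts; a plain Python interpreter treats it as ordinary comments.
import sys


def python_version_string() -> str:
    """Return the running interpreter's major.minor version as text."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"


if __name__ == "__main__":
    print(f"Running on Python {python_version_string()}")
```

Saved as a file, this could be started with `uv run` followed by the file name, letting the tool pick an interpreter that satisfies the declared constraint.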

Read on... (261 words, approximately 2 minutes reading time)

Lightening the LlamaBot

written by Eric J. Ma on 2025-02-07 | tags: refactoring llamabot optimization docker python cli packages performance engineering

In this blog post, I share my journey of tackling dependency bloat in LlamaBot. What began as a simple LLM bot framework had grown into a monolithic system with an extensive dependency chain, leading to massive installation sizes. By mapping dependencies, refactoring the code, and organizing optional dependencies, I managed to reduce the container size significantly. This exercise taught me the importance of regular codebase maintenance and focusing on core functionalities. Now, LlamaBot is leaner and more efficient. Curious about the strategies I used to achieve this transformation?

Read on... (1510 words, approximately 8 minutes reading time)

PyData Boston/Cambridge Talk @ Moderna: What makes an agent?

written by Eric J. Ma on 2025-01-31 | tags: large language models python llamabot pydantic structuredbot agentbot talks meetups

In this blog post, I explore the concept of 'what makes an agent' by discussing various implementations of LlamaBot, a Python package for LLM exploration. I dissect the differences between SimpleBots, StructuredBots, and AgentBots, highlighting their capabilities and limitations in terms of agency and decision-making. Through audience discussions and examples, I aim to provoke thought on the definition and design of agents, and I recount the engaging discussion we had together. Can we truly define an agent, or is it like the Turing Test, a concept that evolves with our understanding and technological advancements?

Read on... (3331 words, approximately 17 minutes reading time)

Why data from preclinical biotech lab experiments make machine learning challenging

written by Eric J. Ma on 2025-01-19 | tags: biotech datasets machine learning research data fusion decision support systems data science

In this blog post, I explore the challenges biotech teams face when integrating public datasets with internal data for machine learning. Despite initial excitement, issues like data compatibility, missing variables, domain shifts, and biological complexity often arise. I suggest a shift from a machine learning perspective to a decision support approach, advocating for separate models and a decision fusion layer that incorporates human expertise. This method respects the complexity of biological systems and aids in effective decision-making. How can we better navigate these challenges to accelerate biotech discoveries?

Read on... (1373 words, approximately 7 minutes reading time)

Writing at the speed of thought

written by Eric J. Ma on 2025-01-13 | tags: dictation accessibility productivity artificial intelligence writing workflow voicepal creativity

When typing became physically demanding, I discovered that dictation tools could do more than just help me write – they could fundamentally change how I capture and develop ideas. Using Better Dictation and VoicePal, combined with AI assistance, I found a way to write that matches the natural flow of thought. This isn't just about accessibility or working around limitations; it's about finding a better way to translate the nonlinear, rapid-fire nature of our thoughts into written words. I share my approach to preserving authentic voice while using AI tools, and why sometimes constraints push us toward unexpected improvements in how we work.

Read on... (801 words, approximately 5 minutes reading time)