written by Eric J. Ma on 2025-04-02 | tags: data science leadership workflows best practices software change management innovation ai open source
In this blog post, I share insights from my talk at BioIT World about leading one of Moderna's data science teams. I discuss our mission to make science run at the speed of thought and how we standardize workflows across our data scientists to enhance creativity. Key points include designing delivery models, making best practices easy, and balancing standards with innovation. I also touch on AI-assisted coding and our open-source infrastructure. Our approach aims to liberate scientists for their best work. Curious about how we achieve this balance and what it means for the future of data science at Moderna?
On 2 April 2025, I had the opportunity to speak at BioIT World about my experiences building data science teams and the associated tooling at Moderna. Unlike most speakers, I ditched the slides and went for a more interactive approach – that's just my style. Why subject everyone to 25 minutes of me blabbering when we could have an engaging discussion instead?
For those who couldn't attend, I wanted to share the key insights from my talk on how we've made our data science team fly at Moderna. These lessons come from my experience as a data science team lead, where I manage a team of six (including myself) serving Moderna's 600-person research organization, and where we share tools with the company's other data science teams (12 data scientists in total) serving a wide remit across Moderna's ~6,000-person organization.
First, a bit about us: my home team's mission is to "make science run at the speed of thought and to quantify the unquantified." I joined Moderna in summer 2021 – too late to profit from the pandemic, so I'm really there for the science, not the money! Part of making science run at the speed of thought is to make sure we data scientists have the tools and practices at hand that enable us to work at the speed of thought. That is where standardization comes in.
In my talk, I focused on three key aspects of standardizing data science workflows:

1. Designing delivery models
2. Making best practices easy
3. Balancing standards with innovation
I shared concrete examples from my experience at Moderna, but also encouraged the audience to discuss these topics with their neighbors so we could learn from the collective wisdom in the room. My goal was for everyone to leave with practical ideas they could implement right away—regardless of their team or org size. I will admit, the room was relatively quiet at first (I think most people came not expecting my talk delivery format), but gradually the room warmed up.
The first thing to address is the question: what exactly does your data science team deliver? And I don't mean "insights" – that's not a specific enough answer. I'm talking about concrete work products. Something tangible that people can interact with or mull over.
Before choosing delivery formats, it's essential to understand the "jobs to be done" (from Clayton Christensen's *The Innovator's Solution*) – what are stakeholders actually trying to accomplish with your outputs? Delivery models should be tailored to what your company actually needs, not just what's trendy.
At Moderna, we deliver two things as a data science team:
We made this decision early on to avoid building things like dashboards, which in my opinion are where data science projects go to die. Dashboards should be built by the people who actually want to see the data. I've also been in the position of promising a UI and suffering the maintenance costs later. It was not a happy time. So we cut out building any kind of user interface from our work products, preferring to leave this to professional front-end engineers.
Yet at the same time, there is a strong "engineering" component to our work, because we want to reduce the friction of handover. I've heard stories in finance of data science and quant teams completing a prototype in a notebook within a month, only for the ML engineering team to need 8 months to productionize it: the deployment language was different (Java, not Python), the runtime environment was different, and the engineering team was not intimately familiar with the problem domain, so writing the appropriate suite of tests was delayed. At Moderna, every data scientist must understand that our return on investment is only realized if we package our work as software, and we avoid these handoff problems by ensuring that we have the tools to make deployment easy.
Moreover, I don't care if you know how to build a fancy Transformer ML model in a Jupyter notebook. If your work cannot be operationalized hands-free without your involvement, you have not delivered a return on the investment of your time. Software is how we scale labor!
The benefits? Consistency, reusability, and very clear paths from exploration to production. Expectations with stakeholders are crystal clear, yet what we deliver is flexible enough to work with a variety of collaborators.
Consistent organization and automation reduces cognitive load and frees up mental energy for creative problem-solving. Our philosophy is simple: "Make the right thing the easy thing to do."
We've implemented several tactical best practices that make a big difference:
- Command-line tools (`pyds-cli`) that, after a quick questionnaire, generate a full Python project package structure.

Software scales labor, documentation scales brains!
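As a rough illustration of what questionnaire-driven scaffolding buys you (this is a hypothetical sketch, not `pyds-cli`'s actual implementation), a tool like this turns a few answers into a standardized project layout:

```python
"""Hypothetical sketch of a project scaffolder: a template mapping plus a
single function that stamps out the same structure for every project."""
from pathlib import Path

# Illustrative template -- real tools generate a much richer structure.
TEMPLATE = {
    "{name}/__init__.py": "",
    "tests/test_{name}.py": "def test_placeholder():\n    assert True\n",
    "docs/index.md": "# {name}\n",
    "pyproject.toml": '[project]\nname = "{name}"\nversion = "0.1.0"\n',
}


def scaffold(name: str, root: Path) -> list[Path]:
    """Create a standard project structure under `root`; return the paths."""
    created = []
    for relpath, content in TEMPLATE.items():
        path = root / relpath.format(name=name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content.format(name=name))
        created.append(path)
    return created
```

Because every project comes out of the same template, "where are the tests?" has exactly one answer on every repo.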
These standardized practices reduce cognitive overhead. If I jump into a colleague's project, I immediately know where to look for documentation and tests, which helps me onboard faster than having one-on-one sessions. Shared idioms and patterns make it easier to jump in and help each other, fostering stronger team dynamics.
I would note that the tech stack is merely one piece of the whole puzzle. The practices we build around the tech stack are another integral portion. The goal is for scientists to spend less time on configuration and more time on science. The investment of my time hopefully yields thirty, sixty, or a hundredfold in terms of what we give back to our teammates.
The best part? We enable all of this at extremely low software licensing spend: our only vendor spend on our data scientists' technology stack is on AWS, GitHub, and GitHub Copilot.
Change management principles apply to workflow standardization. Successful implementation requires both technical and cultural elements.
Our standardization journey at Moderna involved several key elements:
The command-line tooling we've developed isn't just my project – it's now shared with the ML platform team, and data scientists actively make code changes to it. We "dogfood" our own tools, which is incredibly empowering. You've got to give power back to the frontliners so they can make the changes they need to move at the speed they want.
Sustainable change requires leadership support, frontline champions, visible benefits, and evolution at a pace teams can absorb.
The perceived tension between standardization and creativity is often a false dichotomy. Well-designed standards actually create freedom through constraints.
At Moderna, we've found this balance by being intentional about where we apply standards:
The key insight is focusing standardization on interfaces, not implementations. You're free to use whatever Python packages you need, but by standardizing on the language and the structure, we can make it easier to jump in and help each other.
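One way to picture "standardize interfaces, not implementations" in Python is with a `Protocol`: shared tooling is written against the agreed-upon interface, while each data scientist implements it with whatever packages they like. This is a sketch with made-up names, not our actual codebase.

```python
"""Sketch: shared tooling depends on an interface, not on any one model."""
from typing import Protocol


class Predictor(Protocol):
    """The agreed-upon interface every team model exposes."""

    def predict(self, inputs: list[float]) -> list[float]: ...


class MeanBaseline:
    """One implementation: a trivial baseline, no ML libraries required."""

    def __init__(self, mean: float) -> None:
        self.mean = mean

    def predict(self, inputs: list[float]) -> list[float]:
        return [self.mean for _ in inputs]


def evaluate(model: Predictor, inputs: list[float],
             targets: list[float]) -> float:
    """Shared tooling: mean squared error against any Predictor."""
    preds = model.predict(inputs)
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
```

Swap `MeanBaseline` for a gradient-boosted model or a neural network, and `evaluate` (and everything else built on the interface) keeps working untouched.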
In doing so, we help make data science teams fly by:
Though not a focus of my talk, many asked about AI-assisted coding. This is completely ingrained at Moderna – everyone has access to ChatGPT Enterprise (that's public knowledge), and every developer who asks for GitHub Copilot gets it. We're developing extensive documentation on how to productively use AI assistance in daily coding.
The testimonials are powerful. Just yesterday, I interviewed a DevOps engineer who's doing three to four people's worth of work and coasting because of GitHub Copilot in agent mode. It's the ultimate productivity tool if you know how to wield it properly.
What's particularly powerful about our approach is that it can be implemented with minimal vendor dependencies. At Moderna, our data science and deployment infrastructure is built almost entirely on open-source tools — our only significant software expenses are AWS, GitHub Enterprise, and GitHub Copilot. This deliberate choice to avoid vendor lock-in provides significant cost savings and the flexibility to adapt quickly as the scientific landscape changes.
The ultimate goal isn't standardization for its own sake — it's creating an environment where data scientists can do their best thinking and most innovative work. When done right, good standards don't constrain scientists; they liberate them.
It's taken three and a half years of my life at Moderna to build this culture, but it's been worth it. I'm optimistic about our future regardless of the stock price – I'm long on the technology, and concomitantly, I would love to see science run at the speed of thought!
Thanks for coming to my TED Talk! :)
@article{ericmjl-2025-how-creativity,
    author = {Eric J. Ma},
    title = {How to standardize Data Science ways of working to unlock your team's creativity},
    year = {2025},
    month = {04},
    day = {02},
    howpublished = {\url{https://ericmjl.github.io}},
    journal = {Eric J. Ma's Blog},
    url = {https://ericmjl.github.io/blog/2025/4/2/how-to-standardize-data-science-ways-of-working-to-unlock-your-teams-creativity},
}
I send out a newsletter with tips and tools for data scientists. Come check it out at Substack.
If you would like to sponsor the coffee that goes into making my posts, please consider GitHub Sponsors!
Finally, I do free 30-minute GenAI strategy calls for teams that are looking to leverage GenAI for maximum impact. Consider booking a call on Calendly if you're interested!