written by Eric J. Ma on 2025-08-24 | tags: biotech communication decisions statistics translation collaboration trust meetings probability stakeholders
In this blog post, I share practical strategies for data scientists and statisticians to communicate more effectively with lab scientists in biotech. Instead of overwhelming collaborators with methods, I explain how to focus on decision-making, translate complex analyses into actionable probabilities, and build trust through clarity. I also offer tips for structuring meetings and anticipating common questions. Want to know how to make your insights drive real decisions in the lab?
Imagine this scenario: A data scientist explains a hierarchical Bayesian model for 45 minutes. Beautiful math. Elegant handling of batch effects. The lab scientists are polite but glazed over. Finally, someone interrupts: "Sorry, but should we move this compound forward or not?"
The data scientist hadn't even calculated that probability.
Sound familiar?
If you're a statistician or data scientist in biotech, you've probably been there. You've spent hours on sophisticated analyses, crafted beautiful slides about your methods, and watched your audience's eyes glaze over while you explained mixed-effects models.
Meanwhile, they just needed to know if they should spend $200K on the next experiment.
Here's the thing: Lab scientists aren't struggling to understand your statistics because they're not smart enough. They're brilliant experts who've spent years mastering protein folding, cell signaling, or synthetic chemistry. They're just juggling their own complex problems and need you to translate your analysis into something they can act on.
Today, I'm going to show you exactly how to do that.
Every interaction has a finite "communication budget" — limited attention, time, and cognitive load. Most data scientists spend this budget like tourists with foreign currency, not realizing the exchange rate.
Think about your last presentation. Where did you spend your time?
🚫 The typical (failed) allocation:
I get it. We're trained to show our work. We think rigor equals value. We assume that if we explain our methods thoroughly enough, scientists will understand what to do.
But here's what actually works:
✅ The allocation that drives decisions:
But context matters. A curious scientist with time might genuinely want 30% methods—they're building mental models for future decisions. Someone facing a go/no-go decision tomorrow? They need 70% decision implications, minimal methods.
The key is adopting BLUF (Bottom-Line Up-Front). Structure your presentation by working backwards from the decision to be made.
Try this: Start with the decision and recommendation, then work backwards to the evidence that supports it. Lead with "Based on our analysis, I recommend we proceed with Compound A because there's an 87% probability it meets our potency threshold."
This tells them immediately what they need to know, then you can spend the remaining time explaining why.
Here's what happens when you don't use BLUF: A data scientist spent an entire program review meeting walking through their elegant approach to handling missing data. Really sophisticated stuff. Multiple imputation with careful consideration of the missing-at-random assumption.
Twenty minutes in, the program lead interrupted: "This is interesting, but we need to decide today whether to advance this molecule. Does it meet our potency threshold or not?"
They hadn't even calculated that probability. They'd spent their entire communication budget on something that wasn't even the program lead's concern that day.
With BLUF, they would have started: "Based on our analysis, I recommend we advance this molecule. There's an 82% probability it meets our potency threshold, even accounting for the missing data. Here's how I handled the missing data to arrive at this conclusion..."
Lab scientists operate in three distinct modes, and each requires a completely different communication approach.
They're under time pressure for go/no-go decisions. Maybe it's a pipeline review tomorrow. Maybe they need to order materials today. Maybe the synthesis team is literally waiting for their answer.
Signs you'll hear:
What they need: Probability of success and a clear recommendation. That's it.
They're genuinely curious about your methods. Maybe they're trying to understand why this analysis differs from last time. Maybe they're building intuition for future experiments.
Signs you'll hear:
What they need: Mental models and intuition, not mathematical formulas.
They're not really interested in learning—they're assessing whether they can rely on your judgment for million-dollar decisions.
Signs you'll hear:
What they need: Confidence that you've been rigorous without the full mathematical proof.
Here's the mistake most of us make:
🚫 Wrong approach: Launch into methods explanation regardless of mode
✅ Right approach: Start with the decision and recommendation, then adapt the explanation depth based on their mode
Most data scientists default to teaching mode when scientists are in decision mode. That's like giving someone a recipe when they just asked if dinner's ready.
Consider this scenario: A scientist approaches a data scientist with dose-response data. The data scientist starts explaining their Bayesian approach to EC50 estimation. Five minutes in, the scientist stops them: "I just need to know if this is more potent than our current lead."
The scientist was in Decision Mode. The data scientist was in Teaching Mode. Complete mismatch.
The better approach is to be deliberate rather than reactive. Before any meeting, clarify the goals upfront. Ask what they're trying to decide. Talk to stakeholders beforehand to understand the context. Do the pre-work rather than trying to read body language in real-time.
You think in distributions. They think in decisions. This gap is why brilliant analyses often fail to drive action.
Here's the framework that works for bridging that gap:
Layer 1: Statistical Reality (Keep this in your head)
This layer is for you. It ensures your analysis is rigorous. But it stays in your head or the appendix.
Layer 2: Scientific Meaning (The bridge)
Here's the key: Keep the full distribution at this layer. Don't collapse to point estimates yet. You're translating statistics to science, but you're not making decisions yet.
Layer 3: Decision Layer (What they actually need)
The magic is waiting until the last possible moment to collapse distributions into decision probabilities. Why? Because different decisions need different integrations of the same posterior.
Let me show you what I mean:
🚫 Wrong (Layer 1 bleeding into communication): "The posterior distribution for the treatment effect has a 95% credible interval of [0.15, 0.31] with a mean of 0.23."
What does a lab scientist do with this? Nothing. It's statistical reality without translation.
✅ Right (Layer 3, decision-focused): "There's a 92% probability your treatment exceeds the TPP threshold. If you need 95% confidence for the program milestone, run 20 more samples. If 90% is acceptable for an early read, you can proceed now."
See the difference? One is statistical reporting. The other enables a decision.
The same posterior distribution might need to answer multiple questions:
By keeping the full distribution until Layer 3, you can answer whatever decision question they actually have, not the one you assumed they had.
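To make the three layers concrete, here's a minimal sketch with made-up posterior draws (a normal distribution standing in for real MCMC output; the 0.15 and 0.20 thresholds are hypothetical): keep the full distribution, then collapse it differently for each decision question.

```python
import numpy as np

rng = np.random.default_rng(42)

# Layer 1: the full posterior for a treatment effect, e.g. 4,000 draws
# from your sampler. Faked here with a normal distribution -- these
# numbers are illustrative, not from any real analysis.
posterior_effect = rng.normal(loc=0.23, scale=0.04, size=4_000)

# Layer 3: collapse the SAME posterior differently per decision.
# Decision 1: does the effect clear the TPP threshold of 0.15?
p_exceeds_tpp = np.mean(posterior_effect > 0.15)

# Decision 2: does it clear the higher 0.20 bar for scale-up?
p_worth_scaleup = np.mean(posterior_effect > 0.20)

# Decision 3: "how wrong could we be?" is just the complement.
p_wrong = 1 - p_exceeds_tpp

print(f"P(exceeds TPP threshold): {p_exceeds_tpp:.0%}")
print(f"P(worth scaling up):      {p_worth_scaleup:.0%}")
print(f"P(wrong decision):        {p_wrong:.0%}")
```

Because the draws are kept until the last step, a new decision question costs one line of integration, not a re-run of the analysis.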
Scientists may ask statistics questions when they mean decision questions. Learning to translate is a superpower.
Here's your decoder ring:
"Is this significant?" They're not asking about p-values. They're asking: "Should I continue this line of research?"
"What's the confidence?" They don't want credible intervals. They're asking: "How wrong could this decision be?"
"Did it work?" They don't care about effect sizes. They're asking: "Is the effect large enough to matter for my application?"
"Can you check the stats?" They don't want a methods seminar. They're asking: "I need ammunition for my go/no-go meeting tomorrow."
"How robust is this?" They're not necessarily interested in sensitivity analyses. They're asking: "Can I trust this decision?"
Every lab scientist in biotech faces the same five decisions over and over:
Your job isn't to answer their literal question. It's to figure out which of these five decisions they're really trying to make.
🚫 Wrong: Answer the literal statistics question they asked
✅ Right: Answer the decision they're trying to make
Try this: When first asked to partner on an analysis, ask: "What decision are you trying to make with this data?"
Then frame everything around that decision.
Here's a common scenario: A scientist asks the data team to "check if the groups are different." The data scientist could run their standard analysis and report "statistically significant difference detected." Technically correct. Completely useless.
Instead, imagine asking: "What decision does this inform?"
Turns out, they needed to know whether the new formulation was at least 20% better than the current one; otherwise, it wasn't worth the reformulation costs. The groups were statistically different, but only by 8%. The real answer was: "Don't reformulate."
That's the difference between answering questions and enabling decisions.
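A quick sketch of that gap, with hypothetical assay numbers: a difference can be wildly "significant" while still falling short of the threshold the decision actually hinges on.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical assay readouts: current formulation vs. new one.
# The new formulation is genuinely ~8% better -- a real difference,
# but below the 20% improvement needed to justify reformulating.
current = rng.normal(loc=100, scale=5, size=60)
new = rng.normal(loc=108, scale=5, size=60)

# The literal question: "are the groups different?"
t_stat, p_value = stats.ttest_ind(new, current)
print(f"p-value: {p_value:.2e}")  # tiny -> "statistically significant"

# The decision question: "is it at least 20% better?"
improvement = new.mean() / current.mean() - 1
print(f"Observed improvement: {improvement:.0%}")  # well below the 20% bar
```

Same data, two answers: the p-value says "different," the decision framing says "don't reformulate."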
Here's the paradox: Most data scientists think trust comes from showing their work.
This is more nuanced than you might think.
Over-explaining methods actually reduces trust because:
What actually builds trust:
🚫 Trust-killing: "Well, it depends on your assumptions about the prior, and if we consider the hierarchical structure of the random effects, controlling for batch-to-batch variation, we can say that under certain conditions..."
This sounds like you're not confident in your answer.
✅ Trust-building: "There's an 89% chance this works. If you need 95% confidence before scaling up, test 3 more concentrations. Here's why I'm confident in that number: I've accounted for batch effects, and even in the worst-case scenario, you're still above 82%."
Clear. Actionable. Confident.
The beauty of probabilistic thinking here: "We're 78% confident" is infinitely clearer than "statistically significant." You can directly answer: "What's the probability we're making the wrong decision?"
That's a question every scientist understands.
The methods appendix approach:
When you do need to establish technical credibility, try this structure:
Smart data scientists keep a technical appendix for every analysis. It has all the details they're proud of—the clever missing data handling, the hierarchical structure, the prior specifications.
But they only show it when asked. And here's what happens: People trust them more because they respect everyone's time enough not to force it on them.
Stop opening with methods. Stop it right now.
Start with their decision.
The structure that works:
Let me show you the difference:
🚫 Wrong meeting flow: "Thanks for coming. So I started by examining the data structure, and I noticed some heteroscedasticity in the residuals, which suggested we might need a more complex variance structure. I tried several approaches, including a Box-Cox transformation, but ultimately settled on a hierarchical model because... [20 minutes later]... so in conclusion, it might work."
By the time you get to the conclusion, they've stopped listening.
✅ Right meeting flow: "We're here to discuss whether to advance Compound X to synthesis. Based on your assay data, there's an 89% probability this compound exceeds your 10nM potency requirement. I recommend proceeding to synthesis scale-up. If you need 95% confidence instead of 89%, I'd recommend testing 3 more concentrations first. Want me to walk through how I got there?"
Notice how the second version:
The email version:
Subject: Compound X: 89% probability of meeting TPP

Hi Sarah,

**Decision:** Compound X has an 89% probability of meeting your 10nM potency requirement. Recommend proceeding to synthesis.

**Key evidence:**

- Consistent effect across all three batches
- Dose-response curve shows clear relationship
- Even worst-case scenario keeps you above 15nM

**Next steps:** If you need >95% confidence, test 3 additional concentrations. Otherwise, proceed with synthesis.

Technical details in attached appendix if interested.

Best,
[Your name]
That's it. Decision first. Evidence second. Details optional.
Here's what most data scientists miss: You need to understand their domain as deeply as they do.
The measurement method matters more than your statistical method.
A scientist tells you they're using ELISA to measure protein levels. You nod and proceed with your analysis. But did you ask:
These aren't merely statistical questions — they're also biological questions that determine whether your analysis is even valid.
Be deeply curious about their methods. Ask about:
Learn their terminology. Don't just understand what they're measuring, but understand how they think about it. When they say "potency," do they mean EC50, IC50, or something else? When they talk about "efficacy," are they referring to maximal response, potency, or both?
Quick domain mastery checklist:
Example: A data scientist was analyzing dose-response data from a cell-based assay. The scientist mentioned they were using a "luminescence readout." The data scientist asked about the detection range, learned it was $10^3$ to $10^6$ RLU, and immediately spotted that their highest concentration was saturating the detector. The analysis would have been meaningless without understanding that technical limitation.
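That kind of sanity check is easy to automate once you know the instrument's limits. A sketch with hypothetical readings and the $10^3$ to $10^6$ RLU range from the example above:

```python
import numpy as np

# Hypothetical luminescence readings (RLU) across a dose series,
# for a detector with a working range of 1e3 to 1e6 RLU.
DETECTOR_MAX = 1e6
readings = np.array([2.1e3, 9.5e3, 6.2e4, 3.8e5, 9.9e5, 9.95e5])

# Flag points within 5% of the ceiling: likely saturated, so an
# apparent plateau may be an instrument artifact, not biology.
saturated = readings > 0.95 * DETECTOR_MAX
print(readings[saturated])
```

The 5% margin is an arbitrary choice for illustration; the point is that knowing the detection range turns "trust the plateau" into a checkable assumption.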
The payoff: When you speak their language, you don't just communicate better, you also analyze better. You spot confounders they might miss. You suggest controls they haven't thought of. You become a true collaborator, not just a service provider.
Every scientist has patterns. Learn them.
Your PI always asks about sample size? Pre-calculate the probability of detecting meaningful effects. Your biomarker lead obsesses over false positives? Lead with the posterior probability of true effects. The chemistry team cares about synthesis feasibility? Include yield probabilities from your Bayesian model.
This isn't mind-reading. It's paying attention.
🚫 Reactive approach: Wait for their questions, scramble for answers, promise to "get back to you on that"
✅ Proactive approach: "I know you usually want to know about batch effects, so I checked—they're negligible. Here's how I verified..."
Pattern matching checklist:
Example: One program lead always asks: "What if we're wrong?" Every. Single. Time.
The smartest data scientists now anticipate this and always include: "If we're wrong about this, here's what we'd see in the next experiment. Here's our bail-out plan. Here's the cost of being wrong versus the cost of being slow."
She doesn't ask anymore. She trusts that they've thought it through.
Another scientist always wants to know if we have enough evidence. So the prepared data scientist leads with: "There's an 85% probability that the treatment effect exceeds your minimum meaningful difference."
Pre-answering questions isn't just efficient—it builds massive trust. It shows you understand their concerns and you're thinking ahead. Trust me, this is a career hack!
Most data scientists in biotech spend 80% of their communication budget on methods that their collaborators—brilliant scientists juggling their own complex problems—don't have bandwidth to process.
You're doing the equivalent of giving someone a recipe when they just asked if dinner's ready.
The shift is simple but not easy: Stop defaulting to education mode. Start asking "What decision are you trying to make?" Then translate your sophisticated analysis into the probability they need to make that decision.
This isn't about dumbing down your work. It's about translating between two expert domains—like a diplomat translating between heads of state. The lab scientists you work with have spent years mastering complex biological systems. They need translation, not education.
Your action items:
The best data scientists are the ones whose collaborators make the best decisions.
And that starts with spending your communication budget on what actually matters to the people you're trying to help.
What patterns have you noticed in your collaborations? What questions do your scientists always ask? Let me know; I'd love to hear what's working (or not working) for you!
@article{
ericmjl-2025-how-to-communicate-with-lab-scientists-when-youre-the-data-person,
author = {Eric J. Ma},
title = {How to communicate with lab scientists (when you're the data person)},
year = {2025},
month = {08},
day = {24},
howpublished = {\url{https://ericmjl.github.io}},
journal = {Eric J. Ma's Blog},
url = {https://ericmjl.github.io/blog/2025/8/24/how-to-communicate-with-lab-scientists-when-youre-the-data-person},
}
I send out a newsletter with tips and tools for data scientists. Come check it out at Substack.
If you would like to sponsor the coffee that goes into making my posts, please consider GitHub Sponsors!
Finally, I do free 30-minute GenAI strategy calls for teams that are looking to leverage GenAI for maximum impact. Consider booking a call on Calendly if you're interested!