Putting AI to Work in Drug Discovery

AI in drug discovery is about operational efficiency: solving the general case of a problem, not just the specific one. The aim here is to surface patterns of problems that recur across applications—patterns that are useful to recognize so you can spot which one you’re in and what to expect. This essay maps five such patterns (property prediction, inverse problems, generative design, patient clustering, imaging) and three habits for partnering with quants: bring data first, teach the biology, frame for impact.

1. Predict a property, then decide

The first and most familiar pattern of problems is: given a biological entity, give me a number. Given a small molecule, predict solubility, logP, or permeability. Given a protein sequence or a mutation, predict stability, binding, or pathogenicity. That number then feeds downstream decision-making: developability, prioritization, which variant to carry forward, which compound to synthesize next. The ML problem is deliberately narrow—input in, number out—so that the decision stays in human hands, informed by the estimate and, when we do it right, by its uncertainty too.

Developability, stabilization, prioritization—they all sit on this same pattern. We are not generating new molecules or designing experiments from scratch in one shot; we are scoring candidates so that people can rank, filter, and choose. That keeps the framing clean and makes it easier to communicate uncertainty (e.g. point estimate ± interval) and to iterate as new data arrives.

Examples:

  • Molecule property prediction — solubility, logP, permeability, toxicity; given a small molecule (e.g. SMILES), output a number for developability or prioritization.
  • Protein property prediction — stability, binding affinity, pathogenicity; given a sequence or a mutation (e.g. variant), output a number to decide which variant to carry forward or which construct to express.
  • Similar “entity → number” setups apply to trajectory → risk (e.g. ctDNA over time), table → outcome (compound/assay features → P(advance)), and image → label (e.g. digital pathology). In each case the ML gives a score or distribution that feeds the decision.
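
As a sketch of how the "number plus uncertainty" feeds a decision, consider a triage rule that ranks and filters candidates by a lower confidence bound. All compound names, values, and intervals below are invented for illustration; a real workflow would use a trained model's predictions:

```python
# Hypothetical model output: candidate -> (predicted solubility in mg/mL, std. error).
# All names and numbers are illustrative, not real assay data.
predictions = {
    "cmpd_A": (12.0, 2.0),
    "cmpd_B": (9.5, 0.5),
    "cmpd_C": (15.0, 8.0),   # best point estimate, but very uncertain
    "cmpd_D": (3.0, 1.0),
}

def lower_bound(mean, stderr, z=1.645):
    """One-sided ~95% lower confidence bound on the prediction."""
    return mean - z * stderr

def triage(predictions, threshold):
    """Keep candidates whose lower bound clears the threshold, ranked
    best-first by point estimate; the decision rule stays explicit
    and separate from the model."""
    keep = [(name, mean, se) for name, (mean, se) in predictions.items()
            if lower_bound(mean, se) >= threshold]
    return sorted(keep, key=lambda row: -row[1])

shortlist = triage(predictions, threshold=5.0)
for name, mean, se in shortlist:
    print(f"{name}: {mean:.1f} +/- {se:.1f} mg/mL")
```

Note that cmpd_C, despite the highest point estimate, is dropped: its interval is too wide to clear the bar. That is the kind of call the uncertainty exists to support.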

Molecule property prediction: log P, solubility, pKa

Figure: 2D structures; solid = measured, crosshatch = predicted.

Example training data for log P, solubility, pKa models

SMILES                          log P   Solubility (mg/mL)   pKa
CC(=O)Oc1ccccc1C(=O)O            1.2     3.0                  3.5
CC(C)Cc1ccc(cc1)C(C)C(O)=O       3.97    0.021                4.4
CC(=O)Nc1ccc(O)cc1               0.5     14                   9.5
CN1C=NC2=C1C(=O)N(C(=O)N2C)C    -0.07    21.7                 14
c1ccccc1                         2.1     1.8                  n/a
CC(C)(C)c1ccc(cc1)O              3.2     0.05                 10.2
CN(C)C(=N)N=C(N)N               -1.0     150                  12.5
Clc1ccc(cc1)C(=O)O               2.9     0.5                  3.1

2. Inverse problems: hard to solve, easy to simulate

A second, distinct pattern of problems is inverse scientific problems: we have a process that is easy to simulate in the forward direction but hard to invert. Given the observed output, recover the inputs or parameters that produced it. The equations that describe the process are known (or well approximated), but solving them backwards is computationally costly or analytically intractable. Machine learning can learn that inverse mapping or approximate the solution at a fraction of the cost.

Chromatography is a concrete example. A chromatogram can be modeled as a mixture of peaks—e.g. skewed Gaussians—with parameters (position, width, height, skew). Forward simulation—given parameters, produce the trace—is straightforward. The inverse problem—decompose an observed chromatogram into those parameters (peak integration, baseline correction)—is what analysts do by hand or with heuristics. Doing it well is a huge operational efficiency win: faster, more consistent, and less subjective. It does not directly “make the go/no-go decision” for a compound, but it makes the pipeline that feeds that decision more reliable and scalable.
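
The "easy to simulate" half can be made concrete. Assuming the skewed-Gaussian peak model described above, a few lines take us from parameters to a trace and per-peak percentages; the peak parameters here are invented for illustration:

```python
import math

def skew_peak(t, mu, sigma, height, alpha):
    """Skewed Gaussian: a Gaussian envelope times an erf-based skew term."""
    g = math.exp(-((t - mu) ** 2) / (2 * sigma ** 2))
    skew = 1 + math.erf(alpha * (t - mu) / (sigma * math.sqrt(2)))
    return height * g * skew

# Illustrative peak parameters (position, width, height, skew), not real data.
peaks = [(2.0, 0.15, 1.0, 0.0), (2.6, 0.20, 0.7, 2.0), (3.3, 0.25, 0.4, 1.0)]

def simulate(ts, peaks):
    """Forward direction: parameters -> chromatogram trace. This is the easy part."""
    return [sum(skew_peak(t, *p) for p in peaks) for t in ts]

def peak_areas(ts, peaks):
    """Per-peak areas by trapezoidal integration of each component."""
    dt = ts[1] - ts[0]
    areas = []
    for p in peaks:
        ys = [skew_peak(t, *p) for t in ts]
        areas.append(sum((a + b) / 2 * dt for a, b in zip(ys, ys[1:])))
    return areas

ts = [i * 0.01 for i in range(500)]   # retention time grid, 0.0 to 4.99
trace = simulate(ts, peaks)
areas = peak_areas(ts, peaks)
total = sum(areas)
print([round(100 * a / total, 1) for a in areas])   # peak percentages
```

The inverse problem is everything this sketch skips: given only `trace`, recover `peaks`. That is what a model (or an analyst with a ruler) has to do.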

Chromatography demo: human integration vs model (mixture of skew Gaussians)


Draw two lines to split the chromatogram into three peaks. Where you place them changes the percentages—showing how arbitrary human integration is.

Human integration: vertical lines and baseline don't account for tailing, overlap, or skew. Subjective and inconsistent.

Structure prediction is another inverse problem: going from sequence → structure. The forward direction—given a 3D structure, derive the sequence—is straightforward (we can read residues from coordinates). The inverse—given a sequence, predict the 3D structure—is hard; tools like AlphaFold learn that mapping. Below, a few nanobodies from the PDB: sequence on the left, experimental structure on the right.

Structure prediction: sequence → structure (nanobodies from PDB)

Neural networks for force fields and molecular simulation belong in the same inverse / “hard to solve, easy to simulate” family. Replacing or augmenting classical force fields with learned potentials lets molecular dynamics (MD) and other simulations run faster or capture effects that traditional parametrizations miss. Forward simulation (given positions, compute energy or forces) is well defined; the inverse or the high-fidelity mapping is expensive. Learned force fields approximate that mapping. This is an active area of research—more accurate and transferable learned force fields would accelerate structure-based design, binding estimation, and conformational sampling—and has not yet reached the same level of routine deployment as property prediction or chromatography-style inverse problems. The goal is again operational efficiency: same scientific questions, less compute or better accuracy per dollar.
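
A stripped-down version of the surrogate idea, under an invented setup: treat Lennard-Jones as the "expensive" ground-truth energy function, fit a cheap two-term model to sampled energies by least squares, and use the fit in its place. Real learned force fields replace this linear fit with neural networks over atomic environments, but the shape of the trick is the same:

```python
# Ground-truth "expensive" potential: Lennard-Jones with eps = sigma = 1.
def lj(r):
    return 4.0 * ((1.0 / r) ** 12 - (1.0 / r) ** 6)

# Training data: energies sampled on a grid of interatomic distances.
rs = [0.95 + 0.01 * i for i in range(60)]
es = [lj(r) for r in rs]

def fit(rs, es):
    """Least-squares fit of c12*(1/r)^12 + c6*(1/r)^6 via the 2x2
    normal equations, standing in for training a neural potential."""
    x12 = [(1.0 / r) ** 12 for r in rs]
    x6 = [(1.0 / r) ** 6 for r in rs]
    a11 = sum(v * v for v in x12)
    a12 = sum(u * v for u, v in zip(x12, x6))
    a22 = sum(v * v for v in x6)
    b1 = sum(u * e for u, e in zip(x12, es))
    b2 = sum(v * e for v, e in zip(x6, es))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det

c12, c6 = fit(rs, es)
surrogate = lambda r: c12 * (1.0 / r) ** 12 + c6 * (1.0 / r) ** 6
print(round(c12, 3), round(c6, 3))   # recovers ~4.0 and ~-4.0
```

Because the basis here happens to match the true functional form, the fit is essentially exact; the interesting (and hard) cases are exactly the ones where it does not.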

3. Generative design: coming up with molecules

A third pattern of problems is generative design: using models to propose new molecules, sequences, or candidates. Instead of “given this entity, give me a number,” we ask “give me an entity that satisfies these constraints or objectives.” Generative models for small molecules (e.g. SMILES, graphs, 3D), for proteins, or for chemical reactions open the door to exploring vast spaces that humans cannot enumerate. They can be combined with property predictors and optimization (e.g. Bayesian optimization, genetic algorithms) to suggest what to make or try next.
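
A toy propose, score, select loop makes the pattern concrete. The "predictor" here is a mock similarity score against a fixed target sequence, standing in for a trained model or an assay; the alphabet, mutation rate, and population sizes are arbitrary choices for illustration:

```python
import random

random.seed(0)
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids
TARGET = "EVQLVESGGGLVQPGGSLRL"     # stand-in "ideal" sequence for the toy score

def score(seq):
    """Mock property predictor: fraction of positions matching TARGET.
    In practice this would be a trained model or experimental readout."""
    return sum(a == b for a, b in zip(seq, TARGET)) / len(TARGET)

def mutate(seq, rate=0.1):
    """Propose a variant by random point mutations."""
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in seq)

def design(n_generations=30, pop_size=50, keep=10):
    """Elitist evolutionary loop: score the population, keep the best,
    propose mutated children from the survivors, repeat."""
    pop = ["".join(random.choice(ALPHABET) for _ in TARGET)
           for _ in range(pop_size)]
    for _ in range(n_generations):
        pop.sort(key=score, reverse=True)
        parents = pop[:keep]
        pop = parents + [mutate(random.choice(parents))
                         for _ in range(pop_size - keep)]
    return max(pop, key=score)

best = design()
print(best, round(score(best), 2))
```

The real versions swap in learned generative models for `mutate` and trained predictors or assays for `score`, which is exactly why validation, novelty, and synthesizability dominate the discussion.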

So the pattern here is different from “entity → number” or “invert a process”: the output is a structure, not a single number, and evaluation typically requires downstream assays or simulations. Validation, novelty, synthesizability, and safety are central. Generative AI for drug discovery is a fast-moving area; it complements the “predict a property” and “invert a process” use cases by expanding the candidate set that those tools then score and prioritize.

Generative design: diffusion — unfold → fold

Protein diffusion animation: unfold (noise), then fold back to structure

Animation: Institute for Protein Design, University of Washington.

4. Clustering patients

Another useful pattern of problems is high-dimensional profiles → partition: patient profiles (biomarkers, gene expression, imaging-derived features) are grouped into subtypes or clusters to stratify for trial design, identify at-risk groups, or discover phenotypes. Unsupervised methods (e.g. k-means, hierarchical clustering, or learned embeddings) reduce many features to a few interpretable groups. The result is not a single prediction per patient but a partition of the cohort that supports downstream decisions: who gets which therapy, how to stratify randomization, or which subgroup to analyze first.
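
The partition step itself can be as simple as Lloyd's k-means on the embedded coordinates. The points below are synthetic blobs standing in for a real patient embedding:

```python
import random

random.seed(1)

# Mock 2D patient embeddings (e.g. coordinates after PCA/UMAP): three blobs.
def blob(cx, cy, n):
    return [(cx + random.gauss(0, 0.3), cy + random.gauss(0, 0.3))
            for _ in range(n)]

points = blob(0, 0, 30) + blob(3, 0, 30) + blob(1.5, 3, 30)

def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm: assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: (p[0] - centroids[j][0]) ** 2
                                          + (p[1] - centroids[j][1]) ** 2)
            clusters[i].append(p)
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

centroids, clusters = kmeans(points, k=3)
print([len(c) for c in clusters])
```

Choosing k, validating that the clusters correspond to biology, and deciding what to do with each subgroup are where the real work is; the algorithm is the easy part.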

Patient clustering: high-dim profiles → subtypes

Mock 2D embedding of patients (e.g. from PCA or UMAP on biomarkers); points colored by assigned subtype. In practice, cluster count and interpretation are driven by biology and study design.

5. Imaging

A fifth pattern of problems is image → structure or label: extract objects, segments, or classifications from images to support decisions. Cell painting is a high-content image-based assay for morphological profiling: multiplexed fluorescent dyes highlight nuclei, cytoplasm, and organelles, and segmentation identifies individual cells and compartments for feature extraction. Running such models in the browser keeps data local and avoids installation. Below, Segment Anything Model 2 (SAM 2) runs entirely in the browser via ONNX Runtime Web (WASM): click to add positive (include) or negative (exclude) points and see the segmentation mask update in real time.
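
Classical segmentation, stripped to its core, is thresholding plus connected-component labeling; models like SAM 2 replace this with learned masks, but the image-in, structure-out shape of the problem is the same. The "image" below is a made-up grid of intensities with three bright spots standing in for cells:

```python
from collections import deque

# Toy grayscale image (rows of pixel intensities); bright regions = cells.
image = [
    [0, 0, 9, 9, 0, 0, 0, 0],
    [0, 0, 9, 9, 0, 0, 8, 8],
    [0, 0, 0, 0, 0, 0, 8, 8],
    [7, 7, 0, 0, 0, 0, 0, 0],
    [7, 7, 0, 0, 0, 0, 0, 0],
]

def segment(image, threshold=5):
    """Threshold, then 4-connected component labeling via BFS flood fill.
    Returns a label map (0 = background) and the number of objects."""
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    n = 0
    for r in range(h):
        for c in range(w):
            if image[r][c] > threshold and labels[r][c] == 0:
                n += 1
                labels[r][c] = n
                q = deque([(r, c)])
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and image[ny][nx] > threshold
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = n
                            q.append((ny, nx))
    return labels, n

labels, n_objects = segment(image)
print(n_objects)  # → 3
```

In cell painting pipelines the labeled objects then feed per-cell feature extraction, which turns each image into exactly the kind of tabular "entity → number" data the first pattern consumes.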

Cell painting & segmentation: SAM 2 in the browser

Fluorescent microscopy cells

Part 2: Partnering with quantitative colleagues

Getting the most out of ML in drug discovery depends on how biologists, chemists, and clinicians work with the people building the models. Recognizing which pattern of problem you’re in (from Part 1) helps you scope the ask; the habits below make the collaboration itself faster and more impactful.

6. Bring the data first

Describing a dataset in plain English is slow and error-prone. “We have binding affinities for several variants, and expression levels from the last run, plus some notes on which constructs failed” leaves your quant colleague guessing at column names, units, and structure. A small sample table removes that ambiguity in seconds.

There is a deeper reason to show the data early: the quant needs to see the shape of the data—how many rows, what columns, how sparse or complete, what the values look like—to form an intuition for whether the ask is feasible. An experienced quant can often tell from a quick look whether a property prediction, clustering, or inverse problem is tractable with the data you have, or where the gaps and risks are. Words don’t convey shape; a few rows do.

Describe in words vs. show the table

Describing the same data in prose:

“We measured binding affinity (Kd in nM) for eight nanobody variants. Variant A had the best affinity at 0.3 nM, then B at 1.2, C at 2.1. We also have expression levels from the HEK run—A and B were high, C was medium. Variants D and E had poor expression and we didn’t get reliable Kd. The rest are in the table.”

Which variant is “B”? What exactly is “high” expression? Your quant is already mentally drawing the table.

The same information as a table

Variant   Sequence                Kd (nM)   Expression   Notes
A         EVQLVESGGGLVQPGGSLRL    0.3       High
B         EVQLVESGGGLVQPGGSLRV    1.2       High
C         EVQLVESGGGLVQPGGSLRI    2.1       Medium
D         EVQLVESGGGLVQPGGSLRA    n/a       Low          No reliable Kd
E         EVQLVESGGGLVQPGGSLRG    n/a       Low          No reliable Kd
F         EVQLVESGGGLVQPGGSLRN    4.0       Medium
G         EVQLVESGGGLVQPGGSLRQ    8.1       High
H         EVQLVESGGGLVQPGGSLRP    12.0      Low

Bring sample data to the first conversation, not the third. A few rows (even anonymized or synthetic) clarify the question and speed up scoping.

7. Teach the biology—they want to learn

Your quant colleague is not “just a modeler.” They are often keen to understand the biology and chemistry: how experiments are designed, how decisions move from bench to clinic, and where the data actually come from. That context is not optional. It is what lets them spot confounders that would poison a model—for example, domain shift caused by a change that seems minor to a biologist but shifts the data distribution.

Example: cell line change = domain shift for ML

Switching from HEK-293 to HEK-293T (T antigen) changes protein expression patterns. A model trained on 293 data may perform poorly on 293T data—not because the biology is wrong, but because the input distribution changed.
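
One cheap guard against this kind of shift is to compare an incoming batch's feature distribution against the training data before trusting the model's scores. The numbers below are invented; the check itself is just a standardized mean difference:

```python
import statistics

# Hypothetical per-sample feature (e.g. an expression readout) from two runs.
train_293 = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.0, 10.4]
new_293t = [13.1, 12.7, 13.5, 12.9, 13.3, 13.0, 12.8, 13.4]

def shift_score(train, new):
    """Standardized mean difference: how many training-set standard
    deviations the new batch's mean has moved. A crude but useful
    first check before scoring new data with an old model."""
    mu, sd = statistics.mean(train), statistics.stdev(train)
    return abs(statistics.mean(new) - mu) / sd

score = shift_score(train_293, new_293t)
print(round(score, 1))
if score > 3:
    print("warning: input distribution shifted; model scores may be unreliable")
```

A check this simple will miss shifts that leave the mean alone, but it catches the cell-line-change scenario above, and it only works if the biologist has told the quant that the run changed in the first place.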

Share the full picture: experimental lifecycle, how go/no-go decisions are made, reagent or protocol changes over time. What feels like “too much detail” is often exactly what’s needed to avoid training on confounded data.

8. Frame for impact, not curiosity

Quants want to have impact, not just turn the crank on curiosity. Come having thought through how your ask connects to a real decision: What would we do differently if we had this model? What would “good enough” look like? The patterns of problems above—property prediction, inverse problems, generative design, patient clustering, imaging—are useful precisely because they recur: once you recognize which pattern you’re in, you can pattern-match your ask, tie it to a concrete decision, and scope the work before the first meeting.


Curiosity ask


“Can you predict binding affinity from sequence?”

Vague. No decision, no success criteria, no resource context.

Impact ask


“We need to triage 200 variants down to 20 for expression. Can we predict Kd well enough to rank-order candidates? A 2× enrichment over random would save us about three weeks of screening.”

Clear decision, success criterion, and why it matters.

Think through how your ask impacts decision-making. Your quant partner wants to deliver something that changes what you do next—not just a one-off analysis.