April 9, 2027 edition

Curated multimodal datasets for biology AI

Strand AI Predicts the Patient Data That Pharma Companies Never Collected

Healthcare, AI, Biotech

The Macro: Clinical Trials Fail Because the Data Is Incomplete

Ninety-six percent of oncology trials fail. That is not a typo. The vast majority of cancer drug trials do not lead to approved treatments. There are many reasons for this, but one of the biggest is patient selection. If you enroll the wrong patients in a trial, the drug might actually work but the trial results will not show it. The biomarkers that would identify the right patients often exist in data modalities that were never collected.

This is the curse of modern biology. You can measure a patient’s genome. You can analyze their tissue with spatial proteomics. You can run transcriptomics. But you almost never have all of these modalities for the same patient. Data is expensive to collect, samples degrade, and cohorts are assembled opportunistically. The result is sparse, incomplete datasets where the signal gets lost in the gaps.

Existing tools handle this poorly. Most computational biology platforms work with the data you have and ignore what you do not. Some use basic imputation methods that are not sophisticated enough for complex biological relationships. The idea of generating high-quality predictions for unmeasured modalities is relatively new, and the models to do it credibly have only recently become feasible.

The Micro: Foundation Models That Fill in Biology’s Blank Spots

Yue Dai and Oded Falik cofounded Strand AI. Yue was previously at Pathos AI, Enable Medicine, and Element AI, specializing in foundation models for biology. Oded led product development for spatial biology platforms at Enable Medicine. Both have deep expertise in the specific intersection of ML and biological data. They are a two-person team from YC Winter 2026 with partner Jon Xu.

The product generates missing biological data modalities from existing samples. If you have H&E slides and genotypes for a patient cohort, Strand can predict what the proteomics and transcriptomics would look like. This lets pharma companies rescue incomplete cohorts instead of discarding patients who are missing data. It also lets them predict expensive assays from cheaper data they already have in inventory.
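To make the idea concrete, here is a minimal sketch of cross-modal prediction with a held-out check. It is not Strand's method: it swaps in plain ridge regression for their foundation models, and every name and number in it (`fit_ridge`, `per_feature_pearson`, the cohort sizes, the synthetic data) is an assumption made up for illustration. The shape of the workflow is the point: fit on patients who have both modalities, predict the missing one for the rest, and score the predictions against measurements you held back.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

def per_feature_pearson(Y_true, Y_pred):
    """Pearson correlation between measured and predicted values, per output feature."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return (yt * yp).sum(axis=0) / denom

rng = np.random.default_rng(0)
n_patients, n_in, n_out = 200, 16, 5

# Synthetic stand-ins (assumption): X plays the role of features from a cheap
# modality, Y the role of an expensive assay measured on the same patients.
X = rng.normal(size=(n_patients, n_in))
true_W = rng.normal(size=(n_in, n_out))
Y = X @ true_W + 0.5 * rng.normal(size=(n_patients, n_out))

# Fit on the 150 patients treated as fully measured; hold out the other 50.
train, held_out = slice(0, 150), slice(150, None)
W = fit_ridge(X[train], Y[train], alpha=1.0)

# Predict the "missing" modality for held-out patients and score it.
Y_pred = X[held_out] @ W
scores = per_feature_pearson(Y[held_out], Y_pred)
print(scores.round(3))
```

Real biological modalities are nothing like this linear toy, which is exactly why the per-feature held-out score matters: it is the number a skeptical buyer would ask for before trusting predicted data.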

Their core technology, developed in collaboration with DeepMind, produced Cell2Sentence, which was published at ICML. They have also launched POSTMAN, a spatial proteomics prediction model. The research credentials are strong, and the fact that they are already working with top-10 pharma companies, with every deal arriving inbound, tells me demand is not in question.

The platform lives at app.strandai.com, suggesting they have a real product that customers can use, not just a research prototype.

The Verdict

Strand AI is one of those companies where the technical depth of the founding team matches the difficulty of the problem. Predicting biological data modalities is hard. Getting it wrong has real consequences for drug development. But if they get it right, the value creation is enormous. Shaving months off a drug launch timeline is worth tens or hundreds of millions of dollars to a pharma company.

The competitive set includes Recursion Pharmaceuticals, Insitro, and other AI-driven drug discovery companies. But most of these are building their own drugs. Strand is building infrastructure that pharma companies use to improve their own pipelines. That is a different business model with potentially broader applicability.

In 30 days, I want to see validation results. How accurately does Strand predict held-out biological data? In 60 days, the question is commercial traction. How many pharma partnerships are generating revenue? In 90 days, I want to know whether any clinical trial has been designed differently because of Strand’s predictions. That is the ultimate proof point.