September 9, 2026 edition

Thesis

AI lab at your fingertips

Thesis Wants to Replace Your ML Research Team With AI Agents, and Stanford Is Already Using It

AI · Machine Learning · Research Tools · Science

The Macro: ML Research Is Expensive, Slow, and Gatekept by Headcount

I have watched friends in ML research spend three months on a project that, in retrospect, could have been scoped and executed in a week if they had known which approaches would fail. That is the fundamental problem with machine learning research. The iteration loop is brutally long. You pick an architecture, preprocess your data, train a model, evaluate it, realize it does not work, change something, and start over. Each cycle takes hours to days depending on your compute budget. Multiply by dozens of experiments and you are looking at months of work before you have anything publishable.
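The manual loop described above can be sketched in a few lines. This is a toy stand-in, not Thesis's product or any real training pipeline: `train_and_evaluate` fakes an hours-long run with a mock score, and the candidate configurations are invented for illustration.

```python
# A minimal sketch of the manual research loop: hypothesize a
# configuration, train, evaluate, and iterate. Model, dataset, and
# metric are toy stand-ins invented for this example.
import random

random.seed(0)

def train_and_evaluate(config):
    """Stand-in for an hours-long training run: returns a mock score."""
    # Pretend deeper models with smaller learning rates score higher.
    return config["depth"] * 0.1 - config["lr"] + random.uniform(0, 0.05)

# The researcher's "hypotheses": candidate configurations to try.
candidates = [
    {"depth": d, "lr": lr}
    for d in (2, 4, 8)
    for lr in (0.1, 0.01)
]

best_config, best_score = None, float("-inf")
for config in candidates:  # each iteration = one full experiment cycle
    score = train_and_evaluate(config)
    if score > best_score:
        best_config, best_score = config, score

print(best_config)
```

Six candidates here means six experiments; in real research each iteration is hours to days of compute, which is exactly the cost the article is describing.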

The tooling has gotten better. Weights and Biases made experiment tracking tolerable. Hugging Face made model access democratic. Lightning made PyTorch less painful. But the core workflow, the actual research loop of hypothesis, experiment, evaluate, iterate, is still entirely manual. A researcher has to make every decision about what to try next, and those decisions require deep expertise that takes years to develop.

This matters because the demand for ML models is exploding while the supply of ML researchers is not. Every healthcare company wants to build predictive models. Every financial institution needs fraud detection. Every manufacturing operation wants quality control AI. But hiring an ML team is expensive, slow, and competitive. Senior ML researchers command $400K+ salaries and there are not enough of them. The bottleneck is not compute or data. It is human expertise.

AutoML tried to solve this five years ago and mostly failed. Tools like Google Cloud AutoML and H2O.ai automated the model selection step but left everything else manual. They were too narrow. They could tune hyperparameters but could not do the creative, strategic parts of ML research, like figuring out that your dataset has a distribution shift that is causing your model to fail on production data. The next generation of ML automation needs to handle the full research loop, not just one step.
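The distribution-shift failure mode mentioned above is worth making concrete. A standard way to detect it is a two-sample Kolmogorov-Smirnov statistic comparing a feature's training distribution against production data. The sketch below uses synthetic data, and the 0.2 threshold is an illustrative choice, not a rule; it is not drawn from any tool named in this article.

```python
# A hedged sketch of the diagnosis AutoML tools tend to miss: has a
# feature's distribution shifted between training and production?
# The two-sample Kolmogorov-Smirnov statistic quantifies the shift.
import random

random.seed(1)

def ks_statistic(a, b):
    """Max gap between the two samples' empirical CDFs."""
    a, b = sorted(a), sorted(b)

    def cdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    return max(abs(cdf(a, x) - cdf(b, x)) for x in a + b)

train = [random.gauss(0.0, 1.0) for _ in range(500)]  # what the model saw
prod = [random.gauss(0.8, 1.0) for _ in range(500)]   # what it sees live

stat = ks_statistic(train, prod)
print(f"KS statistic: {stat:.3f}")
if stat > 0.2:  # illustrative threshold, not a standard cutoff
    print("Likely distribution shift: investigate before trusting metrics.")
```

Hyperparameter tuning cannot catch this; the model can be perfectly tuned on the training distribution and still fail in production, which is why the article calls shift detection a strategic rather than mechanical step.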

The Micro: Brothers From Stanford and Columbia Who Have Done This Before

Thesis is an agentic AI environment that runs an ML research team for you. The platform handles the full pipeline: exploratory data analysis to uncover patterns; end-to-end model building covering data ingestion, model selection, optimization, training, and evaluation; and a self-healing execution system that monitors runs continuously, diagnoses issues, and applies fixes automatically. You bring a dataset and a question. Thesis brings the research team.
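The "self-healing execution" idea can be illustrated with a small retry loop: run an experiment, diagnose a failure, patch the configuration, and try again. Everything below is speculative; the error types, fixes, and function names are invented for illustration and are not Thesis's actual system.

```python
# A speculative sketch of a self-healing execution loop: monitor a run,
# diagnose the failure, apply a fix, retry. All names are invented.
def run_experiment(config):
    """Stand-in for a training run that fails on bad configs."""
    if config["batch_size"] > 64:
        raise MemoryError("out of GPU memory")
    if config["lr"] > 0.5:
        raise FloatingPointError("loss diverged to NaN")
    return {"status": "ok", "config": dict(config)}

# Diagnosis table: map an observed failure type to an automatic fix.
FIXES = {
    MemoryError: lambda c: c.update(batch_size=c["batch_size"] // 2),
    FloatingPointError: lambda c: c.update(lr=c["lr"] / 10),
}

def self_healing_run(config, max_retries=5):
    for _ in range(max_retries):
        try:
            return run_experiment(config)
        except tuple(FIXES) as err:
            FIXES[type(err)](config)  # diagnose failure, patch the config
    raise RuntimeError("could not heal run")

result = self_healing_run({"batch_size": 256, "lr": 1.0})
print(result)
```

The design choice worth noting is that healing is a diagnosis-to-fix mapping, not blind retrying: a real system would need a far richer taxonomy of failures than two exception types, which is where the hard research judgment lives.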

Sergio Charles and Luigi Charles are brothers and cofounders. Sergio comes from AI R&D at Google X, NVIDIA, and Stanford’s AI Lab, where he worked with Andrew Ng and Chelsea Finn. He has published at NeurIPS and ICML, two of the most competitive ML conferences in the world, and holds a B.S. in Math and Computer Science plus an M.S. in Statistics, both from Stanford. Luigi is a second-time founder. He previously built Sphere, a fintech company that processed billions in global payments and was valued at over $250 million at Series A. He has a B.S. in Mathematics and Computer Science from Columbia.

That combination is almost unfairly well-suited to this problem. Sergio has the deep ML research credibility to build a system that real researchers will trust. Luigi has the operational and business experience to turn a research tool into a company. They came through Y Combinator’s Fall 2025 batch.

The product is already trusted by researchers at Stanford, Harvard, MIT, and Mayo Clinic. Those are not easy logos to earn in the research tool space. Academics are skeptical by default and will not use a tool that produces results they cannot verify or reproduce. The fact that Thesis has penetrated those institutions this early is a strong signal.

The competitive landscape includes Weights and Biases for experiment tracking, Comet ML for model management, and a handful of AutoML tools that handle narrow slices of the pipeline. None of them are attempting what Thesis is doing, which is full agentic research. The closest comparison might be Determined AI, now owned by Hewlett Packard Enterprise, but that product focuses on training infrastructure rather than research strategy. Thesis is trying to replicate the judgment of an experienced ML researcher, not just the compute.

The Mac download and web app availability suggest they are going for individual researchers first, not enterprise procurement. That is the right sequence. Build love with individual users, let them bring it into their organizations, then sell the enterprise version with collaboration and compliance features.

The Verdict

I think Thesis is one of the most ambitious ML tools I have seen launch this year. The promise of compressing months of ML research into hours is not new, but the team behind this one has the credibility to actually deliver on it. Sergio’s publication record and research lab experience mean he understands what “good enough” looks like for an automated research system, which is the hardest design problem in this space.

The risk is trust. Researchers need to understand why a model works, not just that it works. If Thesis produces a model that performs well but the researcher cannot explain the decisions that led to it, that model will not make it into a paper or a production system. Explainability and reproducibility are not features. They are requirements.

At 30 days, I want to see how many of those institutional users are repeat users versus one-time evaluators. At 60 days, the question is whether Thesis can handle messy, real-world datasets or only works well on clean benchmarks. Every ML tool looks great on Kaggle data. The test is whether it can handle the ugly, missing-value, mislabeled, distribution-shifted data that researchers actually work with. At 90 days, I want to see if any published papers cite Thesis as part of their methodology. That would be the strongest possible validation that the tool is producing research-grade results, not just demo-grade results.