The Macro: Mathematics Is the Bottleneck Nobody Talks About
There is a quiet crisis in the fields that depend on mathematical proof. Quantum computing researchers need to verify that their error correction codes actually work. Financial institutions need to prove that their pricing models are correct under all conditions. Cryptographers need mathematical guarantees that their protocols are unbreakable. In each of these domains, the difference between “we think this is right” and “we can prove this is right” is the difference between shipping and hoping.
Formal verification is the discipline of using mathematical proof to guarantee that a system behaves correctly. It has been around for decades, primarily in hardware design and safety-critical software. Intel uses formal verification to avoid another Pentium FDIV bug. Airbus uses it for flight control software. But the application has been narrow because formal verification is extraordinarily labor-intensive. A single proof can take a team of specialists months to construct.
The tools for formal verification have gotten better. Lean 4, developed at Microsoft Research, provides a modern framework for writing and checking proofs. Coq and Isabelle are older alternatives with strong academic communities. But even with better tools, the bottleneck is human expertise. There are only a handful of people in the world who can construct complex formal proofs in domains like quantum computing or financial mathematics. The supply of human proof engineers does not come close to meeting the demand.
AI has been making progress on mathematical reasoning. DeepMind’s AlphaProof and AlphaGeometry demonstrated that AI systems can solve competition-level math problems. But translating competition math into production formal verification for real-world applications is a different challenge entirely. The gap between “solve an IMO problem” and “verify a quantum error correction code” is substantial.
The Micro: Multi-Agent Proof Discovery Grounded in Truth
Cajal is scaling formal verification by deploying what they call superhuman AI mathematicians to high-impact applied domains. They start with quantum computing and finance, two fields where mathematical certainty directly translates to either physical correctness or financial accuracy. The company uses Lean, the formal verification framework that allows them to verify any mathematical statement, grounding AI in truth and validating the tools discovered by their systems. They went through Y Combinator’s W26 batch.
The founders are Luke Johnston and Pedro Nobre. Luke has a background in machine learning and neuroscience from research labs at Oxford, Cambridge, and UCL. Pedro’s background is in formal verification and AI. The company is backed by Y Combinator and the University of Cambridge.
Their core system is called Tau, a multi-agent system that collaborates to discover and verify mathematical proofs. Multi-agent is the key word here. Rather than a single model trying to find a proof from start to finish, Tau uses multiple agents that work together, presumably with different agents handling different aspects of proof search, strategy, and verification. The verification step is critical because Lean provides a ground truth check. If the proof is wrong, Lean rejects it. There is no hallucination in formal verification.
The product includes verified training corpora for Lean 4, Coq, and Isabelle. This is interesting because it suggests Cajal is not just building a proof-finding system but also generating the training data that makes better proof-finding systems possible. Rigorous benchmarks with Pass@k metrics provide standardized evaluation. And RL environments with native proof assistant bindings give their agents a training loop directly connected to the verification tools.
The application domains they highlight are quantum computing, finance, cryptography, aerospace, robotics, and biology. That is an ambitious scope for a two-person team, but the underlying technology, if it works, is horizontal. A system that can discover and verify proofs in Lean should be applicable across any domain where the problem can be expressed formally.
The competitive field is small but notable. Mechanized Math (Numina) has been working on AI for mathematical reasoning. LeanDojo provides tooling for AI-driven formal verification. Several academic groups at MIT, Stanford, and DeepMind are active in this space. But few are focusing specifically on commercial applications in quantum computing and finance, which is where Cajal has planted its flag.
The Verdict
Formal verification is one of those fields where AI assistance could produce genuine breakthroughs rather than incremental improvements. If Tau can discover proofs that human mathematicians would take months to construct, the time savings alone justify the product. The fact that Lean provides absolute verification means the outputs are trustworthy in a way that most AI outputs are not.
At 30 days, I would want to see concrete results on specific verification tasks. Can Tau verify a quantum error correction code faster than a human team? By how much?
At 60 days, the question is customer engagement. Are quantum computing companies and financial institutions actually using Cajal for production verification, or is this still in research partnership mode?
At 90 days, I would be looking at the training data flywheel. Every proof Tau discovers becomes training data for discovering the next proof. If this flywheel is spinning, Cajal gets better at an accelerating rate. That is a genuine moat.
Mathematical proof is the one domain where AI cannot fake it. Either the proof checks or it does not. That binary clarity makes Cajal’s progress easier to evaluate than almost any other AI company I have covered. I will be watching the benchmarks closely.