The Macro: Nobody Trusts Their Company’s Data
Every data team has the same nightmare scenario. The CEO opens a dashboard, sees a number that looks wrong, and asks “can I trust this data?” The answer, more often than anyone wants to admit, is “not entirely.”
Data quality problems are endemic in modern data stacks. Pipelines break silently. Schema changes upstream cascade into incorrect downstream calculations. Two teams define “monthly active users” differently and produce conflicting numbers. NULL values sneak into critical columns. By the time anyone notices, decisions have been made on bad data.
Existing data quality tools like Monte Carlo, Anomalo, and Great Expectations provide monitoring and alerting. They tell you when a metric looks anomalous or when a test fails. But they are reactive: they detect problems only after bad data has already propagated through the stack. And they require significant setup: defining tests, configuring thresholds, and maintaining rules as the data stack evolves.
Velum Labs, backed by Y Combinator, takes a different approach. They call their product a “semantic control plane for data” that monitors query patterns, detects divergences automatically, traces root causes through query lineage, and generates data contracts to prevent future issues.
The Micro: Finding Problems Nobody Knew to Look For
Benjamin Munoz-Cerro (CEO) studied physics and mathematics at Stanford and spent five years working on reinforcement learning methods for quantum computing. Alen Rubilar-Munoz (CTO) is a mathematician and ML engineer with a background in geometric deep learning and analog computing. Both have research affiliations with Harvard, Stanford, and the Max Planck Institute.
The technical approach is interesting. Instead of requiring users to define data quality rules manually, Velum monitors production query traffic and automatically detects when teams calculate the same metric differently. This catches semantic inconsistencies that no predefined test would find. If the marketing team’s definition of “revenue” includes refunds and the finance team’s does not, Velum spots the divergence.
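To make the idea concrete, here is a minimal sketch (not Velum's actual implementation) of how divergence detection could work: fingerprint the queries each team runs for a given metric, then flag any metric with more than one distinct definition. The team names, metric labels, and query log are all hypothetical.

```python
import hashlib
import re
from collections import defaultdict

def fingerprint(sql: str) -> str:
    """Normalize a SQL query (case, whitespace) and hash it."""
    normalized = re.sub(r"\s+", " ", sql.strip().lower())
    return hashlib.sha256(normalized.encode()).hexdigest()[:12]

def find_divergences(observed_queries):
    """Group observed queries by the metric they claim to compute and
    flag metrics with more than one distinct definition."""
    definitions = defaultdict(dict)  # metric -> {fingerprint: (team, sql)}
    for team, metric, sql in observed_queries:
        definitions[metric][fingerprint(sql)] = (team, sql)
    return {m: list(d.values()) for m, d in definitions.items() if len(d) > 1}

# Hypothetical query log: marketing includes refunds in "revenue",
# finance excludes them, so "revenue" has two competing definitions.
queries = [
    ("marketing", "revenue", "SELECT SUM(amount) FROM orders"),
    ("finance",   "revenue", "SELECT SUM(amount) FROM orders WHERE status != 'refunded'"),
    ("growth",    "signups", "SELECT COUNT(*) FROM users WHERE is_active"),
]
print(find_divergences(queries))  # only "revenue" is flagged
```

A production system would parse SQL semantically rather than hash normalized text, but the core move is the same: the definitions come from real traffic, not from anyone's documentation.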
When an issue is detected, Velum traces through production query traffic to find the root cause without requiring manual documentation of the data lineage. Then it deploys fixes through existing workflows like Git, dbt, and CI/CD pipelines. Finally, it generates data contracts from real problems, creating enforceable rules that prevent the same issue from recurring.
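Lineage tracing from query traffic can be sketched roughly as follows; the regex-based parsing and the sample query log are deliberate simplifications for illustration, not Velum's method.

```python
import re
from collections import defaultdict

def lineage_from_queries(query_log):
    """Build a crude table-level lineage graph from CREATE/INSERT queries.
    Real lineage tools parse SQL properly; this sketch only illustrates
    inferring edges from observed traffic."""
    upstream = defaultdict(set)  # table -> tables it reads from
    for sql in query_log:
        m = re.search(r"(?:insert into|create table)\s+(\w+)", sql, re.I)
        if not m:
            continue
        target = m.group(1).lower()
        for src in re.findall(r"(?:from|join)\s+(\w+)", sql, re.I):
            upstream[target].add(src.lower())
    return upstream

def trace_roots(table, upstream, seen=None):
    """Walk upstream from an anomalous table to its source tables."""
    seen = seen or set()
    if table in seen:
        return set()
    seen.add(table)
    sources = upstream.get(table, set())
    if not sources:
        return {table}
    roots = set()
    for s in sources:
        roots |= trace_roots(s, upstream, seen)
    return roots

# Hypothetical traffic: a dashboard built on a derived revenue table.
log = [
    "CREATE TABLE daily_revenue AS SELECT o.id FROM orders o JOIN payments p ON o.id = p.order_id",
    "INSERT INTO exec_dashboard SELECT * FROM daily_revenue",
]
graph = lineage_from_queries(log)
print(trace_roots("exec_dashboard", graph))  # {'orders', 'payments'}
```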
This “contracts from real problems” approach is pragmatic. Most data contract initiatives fail because they start with abstract schema definitions rather than real issues. Velum flips this: find the problems first, then generate the contracts that would have prevented them.
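The flow from incident to enforceable rule can be illustrated with a minimal sketch; the incident shape, contract structure, and rule names below are hypothetical, not Velum's actual schema.

```python
from dataclasses import dataclass

@dataclass
class Contract:
    table: str
    column: str
    rule: str  # e.g. "not_null"

def contract_from_incident(incident):
    """Turn an observed data incident into an enforceable rule.
    Hypothetical incident format for illustration only."""
    if incident["type"] == "null_values":
        return Contract(incident["table"], incident["column"], "not_null")
    raise ValueError(f"no contract template for {incident['type']}")

def enforce(contract, rows):
    """Run the contract against a batch of rows; return violations."""
    if contract.rule == "not_null":
        return [r for r in rows if r.get(contract.column) is None]
    return []

# A NULL sneaks into a critical column once; the generated contract
# blocks it from recurring.
incident = {"type": "null_values", "table": "orders", "column": "customer_id"}
contract = contract_from_incident(incident)
violations = enforce(contract, [{"customer_id": 42}, {"customer_id": None}])
print(len(violations))  # 1 row violates the generated not_null contract
```

In practice a rule like this would be emitted as a dbt test or CI check rather than run inline, but the ordering is the point: the problem is observed first, and the contract is derived from it.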
Competitors include Monte Carlo (data observability), Soda (data quality testing), and Atlan (data governance). Velum’s differentiator is the automatic detection of semantic inconsistencies across teams rather than relying on predefined rules.
The Verdict
Velum Labs is tackling data trust, which is arguably the most important unsolved problem in data engineering. If you cannot trust your data, everything built on top of it is suspect.
At 30 days: how many semantic inconsistencies is Velum finding in customer data stacks that existing monitoring tools missed?
At 60 days: are the auto-generated data contracts being adopted by data teams, or do they require significant modification?
At 90 days: is Velum reducing the time data teams spend on reactive firefighting and incident response?
I think the approach is smart. Starting from real query traffic rather than abstract rules sidesteps the biggest failure mode of data quality tools: setup fatigue. If Velum can automatically find the problems that matter and generate the fixes, data teams will adopt it because it makes their lives measurably better.