September 8, 2026 edition

caddy

Voice interface that works proactively

Caddy Thinks Voice Is the Next Operating System, and I Think They Might Be Right

The Macro: Voice Assistants Have Been a Punchline for Too Long

I have tried every voice assistant that has launched in the last five years. Every single one. And I have abandoned every single one within a week. The pattern is always the same: impressive demo, workable first session, then the friction compounds until I am back to typing because it is just faster.

The problem is not that voice recognition got worse. It got dramatically better. The problem is that voice assistants were designed as command interfaces, not workflow interfaces. You speak, it does one thing, you speak again, it does another thing. That is not how anyone actually works. Real productivity is contextual. You are bouncing between Slack and email and Linear and Notion, and the value of any assistant depends on whether it understands the thread you are pulling across all of those tools.

Siri is the obvious punching bag here, but the real failures are more instructive. Otter.ai built an incredible transcription engine but never figured out how to turn transcripts into actions. Krisp solved the noise cancellation problem beautifully and then stalled. Voice.ai went the consumer entertainment route. None of them cracked the fundamental challenge: making voice an actual productivity multiplier instead of a novelty.

The tools that have succeeded in AI-assisted productivity are all text-based. Cursor for code. Notion AI for docs. Linear’s AI triage. They work because they are embedded in the workflow, not layered on top of it. The question for voice is whether anyone can build something that feels equally native to how people actually get things done.

The market opportunity is enormous if the execution is right. Knowledge workers spend roughly 40 percent of their day on task switching and context management. That is not a made-up number. It is a real, measurable drain, and voice is theoretically the fastest input method available. The gap between the theoretical value and the actual delivered value of voice assistants is the entire opportunity.

The Micro: Two Loom Veterans Who Actually Shipped AI Products

Caddy is an operating system that learns how you work and acts on your behalf. It runs in two modes. Action Mode takes voice commands and executes tasks across email, calendar, Slack, Linear, and Notion. Dictation Mode is voice-to-text that adapts to your style. The key differentiator is the proactive part. Caddy is not just waiting for commands. It is learning your patterns and surfacing actions before you ask for them.

Connor Waslo and Rajiv Sancheti are both ex-Loom, where they spent four years each. Connor led product for Loom’s AI Suite and monetization team. Rajiv led design for the AI Suite, with prior experience at Airbnb and as a Kleiner Perkins Fellow. That Loom background is directly relevant. Loom was one of the first companies to figure out how to make async video feel like a natural part of work, not a disruption. Understanding how to embed a new modality into existing workflows is exactly the skill set you need to make voice work in productivity.

They are based in New York and came through Y Combinator’s Fall 2025 batch. The product is currently in waitlist mode, which means they are being selective about early users rather than chasing vanity signups. That is the right call for a product that needs to learn individual work patterns before it can deliver on its promise.

The competitive landscape is interesting because there is not really a direct competitor doing what Caddy is doing. Wispr Flow and similar tools handle voice-to-text transcription. Granola does AI meeting notes. Raycast and Alfred handle keyboard-driven automation. But nobody is combining voice input with proactive workflow automation across multiple work tools. That is either a signal that the market is wide open or a signal that it is harder than it looks. Probably both.

I am curious about the learning curve. A voice interface that learns your patterns is only useful if it can learn fast enough to deliver value before the user gives up and goes back to their keyboard. The cold start problem for personalized AI tools is real. If Caddy takes two weeks to become useful, most people will never get there. If it takes two days, that changes everything.

The Verdict

I think Caddy has the right team for this problem. Most voice assistant startups are built by audio engineers or NLP researchers who have never shipped a productivity product. Connor and Rajiv have done the opposite. They built AI features inside a productivity tool that millions of people actually used. That experience is hard to replicate.

The risk is timing. Voice interfaces require a behavior change, and behavior changes in productivity software take longer than anyone expects. Slack took years to displace email for internal communication, and email never went away. Notion took years to challenge Google Docs, and Google Docs is still dominant. Even if Caddy works perfectly, the adoption curve could be painfully slow.

At 30 days, I want to see how quickly the system learns individual work patterns and starts surfacing useful proactive suggestions. At 60 days, the question is whether users are actually replacing keyboard workflows with voice or just using voice as a supplement. At 90 days, I want retention data segmented by how many integrations each user has connected. My bet is that Caddy’s value compounds with integration depth, and users with four or more connected tools will retain at dramatically higher rates than users with one or two. If that curve is steep enough, the product sells itself through progressive lock-in.
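The cohort cut described above is simple to run once you have the data. A minimal sketch, assuming a hypothetical per-user table with an integration count and a day-90 retention flag (column names and sample values are illustrative, not Caddy’s actual schema):

```python
import pandas as pd

# Hypothetical usage data: how many work tools each user connected,
# and whether they were still active at day 90.
users = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6],
    "integrations_connected": [1, 2, 4, 5, 1, 4],
    "retained_day_90": [False, False, True, True, False, True],
})

# Bucket users by integration depth: shallow (1-2 tools), middle (3),
# and deep (4+), then compare day-90 retention rates across cohorts.
users["cohort"] = pd.cut(
    users["integrations_connected"],
    bins=[0, 2, 3, float("inf")],
    labels=["1-2 tools", "3 tools", "4+ tools"],
)
retention = users.groupby("cohort", observed=True)["retained_day_90"].mean()
print(retention)
```

If the thesis holds, the `4+ tools` cohort should retain at a dramatically higher rate than the `1-2 tools` cohort, and the gap between those two numbers is the curve worth watching.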