March 26, 2026 edition

VoiceOS

The Siri for Productivity

VoiceOS Thinks You Should Stop Typing and Start Talking

Productivity, AI, Consumer, Voice

The Macro: Voice Keeps Getting a Second Chance

Voice interfaces have been the “next big thing” for about fifteen years running. Siri launched in 2011 and trained an entire generation to lower their expectations. Google Assistant got smarter but never felt personal. Alexa became a kitchen timer that occasionally orders things you did not ask for. The promise of talking to your computer and having it actually understand you has been perpetually five years away.

What changed is the underlying AI. Large language models can now parse intent, fix grammar, remove filler words, and understand context in ways that were not possible even two years ago. The voice recognition itself was already good. The missing piece was the intelligence sitting behind it, figuring out what you actually meant versus what you literally said.

The productivity voice space is starting to get crowded. Wispr Flow has built a dedicated following. Otter.ai handles meeting transcription well. Lemon recently launched with a single-key trigger approach. We covered Lemon’s bet on simplifying the voice interaction model, and the thesis is similar: people want fewer surfaces, not more features.

The market opportunity is substantial. Knowledge workers spend hours typing things they could say in minutes. The gap between speaking speed (roughly 150 words per minute) and typing speed (roughly 40 words per minute for most people) is a productivity loss that compounds across every email, every Slack message, every document draft. The question is whether any voice tool can overcome the social awkwardness of talking to your laptop in an open office.
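To put rough numbers on that gap, here is a back-of-the-envelope calculation using the figures above: 150 words per minute spoken, 40 typed. The 500-word draft is a hypothetical example, and the sketch assumes both rates hold steady, which ignores pauses, corrections, and review time.

```python
# Rough rates quoted above, in words per minute (wpm).
SPEAKING_WPM = 150
TYPING_WPM = 40


def minutes_to_produce(words: int, wpm: float) -> float:
    """Minutes needed to produce `words` words at a constant `wpm` rate."""
    return words / wpm


def minutes_saved(words: int) -> float:
    """Minutes saved by saying `words` words instead of typing them."""
    return minutes_to_produce(words, TYPING_WPM) - minutes_to_produce(words, SPEAKING_WPM)


if __name__ == "__main__":
    draft_words = 500  # hypothetical email or document draft length
    print(f"Typing:   {minutes_to_produce(draft_words, TYPING_WPM):.1f} min")   # 12.5 min
    print(f"Speaking: {minutes_to_produce(draft_words, SPEAKING_WPM):.1f} min")  # 3.3 min
    print(f"Saved:    {minutes_saved(draft_words):.1f} min")                     # 9.2 min
    print(f"Speed-up: {SPEAKING_WPM / TYPING_WPM:.2f}x")                         # 3.75x
```

On these figures speaking is 3.75 times faster, which is where the "hours versus minutes" framing comes from. Real dictation adds a review pass that this sketch leaves out.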

The Micro: Two Modes, One Interface

VoiceOS was founded by Jonah Daian and Kai Brokering in San Francisco. They went through Y Combinator’s Spring 2025 batch and have been shipping fast.

The product has two core modes. Dictation Mode writes what you meant, not what you said. It strips filler words, fixes grammar, and produces clean text. Ask Mode lets you give instructions like “reply that I cannot make it but ask to reschedule” and VoiceOS composes the response for you. Both modes work across more than 100 languages with automatic detection, which is a meaningful feature for anyone who regularly switches between languages.
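VoiceOS has not published how Dictation Mode works, and cleanup good enough to "write what you meant" almost certainly relies on a language model rather than a word list. Still, the simplest piece of it, stripping filler words, can be sketched with a regular expression. The `FILLERS` tuple and `clean_dictation` function below are hypothetical, and the sketch makes no attempt at grammar correction or intent.

```python
import re

# Hypothetical filler list for illustration; VoiceOS has not published its rules.
FILLERS = ("um", "uh", "er", "you know", "i mean")

# Match any filler as a whole word or phrase, plus an optional trailing comma and spaces.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*",
    re.IGNORECASE,
)


def clean_dictation(raw: str) -> str:
    """Remove filler words, collapse whitespace, and capitalize the first letter."""
    text = _FILLER_RE.sub("", raw)
    text = re.sub(r"\s+", " ", text).strip()
    return text[:1].upper() + text[1:]


if __name__ == "__main__":
    print(clean_dictation("um, so I think, uh, we should ship it"))
    # So I think, we should ship it
```

The word boundaries (`\b`) keep the pattern from eating "um" out of words like "umbrella", but a fixed list cannot tell a filler from a meaningful "I mean", which is exactly the kind of judgment the product claims to hand to its model.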

The integration list is impressive for an early-stage product: Gmail, Slack, Notion, VS Code, ChatGPT, Claude, Figma, and more. The claim is that it works across any app on your computer, and the compatibility list suggests they are genuinely trying to deliver on that rather than cherry-picking a few popular targets.

Pricing is transparent and reasonable. The free tier gets you 100 uses per week. Pro is $12 per month billed annually ($144 a year) with unlimited usage. Enterprise gets custom pricing with SOC 2 Type II and ISO 27001 compliance, SSO, and zero data retention. The privacy angle is worth noting: according to VoiceOS, audio is processed locally and never stored on their servers unless you choose to share it. For anyone handling sensitive work, that is a real differentiator.

The style adaptation feature is clever. VoiceOS adjusts tone based on context, so dictating into a formal email sounds different from dictating into a casual Slack thread. That kind of contextual awareness separates useful voice tools from glorified transcription.
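Here is one way context-dependent style could work, sketched under loud assumptions: the `Tone` values, the `APP_TONES` table, and the wording of the rewrite instruction are all hypothetical, since VoiceOS has not documented how it chooses a style. The idea is simply to look up the active app and pass a tone hint to whatever model rewrites the transcript.

```python
from enum import Enum


class Tone(Enum):
    FORMAL = "formal"
    CASUAL = "casual"
    TECHNICAL = "technical"
    NEUTRAL = "neutral"


# Hypothetical app-to-tone table; not VoiceOS's actual configuration.
APP_TONES = {
    "gmail": Tone.FORMAL,
    "slack": Tone.CASUAL,
    "vs code": Tone.TECHNICAL,
}


def pick_tone(active_app: str) -> Tone:
    """Look up a tone for the focused app, falling back to NEUTRAL for unknown apps."""
    return APP_TONES.get(active_app.strip().lower(), Tone.NEUTRAL)


def rewrite_instruction(active_app: str, transcript: str) -> str:
    """Build the instruction a rewriting model would receive (the model call itself is out of scope)."""
    tone = pick_tone(active_app)
    return (
        f"Rewrite the following dictation in a {tone.value} tone, "
        f"keeping the speaker's meaning:\n{transcript}"
    )


if __name__ == "__main__":
    print(pick_tone("Slack"))  # Tone.CASUAL
    print(rewrite_instruction("Gmail", "cant make thursday can we move it"))
```

Falling back to a neutral tone keeps behavior predictable for the long tail of apps that no lookup table will ever name.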

The site is polished, the product is live, and they have blog posts running through March 2026. This is not a landing page with a waitlist. It is a shipping product.

The Verdict

I like VoiceOS because they have made specific product decisions instead of trying to be everything. The two-mode system is clear. The pricing is honest. The privacy-first approach is smart: local audio processing is the kind of architectural choice that matters more as companies get serious about data security.

The challenge is differentiation in an increasingly noisy space. Wispr Flow is real competition. Apple and Google will inevitably improve their native voice tools. And the behavioral barrier is significant: most people have been trained by years of mediocre voice interfaces to just keep typing.

In thirty days, I want to see conversion rates from free to Pro. In sixty days, whether enterprise customers are signing up or just kicking the tires. In ninety days, the question is retention. Voice tools have a pattern where people try them enthusiastically for a week and then quietly go back to typing. If VoiceOS can break that cycle, they have something. The product feels ready. The question is whether the market is.