Mosaic Thinks Video Editing Should Take Seconds, Not Hours

The Macro: Video Editing Is Stuck in 2015

Here’s a number that should bother anyone building video tools: the average YouTube creator spends 6 to 10 hours editing a single video. For podcasters doing video, it’s worse. For short-form content creators trying to hit daily upload schedules across TikTok, YouTube Shorts, and Instagram Reels, editing is the bottleneck that determines whether you burn out in month three or month six.

The existing tools are better than they used to be, but they’re still fundamentally manual. Adobe Premiere is powerful and expensive and has a learning curve that could be classified as hostile. Descript made transcription-based editing work, which was clever, but it’s still a tool you sit in front of and operate. CapCut is free and popular and perfectly adequate for simple cuts. Runway has done impressive work on generative video, but that’s a different problem. Generation is not editing.

The gap is in the middle. Between “I have raw footage” and “I have a finished video,” there’s a pile of tedious, repetitive work. Jump cuts. B-roll placement. Audio leveling. Transitions. Captioning. Color correction. Each of these is individually solvable. Collectively, they eat hours. The question is whether AI agents can actually handle multi-step editing workflows, or whether this is another case of demos that look great and products that disappoint.

The Micro: Tesla Alumni With a Canvas

Mosaic is a canvas-based video editing platform where you build and run AI agents that do the editing for you. The key word there is “agents.” This isn’t an AI feature bolted onto a timeline editor. The pitch is that you describe what you want, and multimodal agents execute the edit across multiple steps.

I find the canvas approach interesting. Most AI video tools treat the AI as an assistant inside a traditional editor. Mosaic is treating the AI as the editor, with the human directing from a higher level of abstraction. That’s a real architectural difference, not just a marketing one.

The team has the right pedigree for this kind of work. CEO Adish Jain and CTO Kyle Wade both came from Tesla, where the video and perception problems are, to put it mildly, non-trivial. Jain also did time at AWS and studied at Berkeley. Wade was an ML researcher at UCSD before Tesla. They’re a three-person team in San Francisco and part of YC’s Winter 2025 batch. The backgrounds suggest they understand multimodal models and real-time processing at a level that most video editing startups don’t.

The competitive positioning is tricky, though. Descript has significant market share and just keeps adding AI features. Runway is well-funded and technically impressive. CapCut is free and backed by ByteDance’s resources. Adobe is Adobe. Mosaic needs to be dramatically better at the specific thing it does, because “a little better” won’t pull people away from tools they already know.

The Verdict

Mosaic is tackling a real problem with a genuinely different approach. The agentic model for video editing makes more sense to me than the “AI assistant in a traditional editor” approach that most competitors are taking. If you can actually describe a complex edit and have it executed correctly, that’s a step change in productivity, not an incremental improvement.

The risk is reliability. Video editing is unforgiving. If an AI agent handles 90% of an edit correctly but botches the audio sync or makes a weird jump cut, the creator still has to go in and fix it manually. And now they’re fixing it in an unfamiliar tool instead of the timeline editor they already know. The tolerance for errors in video editing is essentially zero, because every error is visible to the audience.

Over the next 30 days, I’d look for creator testimonials from real production workflows, not demo videos. At 60 days, the question is whether Mosaic can handle long-form content (10+ minutes) with the same reliability as short clips. By 90 days, the competitive response from Descript and Runway will tell us a lot about whether the agentic approach is a genuine moat or a feature that gets absorbed. I’m cautiously optimistic. The team is strong, the approach is differentiated, and the pain point is obvious. But video is a hard, hard product to get right.
