November 21, 2025 edition

Lucid

Interactive video models

Lucid Is Building a Universe Simulator, and Their Video Model Runs at 20 FPS

AI · Generative AI · Simulation · Consumer

The Macro: Video Models Are Getting Fast, but Nobody Can Play Inside Them Yet

Generative video has been one of the most visible AI breakthroughs of the past two years. Runway’s Gen-3, Pika, Kling, and Sora showed the world that AI can produce video from text prompts. The results went from “interesting but obviously fake” to “wait, that’s not real?” in about eighteen months. The market for AI video generation is projected to grow past $2 billion by 2027, and it’s attracting serious investment from both startups and research labs.

But there’s a fundamental limitation that none of these tools address: the video is passive. You type a prompt, the model generates a clip, and you watch it. There’s no interaction. No agency. You can’t steer what happens. You can’t make decisions inside the generated world. The output is a movie, not a simulation.

This matters because the most interesting applications of video generation aren’t about making content. They’re about making environments. Game worlds, training simulations, robotics testing grounds, virtual spaces where the physics and visuals respond to input in real time. That requires something different from a text-to-video pipeline. It requires action-conditioned generation, where the model takes user inputs and produces the next frame based on what you did, not just what came before.
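To make the distinction concrete, here is a minimal sketch of the two loops. The `generate_clip` and `predict_next_frame` functions are hypothetical stand-ins, not Lucid's actual API; the point is where user input enters the pipeline.

```python
import numpy as np

FRAME_SHAPE = (256, 256, 3)  # illustrative resolution

# --- Passive pipeline: prompt in, finished clip out, no input mid-generation ---
def generate_clip(prompt: str, num_frames: int) -> list[np.ndarray]:
    """Stand-in for a text-to-video model: the whole clip is fixed by the prompt."""
    rng = np.random.default_rng(abs(hash(prompt)) % 2**32)
    return [rng.random(FRAME_SHAPE, dtype=np.float32) for _ in range(num_frames)]

# --- Action-conditioned pipeline: each frame depends on what the user just did ---
def predict_next_frame(history: list[np.ndarray], action: str) -> np.ndarray:
    """Stand-in for a world-model step: next frame = f(past frames, current action)."""
    rng = np.random.default_rng(abs(hash((len(history), action))) % 2**32)
    return rng.random(FRAME_SHAPE, dtype=np.float32)

def interactive_loop(actions: list[str]) -> list[np.ndarray]:
    frames: list[np.ndarray] = []
    for action in actions:  # in a real system, this comes from a controller in real time
        frames.append(predict_next_frame(frames, action))
    return frames

clip = generate_clip("a boat on a lake", num_frames=48)        # you watch this
world = interactive_loop(["forward", "forward", "turn_left"])  # you steer this
```

The structural difference is the loop: a text-to-video model commits to the entire clip up front, while a world model re-plans every frame around the latest input.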

The research community has started calling these “world models,” and the early results are genuinely impressive. Several groups have built models that can generate interactive Minecraft environments. The problem is speed. Most of these models run at 2 to 5 frames per second, which is unusable for anything that needs to feel responsive. They also require enormous compute to train, which limits who can actually build them.

The Micro: A High School Dropout and a Universe Simulator

Rami Seid, one of the co-founders, has an unusual path: he dropped out of high school, got his first internship as a machine learning engineer at a robotics lab, went on to become CTO of a govtech contracting company, then co-founded a telecommunications company while doing ML work on the side. Alberto is the other co-founder. They’re a three-person team based in San Francisco, part of YC’s Winter 2025 batch.

Lucid is building what they call a universe simulator powered by interactive video models. The headline numbers are striking: their action-conditioned diffusion video model runs at over 20 frames per second on a single 4090 GPU. That’s five times faster than other Minecraft world models. And they trained it with roughly one-hundredth of the compute used by comparable systems.

Those efficiency claims deserve unpacking. The world model research space has been dominated by labs with access to thousands of GPUs and millions of dollars in compute budget. If Lucid is getting competitive results with two orders of magnitude less training compute, that’s not just a cost savings. It changes who can build these things and how quickly they can iterate.
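Back-of-envelope arithmetic shows why that matters if the claim holds. The baseline figures below are illustrative assumptions, not reported numbers for Lucid or any specific lab.

```python
# Illustrative only: assume a comparable world model takes ~50,000 GPU-hours
# to train, and cloud GPUs rent at ~$2/hour. Neither figure is from Lucid.
baseline_gpu_hours = 50_000
price_per_gpu_hour = 2.00  # USD, assumed

baseline_cost = baseline_gpu_hours * price_per_gpu_hour  # $100,000
reduced_cost = baseline_cost / 100                       # 100x less -> $1,000

# Iteration speed matters as much as cost: on an assumed 8-GPU node,
# wall-clock training drops from roughly eight months to under three days.
baseline_wall_clock_hours = baseline_gpu_hours / 8        # ~6,250 hours
reduced_wall_clock_hours = baseline_wall_clock_hours / 100  # ~62.5 hours

print(f"baseline ~${baseline_cost:,.0f}, at 100x less ~${reduced_cost:,.0f}")
print(f"wall clock on 8 GPUs: {baseline_wall_clock_hours:,.0f}h -> "
      f"{reduced_wall_clock_hours:.0f}h")
```

Under these assumed numbers, a training run goes from a line item that needs a research-lab budget to something a three-person team can afford to rerun every week.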

The 20 fps number matters for a practical reason. Below about 15 fps, interactive applications feel laggy and unresponsive. Most existing world models sit well below that threshold, which makes them interesting research demos but not usable products. Crossing 20 fps on a single consumer GPU means the output is something people can actually interact with in real time without specialized hardware.
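The latency arithmetic behind that threshold is straightforward: at 20 fps the model has a hard budget of 50 ms per frame, and for a diffusion model that budget is split across however many denoising steps the sampler takes. The step counts below are illustrative assumptions, not Lucid's reported configuration.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to produce one frame at a target frame rate."""
    return 1000.0 / fps

for fps in (5, 15, 20):
    print(f"{fps:>2} fps -> {frame_budget_ms(fps):6.1f} ms per frame")
# 5 fps -> 200.0 ms (research-demo territory)
# 15 fps ->  66.7 ms (rough usability floor)
# 20 fps ->  50.0 ms (Lucid's claim)

# A diffusion sampler divides the frame budget across its denoising steps;
# these step counts are assumptions for illustration.
for steps in (1, 4, 16):
    per_step = frame_budget_ms(20) / steps
    print(f"{steps:>2} denoising steps at 20 fps -> {per_step:5.1f} ms per step")
```

The per-step budget is why step count dominates this problem: at 16 denoising steps the model has about 3 ms per step for the whole network, which is why fast world models lean on aggressively few-step sampling.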

The company’s website leans into philosophical language about consciousness and infinite worlds, which is the kind of thing that either signals deep ambition or amounts to decorative hand-waving. Given the technical claims, I’ll give them the benefit of the doubt.

The Verdict

Lucid is making a bet that interactive video generation is a different product category from passive video generation. I think they’re right. Runway and Pika are building tools for content creators. Lucid is building infrastructure for interactive experiences. Those are different markets with different technical requirements and different customers.

The efficiency angle is the most interesting part. If the 100x compute reduction holds up across more complex environments beyond Minecraft, Lucid could make world models accessible to game developers, simulation companies, and robotics teams that don’t have research lab budgets. That’s a much larger addressable market than “people who want to generate video clips.”

The risk is that the gap between a Minecraft world model and a general-purpose universe simulator is enormous. Minecraft is a constrained environment with simple geometry, discrete physics, and a limited visual vocabulary. Real-world simulation requires handling continuous physics, complex lighting, deformable objects, and the long tail of visual scenarios that don’t appear in any training set. Getting from “fast Minecraft model” to “fast everything model” is a research problem that could take years.

At thirty days, I’d want to see the model running on environments beyond Minecraft. At sixty days, the question is whether external developers can use the model to build their own interactive experiences, or whether it’s still a closed demo. At ninety days, I’d want to understand the business model. Is this an API play, a platform for game developers, a simulation tool for robotics companies, or something else entirely? The technical foundation looks strong. The path from foundation to product is the part that hasn’t been written yet.