September 22, 2026 edition

moss

Real-time semantic search for Conversational AI. Sub-10ms retrieval.

Moss Runs Semantic Search in Under 10 Milliseconds Without a Server, and 500 Teams Are Already Using It

AI · Developer Tools · Search Infrastructure · Conversational AI

The Macro: Retrieval Is the Bottleneck Nobody Wants to Talk About

I keep seeing the same pattern in conversational AI products. The language model is fast. The text-to-speech is fast. The speech-to-text is fast. And then there is this 200-millisecond pause while the system fetches context from a vector database, and the whole illusion of real-time conversation falls apart. Two hundred milliseconds does not sound like much until you are on a phone call with an AI agent and it pauses awkwardly after every question like it forgot what you were talking about.

The retrieval-augmented generation (RAG) stack has become standard for anything beyond simple chatbots. If your AI agent needs to know about your company’s products, policies, or customer history, it needs to search through a knowledge base on every turn of the conversation. That search typically goes through a vector database like Pinecone, Weaviate, or Qdrant, which means a network round trip to a hosted service. In the best case, that adds 50 to 100 milliseconds. In the real world, with cold starts, network jitter, and large indexes, it regularly exceeds 200 milliseconds.

For text chatbots, 200 milliseconds is annoying but tolerable. For voice agents, it is a dealbreaker. Human conversation has a natural turn-taking rhythm. Delays beyond about 300 milliseconds feel unnatural. When you stack retrieval latency on top of LLM inference time and speech synthesis, the total response time pushes past the threshold where the conversation feels like talking to a machine instead of talking to a person.
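The turn-taking arithmetic above is worth making concrete. The sketch below sums the serial stages of one conversational turn against a rough 300-millisecond comfort threshold; the per-stage timings are illustrative assumptions, not measurements of any particular stack.

```python
# Illustrative turn-latency budget for a voice agent. All component
# timings below are assumed round numbers for the sake of the arithmetic.
TURN_BUDGET_MS = 300  # rough threshold where a pause starts to feel unnatural

def total_turn_latency(stt_ms, retrieval_ms, llm_first_token_ms, tts_ms):
    """Sum the serial stages of one conversational turn, in milliseconds."""
    return stt_ms + retrieval_ms + llm_first_token_ms + tts_ms

# Hosted vector DB: a ~200 ms retrieval hop pushes the turn well past budget.
hosted = total_turn_latency(stt_ms=80, retrieval_ms=200,
                            llm_first_token_ms=120, tts_ms=60)

# Local search: sub-10 ms retrieval keeps the same stack near the threshold.
local = total_turn_latency(stt_ms=80, retrieval_ms=8,
                           llm_first_token_ms=120, tts_ms=60)

print(hosted)  # 460
print(local)   # 268
```

With every other stage held constant, retrieval is the only line item that moves the total from clearly over the threshold to under it, which is the whole argument for cutting the network hop.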

Pinecone has raised over $200 million. Weaviate raised $50 million. Add Qdrant, Chroma, and Milvus, and the vector database category is well-funded and crowded. But every single one of these products is a hosted service that adds network latency to every query. The architectural assumption is that search happens on a server. Moss challenges that assumption entirely.

The Micro: Rust, WebAssembly, and the Bet That Search Should Run Where Agents Run

Sri Raghu Malireddi (CEO) and Harsha Nalluru (CTO) founded Moss and came through Y Combinator’s Fall 2025 batch. Their core insight is simple and, I think, correct: semantic search should run in the same environment as the agent that needs it. Not on a remote server. Not behind an API. Right there, in the browser, on the edge device, on the mobile app, wherever the AI agent lives.

They built the search runtime in Rust and compiled it to WebAssembly. That combination gives them near-native performance in any environment that supports WASM, which at this point is essentially everywhere. Browsers, Node.js, Python, mobile, edge servers. The vector index ships as a compact artifact that gets distributed to wherever it needs to run, and queries execute locally with zero network overhead.
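To see why local execution eliminates the latency floor, it helps to strip the idea to its core: the index is just data in the process's own memory, and a query is a handful of similarity computations. The sketch below is a generic brute-force illustration of that idea in plain Python; it is not Moss's implementation, and the tiny three-dimensional vectors stand in for real embedding vectors.

```python
import math

# Minimal sketch of in-process semantic search: the index lives in local
# memory, so a query is a few dot products with zero network overhead.
# Documents and their (toy) embedding vectors are assumptions for illustration.
INDEX = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "warranty terms": [0.8, 0.2, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, k=2):
    """Brute-force top-k nearest neighbors over the local index."""
    scored = sorted(INDEX.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc for doc, _ in scored[:k]]

print(search([0.85, 0.15, 0.05]))  # ['refund policy', 'warranty terms']
```

A production runtime replaces the brute-force loop with an approximate nearest-neighbor index and compiles the hot path to native or WASM code, but the architectural point survives the simplification: once the index is resident locally, there is no round trip left to pay for.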

The result is sub-10 millisecond semantic search. Not “sub-10 milliseconds on a benchmark under ideal conditions.” Sub-10 milliseconds in production, in a browser, on real data. That is 20 to 50 times faster than a round trip to Pinecone. For voice agents, that difference is the gap between a natural conversation and an awkward one.

The product includes JavaScript and Python SDKs that drop into existing projects. The managed data layer handles index distribution and updates, so when the underlying knowledge base changes, the local indexes sync automatically. Hybrid search combines semantic matching with traditional keyword matching, which matters because pure vector search still misses exact-match queries that keyword search handles perfectly.
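The hybrid point deserves a concrete example. One common way to combine the two signals is reciprocal rank fusion, where each document's fused score is the sum of reciprocal ranks across the ranked lists. The sketch below uses hand-made rankings to show the failure mode in question: a query for an exact SKU that keyword search nails but vector search ranks last. This is a generic illustration of the technique, not a description of how Moss fuses results internally.

```python
# Sketch of hybrid search via reciprocal rank fusion (RRF). The document
# names and rankings are hypothetical stand-ins for real search results.

def rrf(rankings, k=60):
    """Fuse ranked lists; the constant k dampens any single list's influence."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Semantic search buries the exact-match document at the bottom...
semantic_ranking = ["returns overview", "pricing faq", "SKU-4431 datasheet"]
# ...while keyword search matches only the literal SKU string.
keyword_ranking = ["SKU-4431 datasheet"]

fused = rrf([semantic_ranking, keyword_ranking])
print(fused[0])  # SKU-4431 datasheet
```

Because the datasheet earns credit from both lists, the fused ranking surfaces it first even though the embedding model alone would have missed it, which is exactly the gap hybrid search exists to close.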

The traction is strong. Over 500 teams are using the platform. Three paying customers are in production. Six enterprise design partners are actively working with the team. Seven additional companies are evaluating. Revenue and usage have been growing roughly 100 percent week over week. The customer list includes Grammarly and HubSpot, which are not small players and not companies that adopt infrastructure tooling casually.

Pricing starts with a free developer tier that includes $5 per month in credits, a hobbyist plan at $30 per month, and a startup plan at $200 per month with cloud search and 150 concurrent sessions. Enterprise pricing is custom. That is a reasonable ladder for a developer tools product, low enough to experiment with and structured to grow with usage.

The privacy angle is worth noting. Because search runs locally, sensitive data never leaves the client environment. For healthcare, finance, and enterprise customers with strict data residency requirements, this is not a nice-to-have. It is a hard requirement that hosted vector databases struggle to meet.

The Verdict

I think Moss is building in the right gap at the right time. The conversational AI market is exploding, voice agents are the next wave, and retrieval latency is a genuine architectural problem that the current vector database incumbents are not solving because their business model depends on hosted infrastructure. Moss inverts the architecture, and the latency numbers prove the approach works.

The risk is that the incumbents adapt. Pinecone could ship an edge runtime. Weaviate could compile to WASM. The concept of local-first search is not patentable. But the implementation is hard, and Moss has a meaningful head start. The Rust and WASM stack is not something you bolt onto an existing Python-based vector database in a quarter.

At 30 days, I want to see how index size scales. Sub-10 milliseconds on a small knowledge base is impressive. Sub-10 milliseconds on a million-document corpus running in a browser is a different story. At 60 days, the question is whether the 100 percent week-over-week growth sustains. That kind of growth rate either means genuine product-market fit or a spike that plateaus. At 90 days, I want to know if Moss is becoming the default retrieval layer for voice agent platforms. If Vapi, Retell, Bland, or any of the major voice AI companies integrate Moss as their search backend, this stops being a developer tool and starts being infrastructure. That is where the real value lives.