Onyx Is the Open-Source AI Assistant That Knows Where Your Company Buried the Docs

The Macro: Enterprise Search Has Been Broken for Twenty Years

I worked at a company once where finding the employee handbook required asking three people, checking two Slack channels, and eventually discovering it was in a Google Drive folder shared with someone who had left the company eight months earlier. This is not unusual. This is the default state of organizational knowledge at every company I have ever worked at or consulted for.

The enterprise search market has been trying to fix this problem since the early 2000s. Elasticsearch gave companies the ability to index their own data. Algolia made search fast and relevant for customer-facing products. But internal knowledge search has remained stubbornly bad. Confluence search is a running joke among developers. SharePoint search is worse. Slack search works for recent messages but falls apart for anything older than a few months.

The current wave of AI-powered enterprise knowledge tools is the most promising attempt yet. Glean raised over $200 million to build AI-powered work search. Guru, Tettra, and Slite all offer some version of a “knowledge base with AI.” The premise is simple: connect to the tools your company already uses, index everything, and let people ask questions in natural language instead of constructing search queries.

The problem with most of these solutions is that they are closed-source SaaS products asking you to hand over your most sensitive company data. For a lot of organizations, especially in regulated industries, that is a non-starter. This is where Onyx comes in: the code is open source, and you can run it on your own infrastructure.

The Micro: Your Most Knowledgeable Coworker, Open Source

Onyx is an open-source AI assistant that connects to your company’s documents, applications, and people. It ingests and synchronizes data from Google Drive, Slack, GitHub, Confluence, Salesforce, and other sources, then lets you ask questions through a chat interface that returns answers grounded in your actual company data. The project has 17,000 stars on GitHub, which for an enterprise tool is a serious signal of developer interest.

The founders are Yuhong Sun and Chris Weaver. Yuhong comes from Alation, where he worked on transformer architectures for natural-language-to-SQL and hybrid search. That background is directly relevant because Onyx uses a hybrid search approach combining BM25 (traditional keyword matching) with prefix-aware embedding models and contrastive learning. Chris was previously a technical lead at Robinhood. They came through Y Combinator’s Winter 2024 batch and are based in San Francisco with a team of about 20 people.

The technical architecture is what sets Onyx apart from competitors like Glean or Guru. The hybrid search system means you get the reliability of keyword matching combined with the semantic understanding of embeddings. If you search for “vacation policy” it will find the document even if it is titled “PTO Guidelines.” But it will also find exact matches when you search for a specific error code or project name, which is where pure semantic search often fails.
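To make the hybrid idea concrete, here is a minimal sketch of keyword scores blended with semantic scores. It is not Onyx's code. The corpus, the `CONCEPTS` table (a hand-written stand-in for a real embedding model), the `alpha` weight, and the weighted-sum fusion are all assumptions I picked for illustration; only the BM25 formula is standard.

```python
import math
from collections import Counter

# Toy corpus (hypothetical): titles stand in for full documents.
DOCS = {
    "pto": "PTO guidelines for requesting time off",
    "err": "Fixing error E4012 in the billing service",
    "onb": "Onboarding checklist for new engineers",
}

# Hypothetical stand-in for an embedding model: words that mean similar
# things map to the same concept. A real system would call a trained model.
CONCEPTS = {
    "vacation": "leave", "pto": "leave",
    "policy": "rules", "guidelines": "rules",
}


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with the standard BM25 formula."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / n_docs
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in set(tokenize(query)):
            tf = counts[term]
            if tf == 0:
                continue
            df = sum(1 for t in tokenized.values() if term in t)
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores[doc_id] = score
    return scores


def embed(text):
    """Toy 'embedding': a bag of concepts instead of a dense vector."""
    return Counter(CONCEPTS[t] for t in tokenize(text) if t in CONCEPTS)


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_search(query, docs=DOCS, alpha=0.5):
    """Blend max-normalized BM25 with cosine similarity; alpha weights the keyword side."""
    keyword = bm25_scores(query, docs)
    top = max(keyword.values()) or 1.0  # avoid dividing by zero when nothing matches
    q_vec = embed(query)
    combined = {
        doc_id: alpha * keyword[doc_id] / top + (1 - alpha) * cosine(q_vec, embed(text))
        for doc_id, text in docs.items()
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


print(hybrid_search("vacation policy")[0][0])  # pto (no shared keywords, semantic side wins)
print(hybrid_search("E4012")[0][0])            # err (no known concept, keyword side wins)
```

Neither signal alone handles both queries in this toy setup: BM25 scores "vacation policy" as zero everywhere, and the concept table knows nothing about "E4012". The blend covers both.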

The platform supports plug-and-play connectors that sync in real time and carry fine-grained access controls over from the source systems. That last part matters a lot. If an engineer should not have access to HR documents in Google Drive, they should not be able to find those documents through Onyx either. Permission-aware search is one of those features that sounds simple but is extremely difficult to implement correctly across multiple source systems.
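The core of permission-aware search can be sketched in a few lines, under heavy simplification. Everything below is hypothetical: the `Document` and `User` types, the group names, and the substring matcher are mine, not Onyx's data model. The assumption is that each connector copies the source system's access list onto the indexed document at sync time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    source: str
    text: str
    allowed_groups: frozenset  # groups copied from the source system at sync time


@dataclass(frozen=True)
class User:
    email: str
    groups: frozenset


def can_read(user, doc):
    """A user may see a document only if they share at least one group with it."""
    return bool(user.groups & doc.allowed_groups)


def search(user, query, index):
    """Filter by permission first, then match, so restricted text never leaves the index."""
    needle = query.lower()
    return [doc for doc in index if can_read(user, doc) and needle in doc.text.lower()]


INDEX = [
    Document("d1", "google_drive", "Salary bands for 2024", frozenset({"hr"})),
    Document("d2", "confluence", "Deploy runbook for the billing service", frozenset({"eng"})),
    Document("d3", "google_drive", "Salary review calendar", frozenset({"hr", "managers"})),
]

engineer = User("dev@example.com", frozenset({"eng"}))
hr_partner = User("hr@example.com", frozenset({"hr"}))

print([d.doc_id for d in search(engineer, "salary", INDEX)])    # []
print([d.doc_id for d in search(hr_partner, "salary", INDEX)])  # ['d1', 'd3']
```

The hard part this sketch skips is keeping `allowed_groups` correct: each source system models permissions differently, and a change upstream has to reach the index before the next query does.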

Onyx also integrates directly with Slack to auto-answer cross-team questions. Someone posts a question in a channel, and Onyx suggests an answer based on existing documentation. If the answer is good, it saves everyone the round-trip of waiting for the right person to see the message and respond. If it is wrong, the human can correct it and Onyx learns.

The deployment options are flexible: self-hosted on your own infrastructure or cloud-based through Onyx directly. You can also bring your own LLM, including open-source models, which means organizations with strict data residency requirements can run the entire stack on-premises without any data leaving their network.

The Verdict

I think Onyx is one of the strongest open-source AI projects in the enterprise space right now. The combination of hybrid search, permission-aware connectors, self-hosting capability, and a 17K-star GitHub community puts them in a different category than the typical YC startup. This is a real product with real adoption, not a demo with a waitlist.

The risk is the same one every open-source company faces: converting free users into paying customers. Glean does not have this problem because its product is closed source and has been enterprise-sold from day one. Onyx needs to build a business model around a product that anyone can deploy for free. The playbook exists (GitLab, Elastic, and Confluent have all done it), but it requires a level of sales and marketing sophistication that is very different from the engineering excellence that got Onyx to 17,000 stars.

In 30 days, I would want to see the paid offering clearly differentiated from the open-source version. At 60 days, the question is whether enterprise pilots are converting to contracts. At 90 days, if they can show that self-hosted deployments are leading to cloud migrations (the classic open-source funnel), the business model is working. The product is already good. The question is whether the company can be as well-built as the code.
