The Macro: The Training Data Problem Is Getting Worse
Every frontier AI company needs training data. Enormous amounts of it. Video, audio, images, text. The demand is growing faster than the supply of ethically sourced, legally clean data. And the lawsuits are piling up. The New York Times sued over its articles being used for training. Getty Images sued over its photos. Voice actors, musicians, and writers are all fighting over whether their work can be used to train models without permission or compensation.
The current state of AI training data is a mess. Some companies scrape the web and deal with the legal consequences later. Some license data from a handful of large providers like Shutterstock or Reddit. Some use synthetic data, which works for certain tasks but introduces its own quality problems. What almost nobody is doing well is going directly to individual creators and saying “we will pay you for your data, with your explicit consent, and clear rights attached.”
The rights-cleared data market is small but growing. Scale AI has built a large business around data labeling and curation. Appen does similar work. Defined.ai focuses on ethical AI data. But most of these companies focus on annotation and labeling, not on sourcing raw content from individual creators. The gap is the marketplace layer, a place where a person with a camera or a microphone can upload their content and get paid for it, while AI companies get exactly the kind of bespoke data they need with clean legal provenance.
The Micro: A Data Marketplace That Pays in Days, Not Months
Luel is a two-sided marketplace for rights-cleared multimodal training data. On one side, content creators upload video, audio, and images. On the other side, AI companies browse and license curated datasets. The company was founded by William Namgyal and Inigo Lenderking, both Berkeley dropouts, and went through Y Combinator’s W26 batch.
The contributor side is straightforward. You upload your content, it gets curated to professional standards, and you get paid. Luel claims payouts happen in 2 to 7 days via Venmo, Stripe, and other payment methods. That speed is notable. Most data licensing deals take weeks or months to close. If you are a content creator, getting paid within a week is a much more compelling proposition than waiting for a quarterly royalty check.
The enterprise side offers both off-the-shelf datasets and custom campaigns with dedicated account management. The data categories include video content like tutorials and POV footage, audio and voice datasets for speech recognition and text-to-speech training, and images for computer vision applications.
The customer logos on their site include xAI, Mercor, DoorDash, and Copilot. If those are real paying customers, that is strong validation for an early-stage company. These are organizations with serious data needs and the budgets to pay for quality sourcing.
The quality control question is the most important one for this kind of marketplace. Garbage data produces garbage models. Luel claims professional standards with human curation, but the details of what that curation process looks like, and how it scales, will determine whether the data is actually useful for training frontier models or merely "rights-cleared" without being technically good.
The pricing model is not transparent from the site, but marketplace economics usually work on a take rate. Luel probably keeps a percentage of each transaction, paying the creator most of the licensing fee while retaining a margin. The question is whether that margin is high enough to build a business while keeping prices competitive with less ethical data sources. Because the uncomfortable truth is that scraped data is effectively free. Rights-cleared data has to be good enough, or convenient enough, or legally safe enough to justify the cost premium.
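The take-rate math is easy to sketch. Assuming a hypothetical 20% take rate (Luel does not publish its actual rate, so the number below is a placeholder), the per-transaction split would look like this:

```python
def split_license_fee(fee: float, take_rate: float = 0.20) -> tuple[float, float]:
    """Split a licensing fee between the creator and the marketplace.

    `take_rate` is the fraction the marketplace keeps. The 20% default
    is a hypothetical placeholder, not Luel's published rate.
    """
    platform_cut = round(fee * take_rate, 2)
    creator_payout = round(fee - platform_cut, 2)
    return creator_payout, platform_cut

# A hypothetical $500 dataset license at a 20% take rate:
creator, platform = split_license_fee(500.00)
print(creator, platform)  # 400.0 100.0
```

The tension in the text above is visible in the numbers: the platform margin has to cover curation, account management, and payment processing while the all-in price still competes with scraped data that costs the buyer nothing in licensing fees.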
The competitive advantage Luel is building is the contributor network. If they can attract hundreds of thousands of creators uploading diverse, high-quality content, the dataset catalog becomes difficult to replicate. This is a classic marketplace network effect. More creators attract more buyers, which attracts more creators.
The Verdict
The thesis is right. AI companies need clean data. Creators deserve to be paid. A marketplace connecting the two is a natural product. The execution challenge is building both sides simultaneously.
At 30 days, I would want to see contributor growth rates and average earnings per contributor. If creators are making meaningful money, they will stay and recruit others. If the payouts are tiny, the supply side will dry up.
At 60 days, the enterprise side matters more. Are AI companies actually using this data for production training runs, or just for experimentation? Production use means recurring revenue. Experimentation means one-time purchases.
At 90 days, I would be looking at data quality metrics. Are the models trained on Luel data performing comparably to models trained on scraped data? If the answer is yes, the ethical premium sells itself. If the answer is “it depends,” the sales conversation gets harder.
The lawsuits are not going away. The regulatory pressure is increasing. The companies that build legitimate data supply chains now will have a significant advantage when the rules tighten further. Luel is betting on that future, and I think the bet is sound.