cllctd is a two-sided AI data marketplace headquartered in Dubai, sourcing from India. We exist because the world's AI models are missing 1.4 billion people.
Every major AI model in production today was trained almost exclusively on data from North America, Western Europe, and East Asia. India, home to roughly 1 in 6 people on earth, is a ghost in the training data.
This isn't a minor gap. It means speech models that fail for Hindi and Tamil speakers. It means robotics systems untrained on the environments where billions of workers operate. It means AI that works well for a minority of the world and poorly for everyone else.
cllctd is the marketplace that closes this gap. We connect India's hardware vendors, voice networks, and worker communities with the AI companies that need their data, while ensuring the people who generate that data are compensated fairly.
Data that was collected. Not scraped. That's not a tagline. It's the entire business.
Every dataset on cllctd carries verified consent documentation. No grey areas. No "publicly available" rationalisations. If a person's data is being used, they agreed to it.
70% of every deal goes to the vendor. We take 30% to operate. This isn't charity — it's the only model that builds a lasting supply side worth having.
We go deep on specific verticals — egocentric worker data, regional voice, institutional archives — where margins are high and moats are real.
Our ADGM/DIFC structure provides clean IP for global buyers. Our India sourcing provides diversity, volume, and cost advantage no Western marketplace can replicate.
Hardware vendors, voice networks, worker communities, and institutional archives in India provide the raw material. Every submission is consent-verified, documented, and delivered through our onboarding portal. We work with vendors to build compliant frameworks where needed.
We annotate, enrich, QA, and format raw vendor submissions into enterprise-grade datasets. Our legal team handles consent verification and rights documentation. Our sales team matches datasets with active buyer demand and manages deal flow end to end.
AI labs, robotics companies, and Gulf sovereign AI funds license datasets through our ADGM/DIFC entity. Clean IP, zero tax on royalties, and direct access to buyers like G42, MGX, and SDAIA who need India-origin data and trust UAE legal structures.
Landing page, vendor onboarding portal, and brand identity live. Pre-seed fundraise underway ($2M on a SAFE with a $12M cap). First vendor conversations active.
Hardware vendor agreements signed. First annotated datasets in catalogue. First enterprise licensing deal closes. Revenue begins. ADGM entity established.
India voice recording networks onboarded across Hindi, Tamil, Telugu, Bengali. Catalogue expands to 20+ listed datasets. Gulf sovereign buyer pipeline activated.
Mobile app launches in India — iOS and Android. UPI payouts. Task-based data contribution campaigns. First 10,000 active contributors.
Series A raise ($6–8M target). Arabic-language data expansion for Gulf buyers. Automated annotation pipeline. 50+ enterprise buyers. Recurring subscription model.
Whether you're a potential vendor, buyer, investor, or just curious — we reply to everything personally.
Apply as a vendor or get in touch with our buyer team.
Apply as Vendor →