cllctd is a two-sided AI data marketplace headquartered in Dubai, sourcing from India. We exist because the world's AI models are missing 1.4 billion people.
Every major AI model in production today was trained almost exclusively on data from North America, Western Europe, and East Asia. India, home to roughly 1 in 6 people on earth, is a ghost in the training data.
This isn't a minor gap. It means speech models that fail for Hindi and Tamil speakers. It means robotics systems untrained on the environments where billions of workers operate. It means AI that works well for a minority of the world and poorly for everyone else.
cllctd is the marketplace that closes this gap. We connect India's contributors, voice networks, and worker communities with the AI companies that need their data, while ensuring the people who generate that data are compensated fairly.
Data that was collected. Not scraped. That's not a tagline. It's the entire business.
Every dataset on cllctd carries verified consent documentation. No grey areas. No "publicly available" rationalisations. If a person's data is being used, they agreed to it.
Contributors are compensated fairly — the only model that builds a lasting supply side.
We go deep on specific verticals — egocentric worker data, regional voice, institutional archives — where margins are high and moats are real.
Our SHAMS structure provides clean IP for global buyers. Our India sourcing provides diversity, volume, and cost advantage no Western marketplace can replicate.
Contributors, voice networks, worker communities, and institutional archives in India provide the raw material. Consent-verified, documented, submitted through our onboarding portal. We work with vendors to build compliant frameworks where needed.
We annotate, enrich, QA, and format raw vendor submissions into enterprise-grade datasets. Our legal team handles consent verification and rights documentation. Our sales team matches datasets with active buyer demand and manages deal flow end to end.
AI labs, robotics companies, and leading Gulf AI institutions license datasets through our SHAMS entity. Clean IP, zero tax on royalties, and direct access to buyers that need India-origin data and trust UAE legal structures.
Landing page, contributor app, and brand identity live. Pre-seed fundraise underway. First contributor and buyer conversations active.
Contributor agreements signed. First annotated datasets in catalogue. First enterprise licensing deal closes. Revenue begins. SHAMS entity established.
India voice recording networks onboarded across Hindi, Tamil, Telugu, Bengali. Catalogue expands to 20+ listed datasets. Gulf sovereign buyer pipeline activated.
Mobile app launches in India — iOS and Android. UPI payouts. Task-based data contribution campaigns. First 10,000 active contributors.
Series A raise. Arabic-language data expansion for Gulf buyers. Automated annotation pipeline. 50+ enterprise buyers. Recurring subscription model.
Whether you're a potential vendor, buyer, investor, or just curious — we reply to everything personally.
Sign up as a cllctr or get in touch with our buyer team.
Join cllctrs →