A quiet industry is growing in the gaps of the internet that scrapers can't reach: people selling recordings of their walks, their voices, and their private conversations to train AI.

The Data Wall

The most widely used AI training sources (C4, RefinedWeb, and Dolma, which together account for roughly a quarter of the web's highest-quality datasets) have moved to restrict access for generative AI training. Researchers estimate the industry will exhaust fresh, high-quality text to train on as soon as 2026. Some labs have turned to synthetic data, but feeding a model its own outputs causes quality degradation over time.

The Gig Economy Filling the Gap

Apps like Kled AI, Silencio, and Neon Mobile pay people directly for their data. In Cape Town, a 27-year-old earned $14 for a ten-minute neighborhood walk recorded on his phone, about half a week's groceries at local wages. In Ranchi, India, a student earns over $100 a month letting an app passively record ambient city sounds through his phone's microphone. In Chicago, an 18-year-old sold private phone chats with friends and family to a conversational AI platform for $0.50 per minute.

YC-backed Luel AI pays $0.15 per minute for multilingual conversations. ElevenLabs lets people license their voice for cloning at $0.02 per minute.

The Trade-Off

The economics are stark, but the bargain may be short-sighted. Workers helping train AI systems may be contributing to the automation of their own future skills. And once a voice or face is in a training set, there's no clear path to revoke it. Researchers and privacy advocates warn that the gig trainers most willing to sell their data cheaply are often the ones with the least legal recourse if it's misused.