The reckoning over companies like OpenAI using creators’ content without their permission might finally be at hand.
Since OpenAI dropped the first version of ChatGPT in November 2022, it’s faced scrutiny over what data it uses to make its generative AI products, with many creators concerned their videos have been scraped and dumped into the slush. Creators understandably want to keep control of the content they make, and while it’s too late to extract their videos from already-scraped datasets, it may be possible for them to receive compensation for the violation of their ownership rights, and to set a precedent that could prevent other companies from taking what’s theirs in the future.
That’s the goal of David Millette, a Massachusetts man who has filed a class action lawsuit against OpenAI seeking $5 million in damages for himself and other creators.
Millette, who’s had a YouTube account since 2009, alleges OpenAI has engaged in the “surreptitious, non-consensual transcription of millions of YouTube users’ videos […] to train Defendants’ AI software products,” and that it’s “profited significantly” by doing so. The suit specifically refers to allegations that OpenAI created a speech recognition model, Whisper, to transcribe audio, then used Whisper to transcribe millions of hours of YouTube content. Those transcriptions were reportedly used to train GPT-4.
The lawsuit alleges that by scraping creators’ videos, OpenAI violated copyright law, since creators retain ownership rights to any videos they upload thanks to YouTube’s terms of service.
“Much of the material in OpenAI’s training datasets […] comes from works–including videos created and uploaded by Plaintiff–that were copied by OpenAI without consent, without credit, and without compensation,” the suit alleges.
As TechCrunch points out, makers of large language models (LLMs) like ChatGPT have turned to video transcriptions for training because they’ve already scraped everything they can from the rest of the internet, and because more and more text-based websites are now installing blockers to keep future scrapes from happening. Over 35% of the world’s top 1,000 websites have those protections in place.
If you’re wondering whether YouTube is looking into solutions like that to prevent external scraping, we’re not sure. But there’s a bigger concern with YouTube: it talks a big game about keeping creators safe in the age of genAI, but it’s allegedly also scraping transcriptions of creators’ videos and using them to train Google’s own AI products.
Millette’s lawsuit is a civil case, but it does ask the presiding judge to state that OpenAI violated copyright laws, something that could expose the company to future criminal charges. And, as mentioned above, Millette is also seeking $5 million in damages, which, since this is a class action lawsuit, would be split between him and any other affected creators if the case is decided in his favor.
If Millette wins his case, creators may receive some cash. But their data will still be part of potentially dozens of genAI products because there are no established protections for creators against having their videos, writing, art, and more scraped and subsumed into training sets. Until those protections are in place, creators like Millette have to fight for themselves, and hope judgments in their favor will deter companies who want to use their content without permission.