YouTube has warned tech companies that some AI practices violate the platform’s terms of service, but that caution hasn’t stopped the unauthorized use of videos for AI training. Top creators like Marques Brownlee have spoken out after their content was included in a widely circulated dataset dubbed “the Pile.”
According to an investigation from Proof News, tech companies like Apple, Nvidia, and Anthropic have employed the Pile for training purposes. YouTube content in the Pile includes subtitles from 173,536 YouTube videos, which originate from more than 48,000 channels.
Some of that YouTube content comes from sources — like the educational hub Khan Academy — that make sense as training material for generative AI models. In other cases, users of the Pile encountered content from top creators like MrBeast, Brownlee, and Jacksepticeye, even though those YouTube power players did not approve the use of their videos for AI training purposes.
Proof News has put together a tool that allows users to search through the material in the Pile.
The nonprofit EleutherAI put together the Pile to provide smaller AI operations with low-cost, readily available access to training material. Though the dataset was not compiled for big tech firms like Apple and Nvidia, those companies have used it regardless.
“Apple technically avoids ‘fault’ here because they’re not the ones scraping,” Brownlee wrote on X. “But this is going to be an evolving problem for a long time.”
(Update 7/18: Apple has issued a statement saying its OpenELM model, which was trained on YouTube videos, isn’t used to power any of its AI/machine learning tools, including Apple Intelligence.)
AI developers, media companies, and individual creators all seem to have different ideas about the materials that can or cannot be repurposed for training. Those squabbles have led to ongoing lawsuits, several of which have targeted innovators like OpenAI. In response, the Microsoft-backed firm is building tools that give content owners more power over the ways their IP is used. But while creators wait for those controls to be put in place, they are left with few means to protect their videos against unauthorized reuse.

Some of the companies that are benefitting from the Pile have challenged the authority of YouTube’s terms of service. “The Pile includes a very small subset of YouTube subtitles,” an Anthropic spokesperson told WIRED. “YouTube’s terms cover direct use of its platform, which is distinct from use of The Pile dataset.”
The owners of content included in the dataset have different ideas. Dave Wiskus, the CEO of creator-led streamer Nebula, described the training practices of Pile users as “theft.” Julia Walsh, the CEO of Vlogbrothers-affiliated media company Complexly, expressed similar ideas. “We are frustrated to learn that our thoughtfully produced educational content has been used in this way without our consent,” she said.
Our opinion here at Tubefilter is that U.S.-based creators who unwittingly become AI training dummies deserve the same protections that are afforded to content owners in other regions. The E.U. recently passed a sweeping law that lays out specific regulations for the datasets that are fed to AIs. A similar law in the U.S. would clear up a lot of the confusion about whom — if anyone — is responsible for the rights of the creators found in the Pile.