Meta, Google, and OpenAI Flouted Ethics to Harvest Data for AI Models

An impending data shortage and intense competition have led three top AI contenders to mine user content and creative works without paying their creators.

People use their cell phones in a dark street during a blackout in Bauta municipality, Artemisa province, Cuba, on March 18, 2024. — Yamil Lage/Getty Images

AI models require massive amounts of data to get smarter, so much that they might run out of publicly available data to consume by 2026, one research firm estimates. Facing that impending shortage amid intense competition, OpenAI, Google, and Meta have pursued ethically and legally dubious ways to access more data, a New York Times investigation found. Their methods include scraping YouTube videos, Google Docs, Google Maps reviews, and other human-made content to skirt licensing fees. That underscores serious privacy and copyright concerns already raised by artists, writers, filmmakers, and other content creators whose works have been used without their knowledge or consent. Some Google employees told the Times they knew about OpenAI's ethically hazy data-scraping practices but declined to report them because Google had engaged in some of the same practices. One Meta employee said that in a meeting with the company's top executives about using copyrighted material for AI training, not a single person in the room raised concerns about harvesting people's works without paying them.



Amanda Yen