Meta, Google, and OpenAI Flouted Ethics to Harvest Data for AI Model Development: NYT
CONTENT CRUNCH
AI models require massive amounts of data to get smarter, so much that they might run out of publicly available data to consume by 2026, one research firm estimates. But faced with the impending shortage and amid intense competition, OpenAI, Google, and Meta have pursued ethically and legally dubious ways to access more data, a New York Times investigation found. Their methods include scraping YouTube videos, Google Docs and Google Maps reviews, and other human-made content to skirt licensing fees, underscoring huge concerns about privacy and copyright that have already been raised by artists, writers, filmmakers, and other content creators whose works have been used without their knowledge or consent. Some employees at Google told the Times they knew about OpenAI’s ethically hazy data-scraping practices but refused to report them because Google had engaged in some of the same practices. One employee at Meta said that in a meeting with the company’s top executives about using copyrighted material for AI training, not a single person in the room raised concerns about harvesting people’s works without paying them.