Meta Faces Legal Backlash Over Alleged Use of Copyrighted Books to Train Llama AI Models

Meta Platforms, the parent company of social media giants Facebook and Instagram, faces mounting backlash over allegations of using thousands of copyrighted books to train its cutting-edge AI model, Llama. The legal challenge, detailed in a recent court filing, sheds light on a controversial battle between the tech giant and renowned authors, including comedian Sarah Silverman and Pulitzer Prize winner Michael Chabon.

The copyright infringement lawsuit consolidates two legal actions brought by Silverman, Chabon, and other prominent authors, who claim that Meta used their works without permission to train its AI language model. Filed on December 11, the new complaint includes chat logs from a Meta-affiliated researcher. According to this evidence, Meta's legal team had reportedly cautioned against the legal risk of using pirated books for AI model training. In particular, researcher Tim Dettmers discussed with Meta's legal department whether book files could legally be used as training data for Llama. The company nonetheless allegedly proceeded with the practice despite knowing the possible repercussions.

These logs, extracted from a Discord server, are presented as pivotal evidence suggesting that Meta was aware of the potential legal risks of using copyrighted books.

The chat logs quoted in the complaint describe Dettmers' conversation with Meta's legal team in 2021, in which he discussed the challenges posed by the dataset known as The Pile. Dettmers expressed frustration, stating that many people at Facebook were interested in working with The Pile but could not use it for legal reasons. The logs further indicate that Meta's lawyers had told Dettmers that the data could not be used, and that models could not be published if they were trained on it.

While the precise concerns raised by Meta's legal team are not fully disclosed, participants in the chat identify 'books with active copyrights' as a likely source of worry. The participants argued that training on such data should fall under fair use, a legal doctrine in the United States that allows certain unlicensed uses of copyrighted works. Dettmers, a doctoral student at the University of Washington, has not commented further on these claims.

Meta Platforms faces this major legal fight at a time when many other tech companies are dealing with lawsuits that raise similar complaints of copyright infringement. If the courts side with the plaintiffs, the rulings could change how AI models are developed: companies might have to pay content creators for using their works as training data.

The tech industry is watching the case against Meta closely, anticipating repercussions that could reshape generative AI and the strategies of its major players. The outcome may affect more than Meta alone, as it could set important rules for what is considered acceptable in AI training and development.