Meta Faces Criticism for Using Copyrighted Books in AI Training Despite Warnings from Its Legal Team


Meta proceeded with using thousands of pirated books to train its AI models despite warnings from its legal team about the legal risks, according to a recent filing in a copyright infringement lawsuit.

Meta Platforms, formerly known as Facebook, faces mounting legal trouble over allegations that it used thousands of pirated books to train its AI models despite explicit warnings from its own legal team. The dispute, detailed in a recent court filing in a copyright infringement lawsuit, pits prominent authors against the tech giant.

Notable figures including comedian Sarah Silverman and Pulitzer Prize winner Michael Chabon are among the plaintiffs, asserting that Meta used their works without permission to train its artificial intelligence language model, Llama. The latest filing consolidates these claims and alleges that Meta disregarded copyright permissions in its pursuit of AI advances.

The filing introduces chat logs in which a Meta-affiliated researcher discusses acquiring the dataset on a Discord server. The logs could serve as evidence that Meta knew the book files might infringe copyright.

According to a Reuters report, the conversation cited in the complaint includes exchanges between researcher Tim Dettmers and Meta’s legal department in which concerns were raised about the legality of using the book files for training. Dettmers’ messages reveal internal debate at Meta over whether the dataset could permissibly be used, suggesting the company recognized the legal uncertainties surrounding the matter.

While the specific details of the lawyers’ concerns remain undisclosed, references to “books with active copyrights” emerge as a primary source of apprehension. Participants in the chat suggest that training on such data might not be protected by fair use, the legal doctrine that permits certain unlicensed uses of copyrighted works.

The release of Meta’s Llama large language model earlier this year, purportedly trained on the controversial dataset, has generated controversy within the content creator community. As tech companies face a wave of lawsuits alleging unauthorized use of copyrighted material for AI advancements, the outcomes of these legal battles could significantly shape the future landscape of generative AI.

In February, Meta released the first version of its Llama large language model and disclosed the datasets used in its training, including “the Books3 section of The Pile,” a dataset that, according to the legal filing, comprises 196,640 books. Meta did not reveal the training data for its successor, Llama 2, which became commercially available over the summer and is free to use for enterprises with fewer than 700 million monthly active users.
