Authors are suing Meta for allegedly using their works to train its Llama artificial intelligence software, according to a class action lawsuit filed on Tuesday. Pulitzer Prize-winning author Michael Chabon is among the plaintiffs listed in the lawsuit, which accuses Meta of infringing on their intellectual property.
Meta released its Llama AI language tool in February, and the authors claim in the lawsuit that the company used their text to train it. AI tools adjust their output based on the text they’ve been exposed to in the past, and the authors claim: “Much of the material in Meta’s training dataset, however, comes from copyrighted works—including works written by Plaintiffs—that were copied by Meta without consent, without credit, and without compensation.”
According to the lawsuit, Meta said in a table describing the dataset used to train its AI model that 85 gigabytes of the training data came from a “Books” category. Under this category, Meta reportedly collates books from Project Gutenberg, an online archive of roughly 70,000 books that are no longer under copyright, and the Books3 section of ThePile.
Although Meta did not describe the contents of ThePile, the lawsuit claims, citing information provided elsewhere, that it is compiled by the Bibliotik private tracker, a “shadow library,” meaning the materials are available through torrent systems. Shadow libraries are “flagrantly illegal,” the lawsuit says.
A significant number of Chabon’s works are available on Books3, the lawsuit says, as are numerous works by the other authors in the class action that are still under copyright protection. Other plaintiffs include playwright and Grammy Award winner David Henry Hwang, author Matthew Klam, author and Grammy Award and Golden Globe nominee Ayelet Waldman, and author Rachel Louise Snyder.
The lawsuit states: “Plaintiffs and Class members did not consent to the use of their copyrighted books as training materials for LLaMA.” It adds: “Nevertheless, their copyrighted protected works were copied and ingested as part of training LLaMA. Plaintiffs’ copyrighted books appear in the dataset that Meta has admitted to using to train LLaMA.”
Meta released its latest version, Llama 2, in collaboration with Microsoft in July, saying it is free for research and commercial use for companies with fewer than 700 million active users.
On Friday, the same authors also filed a class action lawsuit against ChatGPT maker OpenAI, making similar allegations. They claim in the suit that OpenAI copied their works to teach its AI system how to respond to user prompts.
Lawsuits have also been filed against other companies, including Microsoft and Stability AI, which were likewise accused of using copyrighted material to train their AI tools. The companies have claimed they do not infringe copyright because the AI allegedly transforms the original work, which they say qualifies as fair use.
“The AI models are basically learning from all of the information that’s out there. It’s akin to a student going and reading books in a library and then learning how to write and read,” Kent Walker, Google’s president of global affairs, told The Washington Post. “At the same time, you have to make sure that you’re not reproducing other people’s works and doing things that would be violations of copyright,” he added.
However, the authors argue in both lawsuits that they were injured by the companies’ alleged use of their copyrighted material in AI training datasets. The lawsuit filed against Meta said the authors involved in the class action suit “are entitled to statutory damages, actual damages, restitution of profits, and other remedies provided by law.”
Meta and OpenAI did not immediately respond to Gizmodo’s request for comment.