Following an earlier copyright lawsuit and a landmark judgment that revealed Anthropic had used copyrighted books for AI training, several global music publishers have filed a new lawsuit against Anthropic, alleging it infringed their copyrighted content, including song lyrics, largely through pirated sources.
This is not the first time Anthropic has been dragged into copyright infringement cases. In September 2025, in the widely reported Bartz v. Anthropic case, the company agreed to pay $1.5 billion in “the largest publicly reported copyright recovery” to date. Notably, during the same proceedings, before the final settlement, the US court found that using purchased copyrighted works to train AI models is ‘fair use’ under US copyright law.
Separately, several music publishers had previously filed their own copyright infringement case against Anthropic, which the parties later resolved through an agreement. The fresh lawsuit, brought by some of the same publishers, is a follow-up to that settled case.
Why did Music Publishers file a fresh copyright lawsuit?
In the new lawsuit, the music publishers say they tried to expand their earlier copyright case against Anthropic after Judge William Alsup’s rulings in Bartz v. Anthropic revealed Anthropic’s illegal downloading from pirated shadow libraries.
“Until the revelations in those [Bartz v. Anthropic] opinions and filings, Publishers did not know that their works were being copied by Defendants from some of the most notorious pirated sources in the world,” reads the lawsuit, referring to piracy shadow libraries like LibGen (Library Genesis) and its mirror sites such as Z-Library. Before evolving into Anna’s Archive, Pirate Library Mirror (PiLiMi) replicated content from the banned Z-Library.
When the music publishers sought to amend their previous complaint, Anthropic opposed it, arguing that the torrenting claims were unrelated to the existing copyright infringement case and would “fundamentally transform” it. The publishers say they therefore filed this separate lawsuit to address what they call “willful infringement” through the downloading and uploading of unauthorised copies of their works from massive piracy websites.
The publishers also noted they had already sued Anthropic in the earlier case over the alleged copying of their content to train certain Claude AI models. Despite Anthropic’s agreement to enforce guardrails preventing its AI models from generating copyrighted content, the publishers claim the company has continued to use their works on a much larger scale since then, leading to this second lawsuit over the same copyright issues surrounding AI training and outputs.
What do the Music Publishers Want?
Concord Music Group, Universal Music, and others filed the lawsuit against Anthropic, its CEO Dario Amodei, and co-founder Benjamin Mann. Along with allegations of copyright infringement against Anthropic, the lawsuit claims that Amodei and Mann used pirated libraries while they were at OpenAI between 2019 and 2020, as revealed in the Bartz v. Anthropic court proceedings.
“From the very beginning, Anthropic has built its multibillion-dollar business on piracy,” states the lawsuit, referring to the Anthropic founders’ alleged conduct while at OpenAI.
In their new lawsuit, the music publishers asked the court to award statutory damages of up to $150,000 per infringed work. They also sought additional statutory damages of up to $25,000 per violation for the alleged removal or alteration of copyright management information from their original works.
“In total, Defendants torrented at least 5 million copies of pirated books from LibGen in 2021, and at least another 2 million copies of pirated books from PiLiMi in 2022,” alleges the lawsuit. Additionally, it claims that they illegally downloaded a separate catalogue of bibliographic metadata for each collection, which included information on book title, author, and ISBN, a numeric commercial book identifier.
The publishers further requested an order requiring Anthropic to destroy all infringing copies of their works in its datasets under the court’s supervision and to file a report on its compliance.
Why are they complaining?
Music publishers say the industry depends on licensing and authorised deals to ensure songwriters and publishers are paid when their works are used or played. Publishers license songs in their catalogues, collect the revenue, and share it with the artists they represent. They say that these licensing arrangements can also extend to AI companies that can bring them additional revenue in exchange for their proprietary data for AI training.
Universal Music Group, for instance, initially sued Udio, an AI music generator, for copyright infringement, but the two announced a partnership after settling. Similarly, a recent Tips Music earnings call revealed that its partnership with Warner Music Group involves AI training as part of NVIDIA’s new music model, which is being developed in collaboration with Universal Music Group.
Particularly regarding illegal downloading through peer-to-peer-based torrent networks, the lawsuit further said that by torrenting pirated books, Anthropic “violated Publishers’ exclusive right of reproduction.” Explaining this claim, it said, “to make matters worse, because of the two-way nature of the BitTorrent protocol, when Defendants downloaded copies of these pirated books via torrenting, they simultaneously uploaded to the public unauthorized copies of the same books, thereby infringing Publishers’ exclusive right of distribution in these works and contributing to further infringement of Publishers’ works as well.”
Allegations About Anthropic’s Torrenting
“Despite its multibillion-dollar valuation, Anthropic refuses to pay a cent for the vast amounts of copyrighted content—including Publishers’ musical compositions—it takes without permission or credit to build its business,” claims the lawsuit, accusing the AI company of building a “central library by copying and ingesting text from the internet and other sources.”
The lawsuit states that Anthropic used BitTorrent to download and copy text from illegal pirate library websites. Torrenting is a peer-to-peer (P2P) file-sharing protocol that is “infamously used for widespread unauthorised reproduction and distribution of copyrighted materials,” as claimed by the lawsuit.
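The lawsuit’s claim that a BitTorrent downloader necessarily also uploads can be illustrated with a toy simulation. This is an illustrative sketch only, not Anthropic’s software or a faithful implementation of the protocol; the peer names and the two-peer exchange are hypothetical, but they capture the real mechanism the complaint describes: a file is split into pieces, and a peer serves the pieces it has already obtained to other peers on request.

```python
# Toy model (not real BitTorrent): a "leecher" that downloads pieces from a
# seeder simultaneously redistributes those pieces to a third peer.

def simulate_swarm(num_pieces, steps):
    """Simulate a tiny swarm: the seeder has all pieces; the leecher starts
    with none and re-serves each piece it obtains to a newcomer peer."""
    seeder = set(range(num_pieces))
    leecher = set()
    newcomer = set()
    uploads_by_leecher = 0
    for _ in range(steps):
        # The leecher downloads one missing piece from the seeder.
        missing = seeder - leecher
        if missing:
            leecher.add(min(missing))
        # A third peer requests pieces; the leecher serves what it has.
        # Merely participating in the swarm thus distributes copies.
        wanted = leecher - newcomer
        if wanted:
            newcomer.add(min(wanted))
            uploads_by_leecher += 1
    return leecher, uploads_by_leecher

pieces, uploaded = simulate_swarm(num_pieces=4, steps=4)
print(len(pieces), uploaded)  # the leecher holds 4 pieces and uploaded 4 times
```

Even in this minimal model, the leecher cannot finish downloading without having served copies to another peer, which is the basis of the distribution claim quoted above.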
Citing disclosures from the Bartz vs. Anthropic proceedings, the lawsuit said CEO Dario Amodei acknowledged that Anthropic had many legal options to obtain copyrighted works for AI training, but deliberately chose to obtain them illegally via torrenting because it was reportedly “faster and free.” He allegedly described the legal route as a “practice/business slog.” The lawsuit refers to another instance to illustrate Anthropic’s approach to copyrighted content. When one Anthropic founder learned he could torrent additional copyrighted works from PiLiMi, he wrote to colleagues, “Just in time!” To which another employee allegedly replied, “zlibrary, my beloved.”
The publishers called LibGen and PiLiMi “two of the largest and most infamous” illegal libraries. “These pirate libraries contain every genre of book imaginable, including songbooks, sheet music collections, and other books of song lyrics, containing copyrighted musical compositions owned and controlled by Publishers and others,” states the complaint.
In addition to these allegations, internal documents unveiled in recent legal filings revealed that Project Panama was Anthropic’s “effort to destructively scan all the books in the world”, the Washington Post reported. The documents also revealed that Anthropic had “spent tens of millions of dollars to acquire and slice the spines off millions of books.”
Voluntary Deletions and Tuning Models to Exploit Copyrighted Content
After Anthropic collects the vast text library that includes the publishers’ copyrighted works, it allegedly uses portions of that data to train its AI models through further unauthorised copying. The publishers alleged that Anthropic “cleans” its training text but leaves infringing content, such as song lyrics, intact, while using tools to remove copyright notices and other copyright management information that would identify the copyright holders.
They also said Anthropic copies and processes the corpus in memory, breaking it into “tokens” for storage, and makes additional copies during finetuning and reinforcement learning based on human and AI feedback. As part of that process, the publishers claimed, human reviewers directed by Anthropic prompt and reward the model in ways that can involve outputs tied to the publishers’ lyrics.
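The “tokens” the filing refers to are the small integer-indexed units a model is trained on. A minimal word-level sketch illustrates the idea; real tokenizers (Anthropic’s included) operate on subword units and are far more sophisticated, so the functions below are purely hypothetical illustrations.

```python
# Generic illustration of tokenisation (not Anthropic's tokenizer):
# text is mapped to integer ids before being fed to a model.

def build_vocab(corpus):
    """Assign an integer id to every distinct whitespace-separated word."""
    vocab = {}
    for word in corpus.split():
        vocab.setdefault(word, len(vocab))
    return vocab

def tokenize(text, vocab):
    """Map text to token ids; words missing from the vocabulary get -1."""
    return [vocab.get(word, -1) for word in text.split()]

vocab = build_vocab("to be or not to be")
print(tokenize("to be or not to be", vocab))  # [0, 1, 2, 3, 0, 1]
```

The legal significance, as the publishers frame it, is that each such transformation of the corpus involves making yet another copy of the underlying text.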
The lawsuit said that as early as May 2021, senior Anthropic employees, including founders Benjamin Mann and Jared Kaplan, discussed using extraction tools to strip webpage footers, where copyright notices often appear, from training data.
In June 2021, they concluded that one tool, jusText, a Python-based boilerplate-removal tool, left too much “useless junk,” including copyright notice information, compared with alternatives like Readability and Newspaper. Mann also said he wanted the model to “ignore the boilerplate.” The publishers alleged that Anthropic chose another extraction tool, Newspaper3k, because it reportedly removed copyright owner names and notices more effectively.
“Because Newspaper [tool] removed Copyright Management Information more effectively, Anthropic purposefully decided to employ that tool to remove copyright notices and other Copyright Management Information from Publishers’ lyrics and other copyrighted works,” read the lawsuit.
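To see why footer stripping and copyright-notice removal go hand in hand, consider a minimal heuristic that drops trailing lines matching copyright markers. This sketch is hypothetical and much cruder than jusText or Newspaper3k, which use signals like stopword density, link density, and DOM structure; it only illustrates how boilerplate removal can strip copyright management information as a side effect.

```python
import re

# Illustrative sketch only: a naive footer-stripping heuristic that removes
# trailing lines containing common copyright markers, discarding the very
# "copyright management information" the lawsuit describes.

COPYRIGHT_MARKER = re.compile(r"©|\(c\)|copyright", re.IGNORECASE)

def strip_footer_lines(page_text):
    """Drop trailing blank lines and lines that look like copyright notices."""
    lines = page_text.splitlines()
    while lines and (not lines[-1].strip() or COPYRIGHT_MARKER.search(lines[-1])):
        lines.pop()
    return "\n".join(lines)

page = (
    "Song lyrics line one\n"
    "Song lyrics line two\n"
    "\n"
    "© 2021 Example Publisher. All rights reserved."
)
print(strip_footer_lines(page))  # keeps the lyrics, drops the notice
```

A pipeline tuned this way keeps the substantive text (here, the lyrics) while erasing the attribution attached to it, which is the crux of the publishers’ copyright-management-information claim.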
Which Datasets Did Anthropic Pirate?
“The datasets Anthropic has copied and filtered to train its Claude AI models include a well-known dataset called ‘The Pile,’ which includes countless unauthorised copies of Publishers’ lyrics,” claims the lawsuit. The now-deleted Pile dataset was also allegedly used by GPU maker NVIDIA.
The publishers claim that Anthropic continues to use The Pile, which draws from several existing text sources and gives more weight to what its creators call “high-quality datasets,” to train its latest Claude models.
These sources include Books3, a collection of hundreds of thousands of pirated books that allegedly contains many works with the publishers’ musical compositions, as well as the YouTube Subtitles dataset of human-written closed captions. For context, Books3 is the same dataset allegedly used by NVIDIA, Apple, Adobe, Meta, and Anthropic itself. In August 2023, after a legal complaint by Rights Alliance, a Denmark-based anti-piracy group, Books3 was removed from The Pile, and The Pile itself was taken down the same year.
The new copyright lawsuit also said Anthropic uses the “Common Crawl” dataset for ongoing AI training. The publishers alleged that the dataset includes a large number of their copyrighted lyrics, scraped without permission from authorised sites such as MusixMatch, LyricFind and Genius. Common Crawl is a non-profit organisation that “maintains a free, open repository of web crawl data that can be used by anyone.”