A U.S. district judge in the Anthropic AI copyright settlement clarified the concept of fair use as it applies to AI training. In essence, AI developers may train AI models on books that they lawfully acquire; the infringement lies in the unlawful acquisition of copyrighted works, and training on pirated copies is not shielded by a transformative purpose. While the $1.5 billion settlement made headlines, the truly interesting aspect of Bartz v. Anthropic is its implications for generative AI and copyright.

Anthropic AI copyright settlement

Several authors filed a class action against Anthropic in August 2024, alleging that Anthropic trained its Claude AI model on millions of books without permission, some of which were obtained via pirate libraries such as LibGen and PiLiMi. They argued that this constituted large-scale copyright infringement not protected by fair use.

Fair use is a legal doctrine that promotes freedom of expression by permitting the unlicensed use of copyright-protected works in certain circumstances.

Anthropic’s counterargument was that using books to train a language model was transformative: the model does not reproduce or distribute the works, but learns statistical patterns in order to generate original language. In a summary judgment on 23 June 2025, the U.S. district court ruled on fair use in AI training. The court held that training on lawfully acquired works, including scanning purchased books for computational analysis, was fair use. Using pirated copies, however, was not fair use, and those claims could proceed to trial. This decision led Anthropic and the authors to reach a settlement of $1.5 billion, the largest copyright settlement in U.S. history to date.

Fair use and AI training

Section 107 of the U.S. Copyright Act outlines the factors that courts consider when determining fair use. The court in the Anthropic AI copyright settlement applied these factors to AI training, demonstrating how Section 107 guides fair use analysis in the context of AI and highlighting transformative use and limited market harm as key considerations.

Purpose and character of the use

In general, courts examine how the party claiming fair use is utilising the copyrighted work and whether the use is transformative. For a use to be transformative, it must add something new, serve a different purpose, and have a different character than the original work.

The court in the Anthropic case held that, in using the books to train the Claude AI model, Anthropic was merely extracting non-copyrightable information: the shift in format from print to digital and the creation of a new technology (Claude) go beyond mere copying. The use is transformative because it enables new functionalities, such as storage, searchability, and generative use, that the original works do not provide. However, the court highlighted that the use of pirated copies cannot be retroactively sanctified by a transformative purpose. This mirrors the reasoning in Authors Guild v. Google, where Google’s digitisation of books for search indexing was deemed transformative because the searchable index did not provide a substantial substitute for the copyrighted works but instead increased access to information through snippets displayed in search results.

Nature of the work

Typically, using creative or imaginative works, such as novels, movies, or songs, weighs against fair use, while using factual works, such as technical articles or news items, weighs in favour of it.

In the Anthropic AI copyright settlement, the court stated that training AI models adds new purpose and meaning to the original works. While recognising that the copyrighted books were creative and received strong protection, the court found this factor to be only marginally relevant because Anthropic’s computational use was highly transformative.

Amount and substantiality

Under this factor, courts examine whether the amount used constitutes a substantial portion of the copyrighted work. Courts are more likely to find fair use when only a small amount of copyrighted material is used.

Anthropic used authors’ entire books to train its Claude AI model. This would normally weigh against fair use, as held in Harper & Row v. Nation Enterprises. However, the court in the Anthropic AI copyright case found that copying entire books was reasonably necessary to train AI language models effectively. It emphasised that the key consideration was not just the quantity copied: the AI outputs did not traceably reproduce substantial parts of the original works, and the amount copied was technically necessary and reasonable for training large models.

Market effect

Here, courts consider whether the use, for example, displaces sales of the original or whether it could cause substantial harm if it were to become widespread.

The court in this case rejected the plaintiffs’ assertion that AI models undercut their licensing market, because there was no evidence that Claude’s training or outputs had any impact on book sales. The judge stated that the Copyright Act protects existing markets, not hypothetical future licensing schemes. However, the court found that downloading pirated books clearly harmed the authorised market, tipping this factor against fair use in those instances and ultimately leading to the settlement for the authors involved.

Implications of the Anthropic AI copyright settlement for AI developers

This case sheds light on how similar ongoing cases, such as Authors Guild v. OpenAI, may be decided. Since AI models rely heavily on data, developers must access data lawfully. The Anthropic case brings legal certainty by recognising that training AI models with lawfully acquired copyrighted materials can be highly transformative and qualify as fair use, giving developers clearer guidelines on lawful data use.

Key points for AI developers include:

  • Developers must ensure that datasets come from lawful sources, as using pirated or unauthorised copies, as found in part of Anthropic’s training data, creates clear liability for copyright infringement.
  • Practical data governance is essential; developers should carefully document data types, provenance, licensing status, and deletion processes to avoid legal risks associated with unlawful data sourcing.
  • AI developers must design models to avoid infringing outputs, such as memorisation or verbatim reproduction of copyrighted text. The court based its fair use ruling partly on Anthropic’s AI outputs not infringing in this way.
  • Developers working globally face stricter copyright rules, such as those in the EU, where they must meet more specific obligations concerning data use and the interests of rights holders. In South Africa, the Copyright Amendment Bill introduces a new open-ended fair use clause (s 12A) modelled on section 107 of the U.S. Copyright Act. Local precedent on transformative use remains limited, and moral rights protections, especially section 20, may complicate the analysis where a use affects the expressive integrity of a work.
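The data-governance points above can be made concrete with a small sketch. The following Python example shows one hypothetical way a developer might record provenance for each training document and exclude anything without documented, lawful sourcing before training; the schema, field names, and licence categories are illustrative assumptions, not a standard or anything described in the case.

```python
from dataclasses import dataclass

# Hypothetical provenance record for one training document.
# Field names and licence labels are illustrative assumptions.
@dataclass
class SourceRecord:
    doc_id: str
    origin: str             # e.g. "purchased-scan", "publisher-licence", "shadow-library"
    licence: str            # e.g. "purchased", "CC-BY", "unknown"
    acquired_lawfully: bool

# Licence statuses this sketch treats as acceptable for training.
ALLOWED_LICENCES = {"purchased", "publisher-licence", "public-domain", "CC-BY"}

def filter_training_set(records):
    """Split records into a lawful training subset and an excluded set
    that should be reviewed or deleted, with the decision documented."""
    kept, excluded = [], []
    for r in records:
        if r.acquired_lawfully and r.licence in ALLOWED_LICENCES:
            kept.append(r)
        else:
            excluded.append(r)  # flag for review and a documented deletion process
    return kept, excluded

records = [
    SourceRecord("book-001", "purchased-scan", "purchased", True),
    SourceRecord("book-002", "shadow-library", "unknown", False),
]
kept, excluded = filter_training_set(records)
print([r.doc_id for r in kept])      # → ['book-001']
print([r.doc_id for r in excluded])  # → ['book-002']
```

The point of the sketch is the audit trail: every document carries its provenance and licensing status, so exclusion decisions (and later deletions) can be documented rather than inferred after the fact.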

Order regarding the Anthropic AI copyright settlement

The case Bartz et al. v. Anthropic PBC was settled in 2025, with Anthropic agreeing to pay $1.5 billion for its past use of pirated books in AI training and committing to destroy the infringing copies. The settlement covers only past conduct and does not protect against future claims or output-based copyright issues.

Overall, the case encourages responsible data acquisition and management practices in AI development, emphasising that both the source and use of copyrighted data matter legally. It also incentivises compliance through signalling that piracy poses substantial financial risks. For developers, the message is clear: respect provenance, document processes, and filter outputs. For rights holders, the case shows courts will enforce against unauthorised data use and reward clean licensing ecosystems.

Case Details:

  • Full Name: Bartz et al. v. Anthropic PBC
  • Case Number: 3:24-cv-05417 (N.D. Cal., filed Aug. 2024)