Fair Use And The New Rules For AI Training

6/26/2025 · 3 min read

On June 23, 2025, the artificial intelligence industry received long-awaited clarity. The decision in Bartz v. Anthropic PBC, delivered by Judge William Alsup in the Northern District of California, established a foundational precedent for the responsible development of language models in California and, likely, beyond.

At last, the question of whether training AI with copyrighted works constitutes infringement or fair use has a clear judicial answer.

The Limits of Fair Use

At the heart of the ruling lies a crucial distinction: training a language model with lawfully obtained works, whether purchased or digitized from owned copies, qualifies as fair use under Section 107 of the Copyright Act. The court is unequivocal that the legitimacy of the training process is inseparable from the legitimacy of the data’s source. There is no room for shortcuts; building a library of pirated books, even if not directly used for training, falls outside the protection of fair use and exposes companies to significant liability.

To illustrate this point, consider a “rotten fruit” analogy. Just as fruit tainted at the source cannot be cleansed by later handling, data acquired through unlawful means remains contaminated, regardless of subsequent actions. Even if a company later purchases the same books, or ensures that the AI’s outputs are wholly original and do not mimic the authors, the initial act of acquiring pirated material taints the entire process. The source of knowledge matters: if it is compromised at the outset, no amount of later compliance can undo the original violation. The analogy underscores the court’s message that the integrity of AI training depends not only on what is produced, but on how and from where the underlying training data is obtained.

[When I refer to the “rotten fruit” analogy, I am drawing on the well-known “fruit of the poisonous tree” doctrine from U.S. law, established in 1920 by Silverthorne Lumber Co. v. United States, which holds that evidence derived from an illegal source is itself tainted. While this doctrine does not directly apply to the Anthropic case, I use the metaphor purely as explanatory context, to illustrate how the legitimacy of AI training data depends on its lawful origin and to make the explanation more accessible to readers.]

The Human Analogy

Perhaps the most interesting aspect of the decision is the analogy drawn between human and AI learning. Judge Alsup recognizes that training an AI model is, at its core, an act of learning. Just as a writer absorbs the style and structure of the books they read, a language model internalizes patterns from its training corpus. Copyright law, as interpreted here, does not grant authors the power to control how others, human or machine, learn from their works. What matters is not the act of learning itself, but how that knowledge is used. As long as the outputs generated by the AI do not reproduce or substantially mimic the original works, the training process remains within the bounds of fair use.

A New Standard for Responsible AI Development

This ruling provides the industry with much-needed clarity on how to approach future AI training and fine-tuning practices. It establishes a pragmatic balance: innovation is encouraged, but it must be pursued with respect for intellectual property. The decision offers certainty to those who operate within the boundaries of the law, while serving as a clear warning to those who might be tempted by shortcuts. There is no special exception for artificial intelligence, but neither is there a prohibition on progress.

For Anthropic, the cost of this clarity is significant. The company may face millions of dollars in damages for the unlawful use of pirated books, a consequence that underscores the risks of disregarding copyright boundaries. Yet, this outcome may also pave the way for future settlements and industry norms in similar cases, setting a precedent that will shape AI governance for years to come. Ultimately, the Bartz v. Anthropic decision invites us to envision a future where innovation and intellectual property are not adversaries, but partners in the ongoing evolution of both human and machine learning.

Key Takeaways

  • Training AI with lawfully obtained works is fair use. The source of data matters.

  • Using pirated materials for AI training is not protected and brings legal risk.

  • The court treats AI learning like human learning. What matters is whether the AI’s output unlawfully copies the original works.

Want to understand how this impacts your organization?

Written By: Josué David Citalán Hernández
Legal Specialist at KNOW-ME

