A group of authors filed a class-action lawsuit against Anthropic in a California court on Monday. The authors claim Anthropic built its business by "stealing hundreds of thousands of copyrighted books."
The three authors, Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, claim that their books were part of the dataset Anthropic used to train its family of Claude models. In their suit, they allege that Anthropic is guilty of "downloading and copying hundreds of thousands of copyrighted books taken from pirated and illegal websites."
The authors questioned Anthropic's claim to be a public benefit company, saying, "It is no exaggeration to say that Anthropic's model seeks to profit from strip-mining the human expression and ingenuity behind each one of those works."
The Pile
The books in question are part of a controversial dataset called Books3, which previously formed part of a larger dataset called The Pile. It is generally accepted, though rarely admitted, that just about every one of the big AI labs trained its LLMs on The Pile.
The Pile consists of around 825GB of academic papers, books, websites, technical documents, and more. One of The Pile’s architects is an independent developer named Shawn Presser. Presser created the Books3 dataset in 2020 and added it to The Pile.
Books3 contains 196,640 books in plain text format by famous authors like Stephen King, as well as by the authors who brought this lawsuit. It's believed that Presser used Bibliotik, a notorious torrent tracker run by an invite-only community of book pirates, as the source for Books3.
Suppose you wanted to train a world-class GPT model, just like OpenAI. How? You have no data.
Now you do. Now everyone does.
Presenting "books3", aka "all of bibliotik"
– 196,640 books
– in plain .txt
– reliable, direct download, for years: https://t.co/KKSrhEAnrD
thread pic.twitter.com/m6bdpHfYJx
— Shawn Presser (@theshawwn) October 25, 2020
When The Pile was hosted and made publicly available online by the nonprofit EleutherAI, the group explained its reasons for including the pirated books: "We included Bibliotik because books are invaluable for long-range context modeling research and coherent storytelling."
In August 2023, Books3 was removed from the "most official" copy of The Pile, but by that time it had been used by pretty much all the big names in AI model development.
In July 2024, Anthropic publicly acknowledged that it used The Pile to train its Claude models. While Anthropic has yet to respond to the lawsuit, it will likely rely on the same "fair use" defense that OpenAI and others facing similar lawsuits are using.
The real damage
Besides the copyright issue, the lawsuit reveals the genuine fear that authors have of AI taking over their source of income.
The suit alleges that "Anthropic, in taking authors' works without compensation, has deprived authors of book sales and licensing revenues." That may be hard to prove. Claude will describe the book "The Feather Thief" by Kirk Wallace Johnson, but it declines to reproduce even a single page.
I suspect Claude is lying when it responds with "I apologize, but I don't have access to the actual text of 'The Feather Thief' or its first page," because it goes on to describe what takes place on page 1. If you want to read the book, you'll need to buy it or go to a library.
Even so, the authors say that "Anthropic's Claude and other LLMs like it seriously threaten the livelihood" of authors. They say that writing work is "starting to dry up as a result of generative AI systems trained on those writers' works, without compensation, to begin with."
As evidence of this, the suit relates how a man named Tim Boucher "wrote" 97 books using Claude and ChatGPT in less than a year, and sold them at prices from $1.99 to $5.99.
The lawsuit is calling for a jury trial and unspecified damages. It will be interesting to see if the jurors value copyright law more than the utility of AI models like Claude.