    Authors sue Anthropic for using pirated books to train Claude

    August 21, 2024

    A group of authors filed a class-action lawsuit against Anthropic in a California court on Monday, August 19. The authors claim Anthropic built its business by “stealing hundreds of thousands of copyrighted books.”

    The three authors, Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, claim that their books were part of the dataset Anthropic used to train its family of Claude models. In their suit, they allege that Anthropic was guilty of “downloading and copying hundreds of thousands of copyrighted books taken from pirated and illegal websites.”

    The authors questioned Anthropic’s claim to be a public benefit corporation, saying, “It is no exaggeration to say that Anthropic’s model seeks to profit from strip-mining the human expression and ingenuity behind each one of those works.”

    The Pile

    The books in question are part of a controversial dataset called Books3, which previously formed part of a larger dataset called The Pile. It is widely believed, though rarely admitted, that nearly every major LLM developer trained its models on The Pile.

    The Pile consists of around 825GB of academic papers, books, websites, technical documents, and more. One of The Pile’s architects is an independent developer named Shawn Presser. Presser created the Books3 dataset in 2020 and added it to The Pile.

    Books3 contains 196,640 books in plain-text format, including works by famous authors such as Stephen King as well as by the authors who brought this lawsuit. Presser is believed to have used Bibliotik, a notorious torrent tracker run by an invite-only community of book pirates, as the source for Books3.

    Suppose you wanted to train a world-class GPT model, just like OpenAI. How? You have no data.

    Now you do. Now everyone does.

    Presenting “books3”, aka “all of bibliotik”

    – 196,640 books
    – in plain .txt
    – reliable, direct download, for years: https://t.co/KKSrhEAnrD

    thread pic.twitter.com/m6bdpHfYJx

    — Shawn Presser (@theshawwn) October 25, 2020

    When the nonprofit EleutherAI hosted The Pile and made it publicly available online, it explained its reasons for including the pirated books. EleutherAI said, “We included Bibliotik because books are invaluable for long-range context modeling research and coherent storytelling.”

    In August 2023, Books3 was removed from the “most official” copy of The Pile, but by then it had already been used by nearly all the big names in AI model development.

    In July 2024, Anthropic publicly acknowledged that it used The Pile to train its Claude models. Anthropic has yet to respond to the lawsuit, but it will likely fall back on the same “fair use” defense that OpenAI and others facing similar lawsuits are using.

    The real damage

    Beyond the copyright issue, the lawsuit reveals authors’ genuine fear that AI will take over their source of income.

    The suit alleges that “Anthropic, in taking authors’ works without compensation, has deprived authors of book sales and licensing revenues.” That may be hard to prove. Claude will describe the book “The Feather Thief” by Kirk Wallace Johnson, but it declines to reproduce even a single page.

    I suspect Claude is lying when it responds with “I apologize, but I don’t have access to the actual text of ‘The Feather Thief’ or its first page,” because it then goes on to describe what happens on page 1. If you want to read the book, you’ll need to buy it or go to a library.

    Even so, the authors say that “Anthropic’s Claude and other LLMs like it seriously threaten the livelihood” of authors. They say that writing work is “starting to dry up as a result of generative AI systems trained on those writers’ works, without compensation, to begin with.”

    As evidence of this, the suit describes how a man named Tim Boucher “wrote” 97 books using Claude and ChatGPT in less than a year and sold them at prices from $1.99 to $5.99.

    The lawsuit calls for a jury trial and unspecified damages. It will be interesting to see whether jurors value copyright law more than the utility of AI models like Claude.

    The post Authors sue Anthropic for using pirated books to train Claude appeared first on DailyAI.

