Perplexity AI embroiled in controversy over alleged web scraping abuse

Perplexity AI has found itself at the center of a firestorm over its data collection practices.

Perplexity essentially fuses a search engine with generative AI, returning AI-generated content related to the user’s search query.

The processes required to do this likely involve improperly scraping content from numerous websites, including those that explicitly prohibit it.

The scandal erupted on June 11 when Forbes reported that Perplexity had lifted an entire article from its site, complete with custom illustrations, and repurposed it with only minimal attribution.

Not long after, WIRED conducted an investigation that uncovered evidence of Perplexity scraping content from websites that forbid automated data collection.

A website can ask web crawlers not to scrape its content through a file called “robots.txt.”

This exclusion protocol communicates with web crawlers and other automated bots. It’s a simple text file placed on a website’s server that specifies which pages or sections of the website should not be accessed or scraped.

The robots.txt file has been a widely respected convention since the early days of the web. It helps website owners control their content and prevent unauthorized data collection.

Although not legally binding, it has long been considered best practice for web crawlers to follow the instructions outlined in a website’s robots.txt file.
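To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser` module. The rules, the bot name `ExampleBot`, and the `example.com` URLs are hypothetical and chosen only for illustration; they do not describe any real site's file or any company's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration only).
# It blocks "ExampleBot" from the whole site and keeps every other
# crawler out of the /private/ section.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/private/notes"),
    ]:
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "disallowed"
        print(f"{agent} -> {url}: {verdict}")
```

The check is purely advisory: nothing in the file technically stops a crawler that chooses to skip this step, which is why compliance rests on convention rather than enforcement.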

Jason Kint, CEO of Digital Content Next, a trade group representing online publishers, minced no words in his assessment of Perplexity’s web scraping processes.

“By default, AI companies should assume they have no right to take and reuse publishers’ content without permission,” he said.

“If Perplexity is skirting terms of service or robots.txt, the red alarms should be going off that something improper is going on.”

Amazon investigates

These revelations have prompted Amazon Web Services (AWS), which hosts a server implicated in Perplexity’s alleged improper scraping, to launch an investigation.

AWS strictly prohibits customers from engaging in abusive or illegal activities that violate its terms of service.

Perplexity CEO Aravind Srinivas initially brushed off the concerns, asserting they reflected “a deep and fundamental misunderstanding” of the company’s operations and the internet at large.

However, in a subsequent interview with Fast Company, he conceded that Perplexity relied on an unnamed third-party vendor for web crawling and indexing, suggesting that the vendor was to blame for any robots.txt violations.

Srinivas declined to identify the vendor, citing a non-disclosure agreement.

For the moment, Perplexity appears determined to weather the storm, with a spokesperson downplaying the AWS probe as “standard procedure” and indicating the company has made no changes to its operations.

However, the startup’s defiant stance may prove untenable as the groundswell of concern over AI’s data practices continues to build.

The post Perplexity AI embroiled in controversy over alleged web scraping abuse appeared first on DailyAI.
