Cloudflare accuses AI startup Perplexity of bypassing web scraping restrictions

AI startup Perplexity is under fire for allegedly scraping content from websites that explicitly barred such activity, according to new research from internet infrastructure giant Cloudflare.

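In a blog post published Monday, Cloudflare accused Perplexity of deliberately circumventing safeguards designed to block unauthorized data scraping. The company claimed Perplexity disguised its crawler’s identity and ignored directives in websites’ robots.txt files, the industry-standard mechanism site owners use to tell crawlers which pages they may not access. Compliance with robots.txt is voluntary: the file states a site's wishes but does not technically block a crawler that chooses to disregard it.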
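For readers unfamiliar with robots.txt, the following is a minimal sketch of how a well-behaved crawler might consult it before fetching a page, using Python's standard `urllib.robotparser` module. The bot name `ExampleAIBot`, the sample rules, and the `is_allowed` helper are hypothetical and chosen only for illustration; they are not taken from Cloudflare's report or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named AI crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the rules in robots_txt permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network call
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    # The declared crawler is told to stay out.
    print(is_allowed("ExampleAIBot", page))
    # The same request under a browser-style user agent matches only the "*" rules.
    print(is_allowed("Mozilla/5.0", page))
```

Because the rules are matched against whatever user agent a client declares, a crawler that presents itself under a different name is no longer covered by the rule written for it. That is why robots.txt depends on crawlers identifying themselves honestly.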

“This activity was observed across tens of thousands of domains and millions of requests per day,” Cloudflare said, noting that the AI company changed its user agent and network identifiers to avoid detection. Cloudflare researchers said they used machine learning and network signals to confirm that Perplexity was behind the behavior.

AI tools, including those built by Perplexity, require massive amounts of online data to function. But as website owners push back against unauthorized scraping, some AI companies appear to be ignoring those restrictions. Cloudflare’s post alleged that Perplexity used tactics like impersonating Google Chrome to evade blocks once its official crawler was denied access.

Perplexity spokesperson Jesse Dwyer dismissed the report as a “sales pitch,” insisting that “no content was accessed” and that the crawler mentioned in Cloudflare’s findings “isn’t even ours.”

However, Cloudflare said its investigation was prompted by complaints from its customers, who reported Perplexity scraping their sites despite having implemented blocks against the company’s bots.

This isn’t the first time Perplexity has faced criticism for questionable data practices. Last year, publications including Wired accused the startup of plagiarism—an accusation CEO Aravind Srinivas struggled to address during a public interview at TechCrunch Disrupt 2024.

In response to growing concern, Cloudflare has removed Perplexity’s bots from its list of verified bots and introduced new tools to help site owners block unwanted AI scrapers.