Cloudflare has a new feature—available to free users as well—that uses AI to generate random pages to feed to AI web crawlers:
Instead of simply blocking bots, Cloudflare’s new system lures them into a “maze” of realistic-looking but irrelevant pages, wasting the crawler’s computing resources. The approach is a notable shift from the standard block-and-defend strategy used by most website protection services. Cloudflare says blocking bots sometimes backfires because it alerts the crawler’s operators that they’ve been detected.
“When we detect unauthorized crawling, rather than blocking the request, we will link to a series of AI-generated pages that are convincing enough to entice a crawler to traverse them,” writes Cloudflare. “But while real looking, this content is not actually the content of the site we are protecting, so the crawler wastes time and resources.”
The company says the content served to bots is deliberately irrelevant to the website being crawled, but it is carefully sourced or generated using real scientific facts—such as neutral information about biology, physics, or mathematics—to avoid spreading misinformation (whether this approach effectively prevents misinformation, however, remains unproven).
It’s basically an AI-generated honeypot. And AI scraping is a growing problem:
The scale of AI crawling on the web appears substantial, according to Cloudflare’s data that lines up with anecdotal reports we’ve heard from sources. The company says that AI crawlers generate more than 50 billion requests to their network daily, amounting to nearly 1 percent of all web traffic they process. Many of these crawlers collect website data to train large language models without permission from site owners….
Presumably the crawlers will now have to up both their scraping stealth and their ability to filter out AI-generated content like this. Which means the honeypots will have to get better at detecting scrapers and more stealthy in their fake content. This arms race is likely to go back and forth, wasting a lot of energy in the process.
More Stories
WP Ultimate CSV Importer Flaws Expose 20,000 Websites to Attacks
WP Ultimate CSV Importer flaws expose 20,000 websites to attacks enabling attackers to achieve full site compromise Read More
The AI Fix #44: AI-generated malware, and a stunning AI breakthrough
In episode 44 of The AI Fix, ChatGPT won’t build a crystal meth lab, GPT-4o improves the show’s podcast art,...
Ukraine Blames Russia for Railway Hack, Labels It “Act of Terrorism”
The CERT-UA investigation concluded that the attack’s techniques were “characteristic of Russian intelligence services” Read More
New Phishing Attack Combines Vishing and DLL Sideloading Techniques
A new attack targeting Microsoft Teams users used vishing, remote access tools and DLL sideloading to deploy a JavaScript backdoor...
Google to Switch on E2EE for All Gmail Users
Google is set to roll out end-to-end encryption for all Gmail users, boosting security, compliance and data sovereignty efforts Read...
Cybercriminals Expand Use of Lookalike Domains in Email Attacks
BlueVoyant found that the use of lookalike domains in email-based attacks is allowing actors to extend the types of individuals...