AI bots are reshaping the Internet at its core. What was once primarily a human-driven space is now increasingly navigated by intelligent, autonomous bots that explore, interact and even make decisions…

Bot Evolution: From Search Indexing to AI

During the Internet’s early period, web crawlers functioned mainly within the framework of search engines such as Google and Bing. These bots had a straightforward yet impactful purpose: to traverse websites, catalogue content, and structure data to enhance user accessibility. Their function was clearly established: these automated tools classified and prioritized web pages according to relevance, search terms, and visitor interaction. This period depended on programmed algorithms that determined how information was found and displayed.

AI and the Transformation of Web Scraping

In today’s landscape, we’re experiencing a fundamental transformation. The focus has shifted beyond merely indexing static web content for search engines; contemporary bots have developed into complex systems that transcend basic information retrieval. With artificial intelligence’s advancement, these bots now examine, understand, and even produce content. Rather than simply cataloguing websites, they actively consume enormous datasets to train AI systems, sometimes without content creators’ explicit permission.

This evolution presents significant challenges. AI-enhanced bots have progressed from passive information collectors to active content consumers, trend analysts, and in certain instances, content duplicators. We’ve transitioned from an environment where bots primarily helped organize information to one where they actively transform it, establishing an entirely new digital framework.

Throughout 2024, AI bots and crawlers have made headlines due to their aggressive consumption of online content for training increasingly sophisticated models. These automated systems have sparked significant debate, as many disregard website owners’ explicit instructions to limit or prevent crawling activities.
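To make "explicit instructions" concrete: the usual way a site states its crawling preferences is a robots.txt file, and a well-behaved crawler checks it before fetching a page. The sketch below uses Python's standard urllib.robotparser to show that check. The robots.txt content and the example.com URL are hypothetical, written for illustration only, and nothing in this mechanism forces a bot to comply, which is exactly the problem described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks three AI crawlers and allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/some-post"  # hypothetical URL
    for agent in ("GPTBot", "ClaudeBot", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

The check is purely advisory: it tells a crawler what the site owner wants, and it is up to the crawler's operator to honor the answer.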

A look at some of these bots in Cloudflare’s 2024 Year in Review reveals noteworthy trends. The crawler Bytespider, run by TikTok’s Chinese parent company ByteDance, is believed to collect training data for its large language models. Bytespider’s activity declined consistently throughout 2024, with late November figures showing an approximate 80-85% reduction compared to January levels.

Meanwhile, Anthropic’s crawler ClaudeBot, which gathers training data for AI systems like Claude, showed virtually no activity until mid-April, with only occasional minor surges that likely represented testing phases. While more regular activity began in late April and briefly peaked, it gradually diminished throughout the remainder of the year.

(Cloudflare Radar - AI Bot & Crawler Traffic)

A more comprehensive view is available in the Cloudflare Radar Data Explorer. Over the last 3 months, GPTBot and Meta have led the pack, closely followed by Anthropic.
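For your own site, a rough version of the same breakdown can be approximated by counting crawler names in the User-Agent strings from your server logs. The sketch below is a minimal illustration: it matches only the three crawler names mentioned in this article, the sample User-Agent strings are simplified and hypothetical, and since a User-Agent header can be spoofed, the counts are an estimate at best.

```python
from collections import Counter

# Crawler names taken from this article; a real list would be longer.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "Bytespider")


def tally_ai_crawlers(user_agents):
    """Count requests per known AI crawler; everything else goes to 'other'."""
    counts = Counter()
    for ua in user_agents:
        lowered = ua.lower()
        for token in AI_CRAWLER_TOKENS:
            # Case-insensitive substring match on the crawler name.
            if token.lower() in lowered:
                counts[token] += 1
                break
        else:
            # No known crawler name was found in this User-Agent.
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Simplified, hypothetical User-Agent strings for illustration.
    sample = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
        "Mozilla/5.0 (X11; Linux x86_64) Firefox/130.0",
    ]
    print(tally_ai_crawlers(sample))
```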

Globally, Cloudflare’s analysis revealed that the top 10 countries generated 68.5% of all monitored bot traffic in 2024, with the United States accounting for half of this volume—more than 5 times the share of Germany, which ranked second.

(Cloudflare Year in review 2024)

Among cloud service providers that generate bot traffic, Amazon Web Services contributed 12.7% of worldwide bot activity, followed by Google at 7.8%. Other significant contributors included Microsoft, Hetzner (which was a surprise), DigitalOcean, and OVH, each responsible for over 1% of global bot traffic.

(Cloudflare Year in review 2024)

It’s truly fascinating to explore Cloudflare Radar (which I have no affiliation with) and analyze the various trends across its different data categories. The site offers insights into internet traffic patterns, security threats, and bot activities that reveal the evolving digital landscape.

The Dark Side: Exploiting Infrastructure Gaps

With the rise of AI-enhanced bots comes both progress and risk. Automated systems now offer unprecedented capabilities but simultaneously create new security concerns. Cybercriminals increasingly deploy sophisticated bots to discover vulnerabilities across digital systems. These malicious programs work relentlessly, scanning websites, application interfaces, and data repositories for security flaws—searching for opportunities to extract sensitive information, circumvent security protocols, or harvest protected content for unauthorized use.

Imperva’s 2024 Bad Bot Report states that nearly half of all internet traffic now comes from bots, both harmful and beneficial. However, the report highlights this growing trend without revealing the underlying data sources.

(Imperva report)

(Imperva report, bot traffic growing through years)

The evolution of bots, from simple search engine crawlers to sophisticated AI-driven systems, is reshaping the internet landscape. While bots play essential roles, such as content indexing and enhancing user experiences, their proliferation has also introduced significant challenges. The rise of automated traffic, now accounting for nearly half of all internet activity, distorts web analytics, compromises security, and affects website performance. We’re certainly living through an intriguing period in digital history…