How to detect AI crawlers

If a website sees unusual activity, AI crawler bots may be the cause. Reviewing a website's log files can help determine which AI bots are crawling it.

Learning Objectives

After reading this article you will be able to:

  • Describe why AI bots crawl websites
  • Understand how to detect bots and AI crawlers in log files via user-agent strings
  • List the major AI crawler bots and what they do



Bots make up a large percentage of visitors to websites. They serve a range of purposes, but AI crawler bots are especially common today. Such bots focus on discovering web content for training AI models; AI bots also help AI assistants surface webpages to answer user queries. Since high volumes of bot traffic can strain a web property's resources, website administrators need to be able to identify AI crawlers in their logs and take steps to reduce the crawlers' impact if they crawl too often.

Verified AI crawler activity can be monitored using website logs along with a log analytics tool (manual analysis of millions of log entries is impractical). Administrators can search their logs for the user-agent strings of the entities requesting content and get visibility into how many requests come from AI crawlers.

What do AI crawler bots do?

AI crawlers are bots that "crawl" or request webpages, using hyperlinks to explore the entire public web. They are far from the only crawler bots: for decades, search engine crawler bots have scanned and indexed web content in order to provide it to users in search results.

But one of the differences between AI crawlers and search crawlers is that AI crawlers are much less likely to refer human user traffic to the pages they crawl. Rather, they use the pages they crawl to train AI models that respond to user queries without the user leaving the AI app or visiting a website.

Web servers, therefore, might serve high amounts of AI requests but see traffic from human visitors drop, in contrast to what happens when search crawler bots discover web content and begin referring traffic to the pages that host it. Websites that experience this may want to limit or block AI crawler bots so that their resources are not spent in vain. Conversely, some website administrators may want to make sure AI crawlers can crawl their websites so that they show up in AI overviews. Either way, identifying and managing AI crawler bot traffic is crucial for most websites.

How to track AI crawler activity via user-agent strings

Every client browsing the web, human or bot, includes a user-agent string in the HTTP requests it sends (this is distinct from its IP address). For humans, the user-agent string is generated by the browser and usually indicates device type and browser type, something like:

  • Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36

Bots do not necessarily use browsers or specific consumer devices, and most crawler bots have simple, clearly defined user-agent strings, like:

  • Googlebot

Searching logs for the user-agent strings associated with known bots shows which crawlers are reaching a website, how many pages they are requesting, how often they crawl, and more.
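As a minimal sketch of this kind of search, the script below counts requests from AI crawlers in an access log. It assumes log lines in the common "combined" format, where the user-agent string is the final quoted field; the bot name substrings are taken from the list in this article, and the sample log lines are invented for illustration.

```python
import re
from collections import Counter

# Substrings that identify AI crawlers covered in this article
AI_CRAWLERS = [
    "meta-externalagent",
    "GPTBot",
    "OAI-SearchBot",
    "GoogleOther",
    "Amazonbot",
    "PetalBot",
    "Applebot",
    "DuckAssistBot",
]

# In combined log format, the user agent is the last double-quoted field
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def count_ai_crawlers(log_lines):
    """Return a Counter mapping crawler name -> number of requests."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue
        ua = match.group(1).lower()
        for bot in AI_CRAWLERS:
            if bot.lower() in ua:
                counts[bot] += 1
                break
    return counts

# Invented sample lines for demonstration
sample = [
    '203.0.113.5 - - [10/Jan/2026:12:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "GPTBot"',
    '203.0.113.6 - - [10/Jan/2026:12:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "meta-externalagent/1.1"',
    '203.0.113.7 - - [10/Jan/2026:12:00:02 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
print(count_ai_crawlers(sample))
```

In practice, the same matching logic can run over a full access log file, and the resulting counts can be sorted to see which AI crawlers request the most pages.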

The most common AI crawlers, and the ones that are most likely to crawl a site at any given time, include:

  • Meta-ExternalAgent
  • GPTBot (from OpenAI)
  • GoogleOther
  • Amazonbot
  • PetalBot (from Huawei)

A more complete list of these AI crawlers with their user-agent strings is available below, or in the continually updated, and freely available, Cloudflare Radar report.

Which AI bots are crawling your site?

AI bots can be from organizations that run AI models, or they can be from AI agents or other AI products. Some are looking for training data for their models; others look for information they can source to answer live user queries.

The following bots are all verified and have public documentation.

List of common AI web crawlers

Meta-ExternalAgent

This bot is from Meta (best known for operating Facebook and Instagram). Meta-ExternalAgent crawls the web to find content for training AI models. As of 2026, this bot sends the second-most requests of all bots on the web (after the search crawler Googlebot).

User-agent string in log files:

  • meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)
  • meta-externalagent/1.1

GPTBot

From OpenAI, the GPTBot crawler finds content for use in training AI models, including the widely used ChatGPT model. GPTBot sends the third-most requests, ranking just after Meta-ExternalAgent. (Be sure to check out the live rankings in Cloudflare Radar.)

User-agent string in log files:

  • GPTBot

OAI-SearchBot

Also from OpenAI, OAI-SearchBot is used to find websites to reference in search results within ChatGPT.

User-agent string in log files:

  • Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot

GoogleOther

This crawler bot from Google is distinct from Google's crawler for search (Googlebot). It serves many purposes, not just AI model training. Google has cautioned against blocking GoogleOther since it finds web content that is used in many parts of the Google ecosystem.

User-agent string in log files:

  • GoogleOther

Amazonbot

This crawler is from Amazon and helps Amazon train generative AI models, among other uses for the content it crawls.

User-agent string in log files:

  • Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1)

PetalBot

PetalBot is from device manufacturer Huawei, and it finds web content both for Petal, Huawei's search engine, and for Huawei's other services, including AI search.

User-agent string in log files:

  • Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)
  • Mozilla/5.0 (compatible;PetalBot;+https://webmaster.petalsearch.com/site/petalbot)

Applebot

A crawler operated by Apple, Applebot powers many services within the Apple ecosystem, including search features in Spotlight, Siri, and Safari. Applebot also provides content for training generative AI models that power Apple Intelligence, Services, and Developer Tools, among other services.

User-agent string in log files:

  • (Applebot/0.1; +http://www.apple.com/go/applebot)

DuckAssistbot

According to search engine provider DuckDuckGo, DuckAssistbot is "a web crawler for DuckDuckGo Search that crawls pages in real-time for our AI-assisted answers... This data is not used in any way to train AI models."

User-agent string in log files:

  • DuckAssistBot/1.1; (+http://duckduckgo.com/duckassistbot.html)

Other crawlers and AI assistants include MistralAI-User, Manus Bot, Devin, and QualifiedBot.

Cloudflare Radar divides these and other AI-focused bots into AI crawlers, AI assistants, and AI search. To see all verified AI bots, sort the Cloudflare Radar list by category.

How to block bots and AI crawlers

Robots.txt guidelines tell bots where they should and should not go on a website, or whether they should crawl the website at all. Robots.txt is not binding; following it is more of a courtesy than anything else. However, most reputable bots follow robots.txt guidelines. Setting robots.txt rules tells compliant AI crawler bots not to crawl part or all of a website.

For instance, a robots.txt file could include this command:

User-Agent: Example.com-Bot
Disallow: /

This tells Example.com-Bot (not a real bot, just used for this example) that the site administrator does not want it to crawl any part of the website.
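The same pattern extends to real crawlers: each bot gets its own rule group, named by the user-agent token it honors. A robots.txt that disallows two of the AI crawlers listed above might look like this (a sketch; consult each operator's documentation for the exact token its crawler recognizes):

```
User-Agent: GPTBot
Disallow: /

User-Agent: Meta-ExternalAgent
Disallow: /
```

A `Disallow: /` rule covers the entire site; narrower paths (such as `Disallow: /private/`) restrict only part of it.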

It can be time-consuming to manually create these robots.txt rules. To make it easier to manage AI crawler bot traffic, Cloudflare offers AI Crawl Control.

AI Crawl Control lets website administrators block or allow specific AI crawlers, block all AI crawlers, or even charge specific crawlers for the privilege of crawling.

What about unverified AI crawler bots?

Not all bots follow robots.txt or respect website administrators' wishes. Some crawler bots even camouflage their activity so that they can scrape content without being blocked. More sophisticated bot management tools are necessary in these cases, tools that can identify ill-intentioned bot activity even when it is disguised.

Cloudflare AI Crawl Control uses machine learning, behavioral analysis, and fingerprinting to identify all bot traffic, even when it is disguised. Cloudflare can detect and block unwanted bot activity on any website.

Get started with AI Crawl Control.


FAQs

What is the primary purpose of AI crawler bots?

These bots explore the public web to find and gather content used to train artificial intelligence models, especially generative AI models and large language models. Some AI crawlers also help virtual assistants find relevant webpages to provide answers for user questions.

How do AI crawlers differ from traditional search engine crawlers?

While both crawl the web via hyperlinks, search crawlers typically direct human visitors back to the original website through search results. In contrast, AI crawlers often use a site's data to generate responses within an AI application, which can result in a decrease in actual human traffic to the source website.

Which AI crawlers currently send the most requests across the Internet?

As of 2026, Meta-ExternalAgent is the second-most active bot on the web, following only the search crawler Googlebot. GPTBot, which is operated by OpenAI to train models like ChatGPT, ranks third in total request volume.

What is the most common method for requesting that bots stay off a website?

Website administrators often use a robots.txt file to provide instructions on which parts of a site should or should not be accessed by bots. Although these guidelines are not technically binding, most reputable AI bots will respect the rules set by the administrator.

How does Cloudflare AI Crawl Control assist with bot management?

This tool simplifies the AI crawler management process by allowing administrators to easily allow or block specific AI crawlers or restrict all of them at once. It can also identify unverified bots that try to hide their identity by using machine learning and behavioral analysis to spot disguised activity.