Crawler
A crawler, also known as a "spider," is a bot (program) that automatically navigates web pages and collects their content. Search engines index the pages a crawler gathers to build their search results; Googlebot and Bingbot are the most prominent examples.
- A crawler, also called a spider, is a bot (program) that automatically travels the web by following links to collect pages.
- The crawler is the agent (the bot); crawling is the act of that bot visiting and collecting pages.
- Google's primary crawler is Googlebot, and under its mobile-first indexing policy most requests come from the Googlebot smartphone variant.
- According to Google's official documentation, common crawlers always obey robots.txt rules during automatic crawling.
- Because crawler names are easy to spoof, Googlebot and Bingbot should be verified through official methods such as reverse DNS lookups.
Definition of a Crawler
A crawler is a bot or program that automatically visits web pages to read and collect their content. Because it moves from one page to the next by following the links it finds, traversing the web the way a spider moves across its web, it is also called a spider or a bot. Search engines analyze and index the pages a crawler gathers and then serve them as search results, which makes the crawler the starting point for any page's visibility in search.
An important distinction applies here. The crawler refers to the agent (the bot) that collects pages, while the act of that bot actually visiting and gathering pages is called crawling. In other words, you would say "a crawler called Googlebot crawls a site."
Major Search Engine Crawlers
| Crawler Name | Operator | Purpose |
|---|---|---|
| Googlebot | Google | Default crawler for Google Search, Images, News, and Discover (desktop and smartphone variants) |
| Googlebot-Image | Google | Dedicated crawler for image content (Google Images) |
| Googlebot-News | Google | Dedicated to crawling for Google News |
| Google-Extended | Google | Controls whether content is used for training and grounding AI models such as Gemini (no effect on search ranking) |
| Bingbot | Microsoft | Standard crawler for Bing search indexing (desktop and mobile variants) |
Beyond these, Google also operates purpose-specific crawlers such as Googlebot-Video for video and GoogleOther for research and one-off crawling. Each crawler is identified by the user agent token it uses in robots.txt, but because Googlebot desktop and smartphone share the same token, you cannot selectively block one of the two through robots.txt.
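To make the role of user agent tokens concrete, here is a minimal sketch that checks a robots.txt file with Python's standard `urllib.robotparser`. The robots.txt content and the example.com URLs are hypothetical, and the standard-library parser only approximates how a given search engine matches tokens, so treat the results as illustrative rather than as a statement of Google's exact behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block only the image crawler from /photos/,
# and block every crawler from /private/.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /photos/

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


robots = build_parser(ROBOTS_TXT)

# Rules are looked up by token, not by device type.
checks = [
    ("Googlebot-Image", "https://example.com/photos/a.jpg"),
    ("Googlebot", "https://example.com/photos/a.jpg"),
    ("Googlebot", "https://example.com/private/report.html"),
    ("bingbot", "https://example.com/photos/a.jpg"),
]
for token, url in checks:
    print(token, url, robots.can_fetch(token, url))
```

Because rules are keyed on the token alone, a group written for `Googlebot` cannot tell the desktop variant from the smartphone variant; both present the same token.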
How Crawlers Work
A crawler discovers new URLs through the links found on pages it has already crawled. Google's official documentation notes that keeping a site secret simply by not publishing links to it is nearly impossible, because the address can still be exposed through referrer information and similar sources. Googlebot throttles its crawl rate so that, on average, it does not access a site more than once every few seconds. For supported file types, it downloads only up to the first 2 MB (64 MB for PDFs) before processing.
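Link-based discovery can be sketched as a breadth-first walk over a queue of URLs. The example below is an illustration under simplifying assumptions, not a description of how Googlebot is implemented: `LinkExtractor`, `discover`, and the in-memory `FAKE_SITE` are hypothetical names, pages are served from a dictionary instead of the network, and throttling and download size limits are left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute link targets from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop #fragments.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def discover(seed, fetch, max_pages=100):
    """Breadth-first discovery: each fetched page may add new URLs to the queue."""
    seen = {seed}
    frontier = deque([seed])
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:  # page could not be fetched
            continue
        order.append(url)
        extractor = LinkExtractor(url)
        extractor.feed(html)
        for link in extractor.links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


# Hypothetical four-page site held in memory.
FAKE_SITE = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "https://example.com/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "https://example.com/b": '<a href="/">home</a>',
    "https://example.com/c": "",
}

crawl_order = discover("https://example.com/", FAKE_SITE.get)
print(crawl_order)
```

Note that `/c` is never linked from the home page, yet it is still found through `/a`. This is the same mechanism by which an unadvertised URL becomes reachable once any crawled page links to it.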
Because Google Search indexes mobile content first, the majority of crawl requests come from the Googlebot smartphone variant. Crawlers also honor HTTP caching standards such as ETag and Last-Modified, and they support both HTTP/1.1 and HTTP/2.
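From the site's side, honoring these caching headers means answering a conditional request with `304 Not Modified` when the page has not changed, so the crawler does not need to download the body again. The following is a simplified sketch of that decision: `respond` and `PAGE` are hypothetical names, the ETag comparison is an exact string match, and list-valued or `*` validators are not handled.

```python
from email.utils import parsedate_to_datetime


def respond(resource, request_headers):
    """Return (status, body) for a GET that may carry cache validators."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When an ETag validator is sent, it takes precedence over the date.
        if if_none_match == resource["etag"]:
            return 304, b""
        return 200, resource["body"]

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        modified = parsedate_to_datetime(resource["last_modified"])
        if modified <= since:
            return 304, b""

    return 200, resource["body"]


# Hypothetical cached page with both validators.
PAGE = {
    "etag": '"v2"',
    "last_modified": "Tue, 02 Jan 2024 10:00:00 GMT",
    "body": b"<html>hello</html>",
}

# First visit: no validators, full body is returned.
print(respond(PAGE, {}))
# Revisit with the ETag the crawler stored earlier.
print(respond(PAGE, {"If-None-Match": '"v2"'}))
```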
robots.txt Compliance and Crawler Verification
According to Google's official documentation, common crawlers like Googlebot always obey robots.txt rules during automatic crawling. There are exceptions, however: special crawlers such as the advertising-related AdsBot, which presuppose an agreement with the site operator, may ignore rules addressed to the global user agent (*).
At the same time, you should be aware that many bots impersonate crawlers. Even if a user-agent header reads "Googlebot" or "bingbot," that alone does not prove it is genuine. Microsoft recommends a reverse DNS lookup followed by a forward IP lookup to verify Bingbot, and Google likewise identifies its own crawlers by their IP addresses and reverse DNS hostnames.
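The reverse-then-forward check can be sketched as follows. This is a minimal illustration, not an official verification tool: `verify_crawler` and `TRUSTED_SUFFIXES` are hypothetical names, the hostname suffixes are the ones understood to be documented by the operators and should be confirmed against their current documentation, and the demo uses fake lookup tables so that it runs without network access.

```python
import socket

# Assumed hostname suffixes per crawler; confirm against official documentation.
TRUSTED_SUFFIXES = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def _system_reverse(ip):
    return socket.gethostbyaddr(ip)[0]


def _system_forward(hostname):
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def verify_crawler(ip, claimed, reverse=_system_reverse, forward=_system_forward):
    """True only if reverse DNS gives a trusted hostname that resolves back to ip."""
    suffixes = TRUSTED_SUFFIXES.get(claimed.lower())
    if not suffixes:
        return False
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # no PTR record or lookup failure
        return False
    if not hostname.endswith(suffixes):
        return False
    try:
        # Forward-confirm: the hostname must map back to the same address.
        return ip in forward(hostname)
    except OSError:
        return False


# Illustrative lookup tables standing in for real DNS.
FAKE_PTR = {
    "66.249.66.1": "crawl-66-249-66-1.googlebot.com",
    "203.0.113.9": "scraper.example.net",
    "203.0.113.10": "fake.googlebot.com",
}
FAKE_A = {
    "crawl-66-249-66-1.googlebot.com": {"66.249.66.1"},
    "fake.googlebot.com": {"198.51.100.1"},
}

for address in FAKE_PTR:
    print(address, verify_crawler(address, "Googlebot", FAKE_PTR.__getitem__, FAKE_A.__getitem__))
```

The third address shows why the forward lookup matters: anyone who controls the reverse DNS zone for an IP range can publish a hostname ending in a trusted domain, but they cannot make that hostname resolve back to their own address. In production the default resolvers would be used by omitting the last two arguments.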