
Crawl Budget

Crawl budget is the number of URLs Googlebot wants to crawl and is able to crawl on a given site over a period of time. It is determined by two factors: how much load the server can handle (crawl capacity limit) and how much Google wants to crawl the site (crawl demand).

  • Crawl budget is the number of URLs Googlebot wants to and can crawl on a site over a given period, set by two factors: crawl capacity limit and crawl demand.
  • The crawl capacity limit rises and falls automatically based on how fast the site responds (crawl health) and the resources Google has available.
  • Crawl demand is shaped by the site's perceived inventory, the popularity of its URLs, and content freshness.
  • Duplicate content, faceted navigation, soft 404s, and infinite URL spaces are the classic ways crawl budget gets wasted.
  • It matters most for large sites; small sites whose pages are crawled the day they publish rarely need to worry about it.

Overview

Crawl budget refers to the set of URLs Googlebot wants to crawl and is able to crawl on a particular site within a given period. Google's official documentation describes it as determined by two factors taken together: a site's crawl capacity limit and its crawl demand. In practical terms, it governs how many of a site's pages Googlebot fetches and how often.

Where this really matters in practice is on larger sites. Google recommends actively managing crawl budget in cases such as the following.

  • Large sites with more than 1 million unique pages that change roughly once a week
  • Medium-to-large sites with more than 10,000 unique pages whose content changes very rapidly (daily)
  • Sites where a substantial share of URLs are reported in Search Console as “Discovered – currently not indexed”

Conversely, if you run a small site with few pages, or one where pages are crawled the same day they are published, there is no need to fuss over crawl budget. In that case, simply keeping your sitemap current and periodically reviewing index coverage is enough.

Crawl capacity limit

The crawl capacity limit is the maximum number of simultaneous parallel connections Googlebot can use to crawl a site, together with the time delay between fetches. Google calculates it automatically so as not to overload the server, and it moves up or down depending on the following factors.

  • Crawl health: If the site responds quickly for a while, the limit goes up and more connections can be used; if the server slows down or responds with server errors, the limit goes down and Googlebot crawls less.
  • Google's resources: Google's total machine capacity is finite, and the limit is adjusted accordingly.
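Google does not publish how it computes the crawl capacity limit. Purely to illustrate the feedback loop described above, the Python sketch below uses additive-increase/multiplicative-decrease (AIMD), a well-known congestion-control technique, as a stand-in: while responses are fast and error-free the number of parallel connections creeps up by one, and on a slow response or a 5xx error it is halved. The class name, the 1-second slowness threshold, and the 20-connection ceiling are hypothetical values chosen for the example, not figures from Google.

```python
from dataclasses import dataclass


@dataclass
class ToyCrawlCapacity:
    """Toy AIMD model of a crawl capacity limit (NOT Google's algorithm).

    All thresholds below are hypothetical, chosen only for illustration.
    """

    connections: int = 2        # current number of parallel connections
    min_connections: int = 1
    max_connections: int = 20   # stand-in for the crawler's finite resources
    slow_ms: float = 1000.0     # responses slower than this count as unhealthy

    def observe(self, status: int, response_ms: float) -> int:
        """Update the limit after one fetch and return the new value."""
        unhealthy = status >= 500 or response_ms > self.slow_ms
        if unhealthy:
            # Multiplicative decrease: back off quickly when the server struggles.
            self.connections = max(self.min_connections, self.connections // 2)
        else:
            # Additive increase: probe for more capacity while the site is healthy.
            self.connections = min(self.max_connections, self.connections + 1)
        return self.connections


if __name__ == "__main__":
    model = ToyCrawlCapacity()
    # Three fast 200s, then a 503, then a slow 200.
    for status, ms in [(200, 150), (200, 120), (200, 180), (503, 90), (200, 2500)]:
        print(status, ms, "->", model.observe(status, ms))
```

With the sample sequence, the limit climbs from 2 to 5 over the three healthy fetches, then falls to 2 after the 503 and to 1 after the slow response. Real crawlers also adjust the delay between fetches, which this toy omits.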

Crawl demand

Crawl demand reflects how much Google wants to fetch the content of a given site. The official documentation cites three main factors.

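  • Perceived inventory: Without guidance from the site owner, Googlebot tries to crawl all or most of the URLs it knows about on the site. If many of them are duplicates or URLs that should not be crawled for some other reason, this wastes crawling time. This is the factor site owners can influence most.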
  • Popularity: More popular URLs are crawled more often to keep them fresher in the index.
  • Staleness: Google's systems recrawl pages often enough to pick up changes to their content.

What wastes crawl budget

When limited crawling resources are spent on low-value URLs, crawling and indexing of the pages that actually matter get delayed. The waste factors Google calls out include the following.

  • Mass generation of effectively identical URLs through faceted navigation, session IDs, and similar mechanisms
  • Duplicate content within the site
  • Soft 404s (pages that return a 200 response even though the page does not really exist)
  • Hacked pages
  • Infinite URL spaces, such as calendars that link to an endless series of dates
  • Low-quality and spam content
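Faceted navigation and session IDs tend to produce many URLs that serve the same content. One way to gauge how much of this a site has is to normalize a list of crawled URLs (for example, taken from server logs) by dropping parameters that do not change the content, then count how many raw URLs collapse into each normalized one. The Python sketch below assumes, hypothetically, that `sessionid`, `sort`, and `utm_*` parameters never change page content; which parameters are safe to ignore is site-specific.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters assumed NOT to change page content on this site.
IGNORED_PARAMS = {"sessionid", "sort"}
IGNORED_PREFIXES = ("utm_",)


def normalize(url: str) -> str:
    """Drop content-neutral parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in IGNORED_PARAMS
        and not k.lower().startswith(IGNORED_PREFIXES)
    )
    # Lowercase scheme and host, keep the path as-is, drop the fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def duplicate_groups(urls: list[str]) -> dict[str, int]:
    """Map each normalized URL to how many raw URLs collapse into it,
    keeping only groups with more than one variant."""
    counts = Counter(normalize(u) for u in urls)
    return {url: n for url, n in counts.items() if n > 1}


if __name__ == "__main__":
    crawled = [
        "https://example.com/shoes?color=red",
        "https://example.com/shoes?color=red&sort=price",
        "https://example.com/shoes?sessionid=abc123&color=red",
        "https://example.com/shoes?color=red&utm_source=news",
        "https://example.com/shoes?color=blue",
    ]
    print(duplicate_groups(crawled))
```

Here four of the five crawled URLs collapse into `https://example.com/shoes?color=red`, which suggests three of them are wasted fetches that consolidation or blocking could recover.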

Optimization methods

Google's official guide suggests the following methods for optimizing crawl efficiency.

  • Consolidate duplicate content: Clean up duplicates so crawling focuses on unique content rather than unique URLs.
  • Block crawling with robots.txt: Use robots.txt to block URLs you don't want crawled at all, such as facet combinations, infinite-scroll pages that duplicate content available elsewhere, or differently sorted versions of the same page. Don't use it as a temporary way to shift crawl budget to other pages.
  • Return 404/410: Return a 404 or 410 status code for pages that have been permanently removed.
  • Eliminate soft 404s: Make sure non-existent pages do not return a 200.
  • Keep sitemaps current: Add a <lastmod> tag to changed URLs to signal freshness.
  • Remove long redirect chains: Redirects that hop through several steps hurt crawling.
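Before deploying robots.txt rules like those described above, it helps to check which URLs they actually block. The sketch below uses Python's standard `urllib.robotparser` with hypothetical `/search/` and `/calendar/` paths. This parser does simple prefix matching in file order and does not implement Google's `*`/`$` wildcards or longest-match precedence, so treat it as a rough check and confirm the final rules against Google's robots.txt documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block internal search results and an endless calendar.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /calendar/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in [
        "https://example.com/products/blue-shoes",
        "https://example.com/search/?q=shoes",
        "https://example.com/calendar/2031/01/",
    ]:
        print(url, "->", "allowed" if rp.can_fetch("Googlebot", url) else "blocked")
```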
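The 404/410 and soft-404 items come down to choosing the status code from what the server actually knows about a path, instead of serving every request with 200. A minimal sketch using Python's standard `http.server`, assuming a hypothetical in-memory set of live pages and a set of permanently removed paths: live pages get 200, removed ones get 410, and everything else gets a real 404 rather than a 200 "not found" page.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical site state; a real site would consult its CMS or database.
LIVE_PAGES = {"/": "Home", "/products/blue-shoes": "Blue shoes"}
REMOVED_PATHS = {"/products/discontinued-boots"}
ERROR_BODIES = {404: "Not found", 410: "Gone"}


def status_for(path: str) -> int:
    """Pick the status code for a path instead of defaulting to 200."""
    if path in LIVE_PAGES:
        return 200
    if path in REMOVED_PATHS:
        return 410  # Gone: the page was removed permanently
    return 404      # Not found: returning 200 here would be a soft 404


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        path = self.path.split("?", 1)[0]  # ignore the query string
        status = status_for(path)
        text = LIVE_PAGES[path] if status == 200 else ERROR_BODIES[status]
        body = text.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    for p in ["/", "/products/discontinued-boots", "/no-such-page"]:
        print(p, status_for(p))
```

To serve it locally, pass `Handler` to `http.server.HTTPServer` and call `serve_forever()`; the design point is simply that the error page body and the error status code travel together.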

How to increase crawl budget

According to Google, there are essentially only two ways to increase crawl budget. The first is to grow your server's capacity to handle crawling, and the second – and more important – is to raise the value of the content you offer to search users. Rather than technical tricks aimed simply at getting crawled more, the key is to become a site that is worth crawling.

Implementation checklist

  • Use Search Console's Crawl Stats report to review daily crawl request counts and average response times.
  • Check whether a high proportion of URLs are in the “Discovered – currently not indexed” state.
  • Block or canonicalize duplicate URLs created by faceted navigation, session IDs, and sort parameters.
  • Verify that deleted pages return 404/410 rather than 200 (soft 404).
  • Reflect <lastmod> accurately in the sitemap and update it whenever pages change.
  • Collapse redirect chains down to a single hop and keep server response times stable.
  • Consolidate or de-index low-quality and duplicate pages so crawling resources concentrate on high-value pages.
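For the `<lastmod>` item in the checklist, one approach is to generate the sitemap from the same data that records when each page last changed, so the value cannot drift from reality. A sketch with Python's `xml.etree.ElementTree`, assuming a hypothetical list of `(url, last_modified)` pairs; the namespace and the W3C Datetime format for `<lastmod>` come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, date]]) -> bytes:
    """Return sitemap XML with one <url> per page and <lastmod> as YYYY-MM-DD."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # W3C Datetime; a date-only value is allowed by the sitemap protocol.
        ET.SubElement(url, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    pages = [
        ("https://example.com/", date(2024, 5, 1)),
        ("https://example.com/products/blue-shoes", date(2024, 4, 18)),
    ]
    print(build_sitemap(pages).decode("utf-8"))
```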
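Collapsing redirect chains to a single hop, as the checklist suggests, amounts to resolving every redirect source straight to its final destination. The sketch below works on a hypothetical in-memory redirect map (old path to new path), such as one assembled from a server configuration, and raises an error on loops, which would otherwise trap crawlers.

```python
def collapse_redirects(redirects: dict[str, str]) -> dict[str, str]:
    """Rewrite every redirect so it points straight at its final target.

    `redirects` maps a source path to the path it currently redirects to.
    Raises ValueError if a redirect loop is found.
    """
    flattened: dict[str, str] = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Follow the chain until reaching a path that is not itself redirected.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flattened[source] = target
    return flattened


if __name__ == "__main__":
    chain = {
        "/old-page": "/renamed-page",
        "/renamed-page": "/2024/page",
        "/2024/page": "/page",
    }
    print(collapse_redirects(chain))
```

After flattening, each of the three old paths redirects directly to `/page`, so a crawler reaches the destination in one hop instead of three.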
