Indexing
Indexing is the process by which a search engine analyzes the content of a crawled and rendered page and stores it in Google's index (its database). It is the step that follows crawling, and a page must be stored in the index before it can appear in search results.
- Indexing is the process of analyzing a crawled and rendered page and storing it in Google's index, a massive database.
- It is the step after crawling (page discovery); a page must enter the index to become a candidate for search results.
- During indexing Google analyzes text, titles, alt attributes, images, and videos, picks a representative (canonical) page from duplicates, and gathers signals such as language, region, and page usability.
- A noindex rule lets you control whether a given page is added to the index, but it has no effect on pages blocked by robots.txt.
- Indexing is not guaranteed, and not every page Google processes ends up in the index.
Indexing overview
Indexing refers to the process in which a search engine determines what a crawled and rendered page is about and stores it in a massive database known as Google's index. Google's documentation describes this as the stage where, after a page has been crawled, Google tries to understand what that page is about. In other words, indexing follows crawling (the discovery and fetching of pages) and rendering, and only once a page is stored in the index can it become a candidate to appear in search results.
It is worth distinguishing these concepts. Crawling is the act of discovering and fetching a page, while indexability describes whether a page is in a state that allows it to be indexed. Indexing, by contrast, is the process of actually storing an analyzed page in the index. Being discovered does not mean a page is immediately indexed; as Google states, indexing is not guaranteed, and not every page Google processes is indexed.
How indexing works
After Googlebot fetches a page, a rendering stage follows in which JavaScript is executed using a recent version of Chrome. Because many websites populate their content with JavaScript, Google may not see that content without rendering. Once rendering is complete, the indexing stage begins.
During indexing, Google analyzes several types of content and elements. According to Google's documentation, this stage processes the textual content and key content tags and attributes, such as title elements and alt attributes, along with images and videos. Google also groups together pages that are similar to one another, selects the most representative one as the canonical, and treats the rest as alternate versions. Alongside this, it collects signals such as the page's language, the country the content is relevant to, and the page's usability.
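To make the list of analyzed elements concrete, here is a minimal sketch in Python that pulls a few of them (the title element, image alt attributes, and the declared language) out of an HTML string using only the standard library. It is an illustration, not a description of how Google's indexer is implemented; the class name `IndexableContentParser` and the function `extract_signals` are hypothetical, and reading the language from the `lang` attribute is a simplifying assumption.

```python
from html.parser import HTMLParser


class IndexableContentParser(HTMLParser):
    """Hypothetical helper: collects the title, image alt text, and lang attribute."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.alt_texts = []
        self.lang = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            # Assumption: the declared lang attribute stands in for the language signal.
            self.lang = attrs.get("lang")
        elif tag == "title":
            self._in_title = True
        elif tag == "img" and attrs.get("alt"):
            # Only non-empty alt attributes are kept.
            self.alt_texts.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_signals(html):
    """Return a dict with the title, alt texts, and declared language of a page."""
    parser = IndexableContentParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "alt_texts": parser.alt_texts,
        "lang": parser.lang,
    }
```

Calling `extract_signals` on rendered HTML (that is, the markup after JavaScript has run) mirrors the order described above, where rendering precedes indexing.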
Controlling indexing with noindex
To control whether a specific page is added to the index, you use a noindex rule, set either as a meta tag or as an HTTP response header. When Googlebot crawls the page and extracts the tag or header, Google drops the page entirely from search results, even if other sites link to it. There are two ways to implement it.
| Implementation | Example | Use case |
|---|---|---|
| Meta tag | A robots meta tag in the head section | HTML pages |
| HTTP response header | The X-Robots-Tag header | Non-HTML resources, including PDFs and images |
Meta tag: `<meta name="robots" content="noindex">`
HTTP response header: `X-Robots-Tag: noindex`
There is an important precondition here. For a noindex rule to take effect, the page or resource must not be blocked by robots.txt, and the crawler must be able to access it. If access is blocked by robots.txt, Googlebot cannot read the page and therefore never sees the noindex rule, so the page can still appear in search results, for example when other pages link to it.
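The two implementations and the robots.txt precondition can be checked together. The following is a minimal sketch in Python using only the standard library, assuming you already have the page's HTML, its response headers, and the text of the site's robots.txt. The names `RobotsMetaParser`, `has_noindex`, and `noindex_can_be_seen` are hypothetical, and the parsing is deliberately simplified: it does not handle crawler-specific meta names or user-agent prefixes in the header value.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def has_noindex(html, headers):
    """True if noindex appears in a robots meta tag or an X-Robots-Tag header."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()

    header_value = ""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    return "noindex" in parser.directives or "noindex" in header_directives


def noindex_can_be_seen(robots_txt, url, user_agent="Googlebot"):
    """True if robots.txt lets the crawler fetch the URL and so read any noindex rule."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)
```

If `noindex_can_be_seen` returns False for a URL, a noindex rule on that page cannot take effect until the robots.txt block is lifted, because the crawler never fetches the page to read it.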
Why pages are not indexed
Not every page that is processed gets added to the index. Google identifies the following common causes.
- The content is low quality.
- Robots meta rules, such as noindex, prevent indexing.
- The website's design makes indexing difficult.
Basis
The definition of indexing in this document, the relationship between the crawling, rendering, and indexing stages, the elements analyzed during indexing (text, tags, images, video), canonical selection, the collection of language, country, and usability signals, and the explanation that indexing is not guaranteed are all based on Google Search Central's official "In-depth guide to how Google Search works" documentation. The behavior of the noindex rule, the meta tag and X-Robots-Tag header implementations, and the precondition that noindex is ineffective when a page is blocked by robots.txt are based on Google Search Central's official "Block Search indexing with noindex" documentation.