Indexability
Indexability is the degree to which a search engine is able to add a given page to its index. It describes whether a page, once crawled, can appear in search results without blocking factors such as noindex directives, canonical conflicts, or duplication.
- Indexability is the capacity for a search engine to include a page in its index; being crawled does not automatically mean a page gets indexed.
- The main blockers are noindex directives (in meta tags or HTTP headers), misconfigured canonicals, duplicate content, robots.txt blocking, server errors, and orphan pages.
- It is distinct from crawlability (the ability to access and discover a page): if crawling is blocked, the engine never reaches the point of judging whether the page can be indexed at all.
- You can diagnose indexing blockers with the Google Search Console URL Inspection tool and with site audits in Semrush or Ahrefs.
Overview
Indexability refers to the degree to which a search engine can add a page it has discovered to its own index database. Semrush defines indexability as "a search engine's ability to add a page to its index," noting that even a crawled page is only indexed once it meets the relevant quality standards and indexing directives.
The key point is that indexability is a capacity, not a process. Indexing is the actual operation in which a search engine analyzes and stores a page, whereas indexability describes the state and conditions of a page that make that operation possible. As a result, even an otherwise unchanged page is judged to have low indexability once an index-blocking directive is applied to it.
Blocking Factors and the Difference from Crawlability
The most common factors that reduce indexability are as follows.
- noindex directives: when <meta name="robots" content="noindex"> or an X-Robots-Tag HTTP header is applied, the page is excluded from the index even if it is crawled.
- Misconfigured canonicals: an incorrect canonical confuses the search engine about which version to index, so the intended page may not be indexed.
- Duplicate content: when many near-identical pages exist, the search engine may be unable to determine which one to index and rank.
- robots.txt blocking: this prevents crawling itself, cutting off any chance to read the indexing directives.
- Server errors, redirect loops, and orphan pages: these obstruct access and analysis, lowering the likelihood of being indexed.
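The first blocker in the list above, a noindex directive, can be detected programmatically. The following is a minimal sketch in Python, not a definitive implementation: the names RobotsMetaParser and has_noindex are hypothetical, the HTML and response headers are assumed to have been fetched already, and the check deliberately ignores refinements such as user-agent-scoped header values (for example "googlebot: noindex") and the "none" directive.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives.extend(
                d.strip().lower() for d in attr.get("content", "").split(",")
            )


def has_noindex(html, headers):
    """Return True if the meta robots tag or the X-Robots-Tag header says noindex.

    Simplification (assumption): header values scoped to a specific
    user agent are not interpreted.
    """
    parser = RobotsMetaParser()
    parser.feed(html)

    header_directives = []
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_directives.extend(d.strip().lower() for d in value.split(","))

    return "noindex" in parser.directives or "noindex" in header_directives
```

Calling has_noindex('<meta name="robots" content="noindex, follow">', {}) returns True, as does passing an empty page with the header {"X-Robots-Tag": "noindex"}.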
Crawlability concerns whether a search engine bot can discover and access a page. Semrush distinguishes crawlability as "how easily a search engine can discover a page" from indexability as "whether a search engine can add a page to its index." In other words, crawlability concerns the access stage, while indexability concerns whether inclusion is possible at the indexing stage. The two are sequentially linked: if crawling is impossible, the engine never reaches the stage of judging whether the page can be indexed.
Diagnosis, Improvement, and Evidence
According to Google Search Central documentation, for a noindex rule to take effect the page must not be blocked by robots.txt. When a page is blocked, Googlebot cannot read the noindex directive, so the page may still surface in search results through external links and other signals. The recommended approach for a page you want to exclude is therefore to leave it crawlable and apply only noindex.
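The interaction between robots.txt and noindex described above can be illustrated with a short Python sketch. It uses the standard library's urllib.robotparser, whose matching rules only approximate Googlebot's, so treat the result as a rough indication. The function names and the status labels are hypothetical, chosen for this example.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def exclusion_status(robots_txt, url, page_has_noindex):
    """Classify how robots.txt and noindex interact for one URL.

    page_has_noindex is assumed to be known from a separate check of the
    page's meta robots tag or X-Robots-Tag header.
    """
    crawlable = is_crawlable(robots_txt, url)
    if page_has_noindex and not crawlable:
        # The crawler never sees the directive, so it cannot take effect.
        return "conflict"
    if not crawlable:
        return "blocked-from-crawling"
    if page_has_noindex:
        return "noindex-effective"
    return "no-blocker-detected"


robots = "User-agent: *\nDisallow: /private/"
print(exclusion_status(robots, "https://example.com/private/page", True))
print(exclusion_status(robots, "https://example.com/public/page", True))
```

A "conflict" result corresponds to the case in which the page should be unblocked in robots.txt so that the noindex directive can be read.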
Diagnosis starts with Google Search Console's URL Inspection tool, which shows the HTML Googlebot actually received, the index status, and the reason for any exclusion. You can also review the list of pages where noindex was detected in the Page Indexing report. The site audit features in Semrush and Ahrefs scan for index-blocking issues such as broken pages, orphan pages, and duplicate content in a single pass, surfacing the affected pages and how to fix them.
Action Checklist
- Check that pages meant to be indexed carry no unintended noindex tags or headers.
- For pages you want excluded, do not block them in robots.txt; keep them crawlable and apply noindex.
- Verify that the canonical points to the correct representative URL and that self-referencing canonicals are accurate.
- Consolidate duplicate or near-identical content, or resolve it with canonicals.
- Add internal links to orphan pages and eliminate redirect loops and server errors.
- Periodically validate index status and exclusion reasons with the Google Search Console URL Inspection tool.
- Routinely monitor index-blocking issues with Semrush or Ahrefs site audits.
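The canonical item in the checklist above can also be spot-checked with a script. The sketch below, in Python, classifies a page's canonical as self-referencing, pointing elsewhere, missing, or declared more than once. The names CanonicalParser, normalize, and canonical_status are hypothetical, and the URL comparison is a simplifying assumption (scheme and host compared case-insensitively, an empty path treated as "/"), not a description of how any search engine normalizes URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        if "canonical" in attr.get("rel", "").lower().split():
            self.canonicals.append(attr.get("href", ""))


def normalize(url):
    """Reduce a URL to a comparable tuple (illustrative normalization only)."""
    parts = urlsplit(url)
    return (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query)


def canonical_status(page_url, html):
    """Return 'missing', 'multiple', 'self', or 'points-elsewhere'."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonicals:
        return "missing"
    if len(parser.canonicals) > 1:
        return "multiple"
    # Resolve relative hrefs against the page URL before comparing.
    target = urljoin(page_url, parser.canonicals[0])
    return "self" if normalize(target) == normalize(page_url) else "points-elsewhere"
```

A "points-elsewhere" result is not necessarily an error, since a parameterized URL may legitimately canonicalize to its clean version; the point is to confirm that the target is the intended representative URL.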