Hybrid Search
Hybrid search runs a keyword search (a sparse-vector method like BM25) and a semantic vector (embedding) search in parallel, then fuses the two result sets into a single ranking. Because it captures exact term matches and contextual meaning at the same time, it is widely used to improve retrieval accuracy in RAG and AI search.
- Hybrid search runs keyword search (BM25) and vector (semantic) search in parallel and merges the two result sets into one ranking.
- Keyword search excels at exact matches such as product codes, proper nouns, and technical jargon, while vector search excels at synonyms, context, and paraphrased queries, so combining them lets each cover the other's weak spots.
- The standard fusion algorithm for merging the two result sets is RRF (Reciprocal Rank Fusion), which uses only rank rather than score and thereby sidesteps the problem of normalizing scores that live on different scales.
- Elasticsearch and Azure AI Search default the RRF constant k to 60, while Weaviate tunes the weighting through an alpha parameter (0 = keyword, 1 = vector, default 0.75).
- It is a core technique for raising retrieval quality in RAG and AI search pipelines, and a reranking step is usually layered on top of the fused results to push precision even higher.
What Is Hybrid Search
Hybrid search runs two retrieval methods with very different characteristics, keyword search and vector search, at the same time and then fuses the result lists each produces into a single ranking. Weaviate defines it as a technique that combines multiple search algorithms to improve the accuracy and relevance of search results, specifically by merging the results of a keyword-based sparse-vector search and a meaning-based dense-vector search into one ranked list.
This approach is necessary because each method has clearly distinct strengths and weaknesses. A keyword (sparse) method like BM25 scores documents by term frequency, inverse document frequency, and document length, so it is strong on queries that look for "exactly that word" (product codes, proper nouns, rare technical terms), but it misses matches whenever the wording changes, even if the meaning is identical. Embedding-based vector (dense) search, by contrast, handles paraphrases, synonyms, and context well, yet it tends to underweight exact matches on keywords that appear infrequently. Hybrid search merges the two result sets so that each one fills the other's gap. That is why it has become a core technique for raising the quality of the retrieval stage in the RAG pipelines that underpin generative search engines such as ChatGPT, Perplexity, and Google AI Overviews.
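To make the keyword-side weakness concrete, here is a minimal sketch of BM25 scoring over a toy corpus. The whitespace tokenizer, the parameter values `k1=1.2` and `b=0.75`, the IDF variant, and the two sample documents are all illustrative assumptions, not the configuration of any particular engine.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer (assumption)
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # a term that never appears contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / length_norm
        scores.append(score)
    return scores

docs = [
    "replacement filter for model xk-200 vacuum",
    "how to fix a car that will not start",
]
exact = bm25_scores("xk-200 filter", docs)                # exact product code
reworded = bm25_scores("automobile won't turn on", docs)  # paraphrase of the second document
```

The exact product-code query scores only the first document, while the paraphrased query shares no terms with either document and scores zero everywhere, even though the second document answers it. That gap is what the vector side is meant to cover.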
Keyword Search vs. Vector Search vs. Hybrid Search
| Dimension | Keyword Search (BM25) | Vector Search (Embeddings) | Hybrid Search |
|---|---|---|---|
| Matching basis | Exact term match, term frequency | Semantic and contextual similarity | Run both in parallel, then fuse |
| Strong on | Product codes, proper nouns, technical terms | Synonyms, paraphrases, natural-language questions | Both exact matches and meaning |
| Weakness | Misses rewordings even when meaning is the same | Underweights exact matches on rare keywords | Added implementation and tuning complexity |
| Data representation | Inverted index | Vector index (ANN) | Sparse + dense used together |
| Combining results | Standalone score | Standalone score (cosine, etc.) | Fused via RRF or convex combination |
How to Merge the Two Result Sets: RRF (Reciprocal Rank Fusion)
The core difficulty is that the two methods use different scoring systems. BM25 scores have no fixed upper bound, whereas vector cosine similarity falls between -1 and 1, so simply adding or averaging them lets one side dominate the results. The most widely used fix is RRF (Reciprocal Rank Fusion), which ignores the raw scores entirely and uses only each document's rank within each list. The final score is built by summing the reciprocals of the ranks a document received across the lists.
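A small numeric sketch shows how the scale mismatch plays out. The document IDs and raw scores below are made-up values chosen only to illustrate the effect.

```python
# Hypothetical raw scores for three documents from each retriever.
bm25 = {"doc_a": 14.2, "doc_b": 9.8, "doc_c": 0.0}
cosine = {"doc_a": 0.31, "doc_b": 0.35, "doc_c": 0.92}

# Naive fusion: add the raw scores without any normalization.
naive = {doc: bm25[doc] + cosine[doc] for doc in bm25}

naive_ranking = sorted(naive, key=naive.get, reverse=True)
bm25_ranking = sorted(bm25, key=bm25.get, reverse=True)
```

With these values the naive ranking is identical to the BM25-only ranking, and `doc_c`, by far the strongest semantic match, stays in last place. The vector scores are too small to change the order.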
The official Azure AI Search documentation computes a document's RRF score as the sum of 1 / (rank + k) and explains that, experimentally, a small value of k around 60 works best. Elasticsearch uses the same formula, defaults its rank_constant parameter to 60, and requires at least two retrievers before fusion can be applied. The pseudocode Elasticsearch specifies is as follows:

    score = 0.0
    for q in queries:
        if d in result(q):
            score += 1.0 / ( k + rank( result(q), d ) )
    return score

Here k is the rank constant and rank() is the document's rank starting from 1. This way, without having to normalize the score units of the two methods, documents that rank highly on both sides naturally earn a higher combined score. Weaviate also uses RRF as its default fusion and states the formula as ∑_{d∈D} 1/(k + r(d)).
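The pseudocode scores a single document. A runnable sketch that fuses whole result lists might look like the following; the function name `rrf_fuse` and the sample document IDs are illustrative assumptions, and each input list is assumed to be ordered best-first.

```python
from collections import defaultdict

def rrf_fuse(result_lists, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for results in result_lists:
        # Ranks start at 1, matching the rank() definition in the text.
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

keyword_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical BM25 ranking
vector_hits = ["doc_c", "doc_a", "doc_d"]   # hypothetical vector ranking
fused = rrf_fuse([keyword_hits, vector_hits])
```

Here `doc_a` (ranks 1 and 2) and `doc_c` (ranks 3 and 1) appear in both lists and land ahead of `doc_b` and `doc_d`, which each appear in only one.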
Weight Tuning and Other Fusion Methods
In Weaviate, the alpha parameter controls how much weight each method gets. An alpha of 0 is pure keyword search, 1 is pure vector search, and 0.5 weights them equally; the default is 0.75, leaning slightly toward the vector side.
RRF is not the only answer. Pinecone's study "An Analysis of Fusion Functions for Hybrid Retrieval" reports that a convex combination, which takes a weighted sum of the lexical and semantic scores, outperformed RRF in both in-domain and out-of-domain settings. It also finds that, contrary to conventional wisdom, RRF is sensitive to its parameters, whereas a convex combination is relatively insensitive to the score-normalization method and is sample-efficient: its single parameter can be tuned to a target domain from only a handful of training examples. In other words, RRF makes a reasonable out-of-the-box default when you have no labeled data to tune against, while weighted score fusion can deliver better results when adjusted to fit your data.
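As a rough sketch of weighted score fusion, the example below normalizes each score set and then takes a convex combination. The min-max normalization, the function names, and the sample scores are assumptions made for illustration; the `alpha` convention (0 = keyword only, 1 = vector only) is borrowed from the Weaviate description above and is not taken from the study.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the 0..1 range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}  # all scores equal: nothing to separate
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def convex_fuse(lexical, semantic, alpha=0.75):
    """Weighted sum of normalized scores; alpha is the weight on the semantic side."""
    lex_n, sem_n = min_max(lexical), min_max(semantic)
    fused = {
        doc: alpha * sem_n.get(doc, 0.0) + (1 - alpha) * lex_n.get(doc, 0.0)
        for doc in set(lex_n) | set(sem_n)  # a document missing from one list scores 0 there
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

lexical = {"doc_a": 14.2, "doc_b": 9.8, "doc_c": 3.1}      # hypothetical BM25 scores
semantic = {"doc_a": 0.31, "doc_b": 0.35, "doc_c": 0.92}   # hypothetical cosine scores

vector_leaning = convex_fuse(lexical, semantic, alpha=0.75)
keyword_only = convex_fuse(lexical, semantic, alpha=0.0)
```

With `alpha=0.75` the strongest semantic match, `doc_c`, moves to the top, while `alpha=0.0` reproduces the keyword order. Unlike RRF, this approach uses the size of the score gaps, which is why the weight is worth tuning on real data.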
Implementation Checklist
- Examine your query types. If product codes, proper nouns, and exact matches matter, raise the keyword weight; if natural-language questions and paraphrases dominate, raise the vector weight.
- Build an inverted index (keyword) and a vector index over the same corpus and run both searches in parallel.
- Start with RRF as the default fusion and keep the constant k around 60 (the Elasticsearch and Azure default).
- If you use Weaviate, tune alpha within the 0.5–0.75 range by A/B testing against real query logs.
- Consider layering a reranking model on top of the top N fused results to push final precision higher.
- If you use weighted score fusion (convex combination), tune the normalization and weighting of the two scores against a validation set.
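The checklist can be sketched end to end as follows. The three helper functions are hypothetical stand-ins that return canned results; in a real system they would query an inverted index, a vector index, and a reranking model respectively. The thread pool is one possible way to run the two searches in parallel, not a requirement.

```python
from concurrent.futures import ThreadPoolExecutor

def keyword_search(query):
    # Hypothetical stand-in for a BM25 query against an inverted index.
    return ["doc_a", "doc_b", "doc_c"]

def vector_search(query):
    # Hypothetical stand-in for an ANN query against a vector index.
    return ["doc_c", "doc_a", "doc_d"]

def rerank(query, doc_ids):
    # Hypothetical stand-in for a reranking model; this stub keeps the fused order.
    return list(doc_ids)

def hybrid_search(query, k=60, top_n=3):
    # Run both retrievers in parallel over the same corpus.
    with ThreadPoolExecutor(max_workers=2) as pool:
        keyword_future = pool.submit(keyword_search, query)
        vector_future = pool.submit(vector_search, query)
        result_lists = [keyword_future.result(), vector_future.result()]

    # Fuse with RRF, using ranks that start at 1.
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    fused = sorted(scores, key=scores.get, reverse=True)

    # Send only the top N fused candidates to the reranker.
    return rerank(query, fused[:top_n])

top = hybrid_search("xk-200 filter")
```

Keeping `top_n` small limits how much work the reranker does, since reranking models are typically far more expensive per document than either retriever.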