Query Decomposition
Query decomposition is a technique that breaks a single complex query into several simpler sub-questions that can each be answered independently. Because each sub-question is retrieved and reasoned over separately and the results are then combined into a final answer, it improves retrieval accuracy and answer quality on multi-hop questions whose evidence is scattered across multiple documents.
- Query decomposition splits a hard, compound query into several easier sub-questions.
- Each sub-question is retrieved and reasoned over on its own, then the results are merged to answer the original query, which is especially effective for multi-hop questions whose evidence is spread across multiple documents.
- Major RAG toolkits such as LangChain, Haystack, and the NVIDIA RAG Blueprint include it as a standard technique, and it can be applied through LLM prompting alone, with no additional training.
- A 2025 arXiv study reports gains over standard RAG of +36.7% in retrieval quality (MRR@10) and +11.6% in answer accuracy (F1) on the MultiHop-RAG and HotpotQA benchmarks.
- It is closely related to query fan-out and query rewriting, but what sets decomposition apart is splitting a complex query into sub-questions.
What Is Query Decomposition?
Query decomposition is a technique that divides a single complex query into several simpler sub-questions that can each be answered independently, processes each one separately, and combines the results into a final answer. The idea is to take a question for which a single retrieval pass cannot gather enough evidence and solve it by breaking it into units that are easy to answer.
The technique matters most for multi-hop questions. A multi-hop question is one whose supporting facts are not gathered in a single document but scattered across several. For instance, the question "Who earned more last year, Microsoft or Google?" is rarely answered by a single document that lists both companies' revenues side by side. Searching with the original wording tends to surface evidence for only one side, or for neither. Query decomposition splits it into two simple questions, "How much did Google earn last year?" and "How much did Microsoft earn last year?", retrieves the right document for each, and then combines the results for comparison.
The LangChain documentation explains the same principle with a comparison question such as "How do Web Voyager and reflection agents differ?" When there are documents describing each one but none that compares the two directly, retrieving "What is Web Voyager?" and "What are reflection agents?" separately and combining the results works better than searching with the original query.
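To make the failure mode concrete, here is a toy sketch. It assumes a made-up three-document corpus and uses bag-of-words overlap as a stand-in for a real retriever (an embedding or BM25 retriever would score differently); the document ids and the `retrieve` helper are hypothetical. With the same budget of two documents, the original query spends one slot on a distractor that mentions both companies, while the two sub-questions each pull in the revenue document they need.

```python
import re

# Made-up toy corpus: no single document states both companies' revenue.
CORPUS = {
    "google_revenue": "Google revenue last year was reported in its annual filing.",
    "microsoft_revenue": "Microsoft revenue last year was reported in its annual filing.",
    "users_comparison": "Microsoft and Google: which had more users last year?",
}


def tokens(text: str) -> set[str]:
    """Lowercased bag of words; a crude stand-in for a real retriever's scoring."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, k: int) -> list[str]:
    """Return the ids of the k documents with the largest word overlap (ties by id)."""
    q = tokens(query)
    ranked = sorted(CORPUS, key=lambda doc_id: (-len(q & tokens(CORPUS[doc_id])), doc_id))
    return ranked[:k]


original = "Who had more revenue last year, Microsoft or Google?"
sub_questions = [
    "What was Google's revenue last year?",
    "What was Microsoft's revenue last year?",
]

# Same total budget of two documents in both cases.
print(retrieve(original, k=2))
# ['users_comparison', 'google_revenue']
print([doc_id for q in sub_questions for doc_id in retrieve(q, k=1)])
# ['google_revenue', 'microsoft_revenue']
```

Real retrievers are far less brittle than word overlap, but a similar effect can appear whenever a document that mentions every entity in the question outranks the documents that actually hold the facts.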
How It Differs From Adjacent Concepts
Query decomposition is one of several "query transformation" techniques that reshape a query before retrieval. Because it is often confused with adjacent techniques that have a different focus, the distinctions are worth drawing out.
| Technique | Core action | Output |
|---|---|---|
| Query decomposition | Splits a complex query into easier-to-answer sub-questions | Multiple sub-questions, each asking about a different aspect |
| Query rewriting | Refines the phrasing for better retrieval while keeping the meaning intact | A single improved query |
| Query fan-out | Expands one query into several related variant or extension queries | Multiple related queries run in parallel |
| Step-back prompting | Abstracts a specific question into a higher-level conceptual one | One more general, higher-level question |
The key distinction is as follows. Query rewriting stops at turning the same question into a single query that retrieves better, and query fan-out focuses on "widening" one query into several related branches. Query decomposition, by contrast, starts from the question "what do we need to know first?" in order to solve the original query, and splits it into semantically independent sub-questions. The LangChain blog likewise distinguishes rewriting as "generating a single improved query" from decomposition as "generating multiple retrieval queries that run in parallel."
How It Works
In a RAG pipeline, query decomposition generally proceeds in three stages, a flow common to both the NVIDIA RAG documentation and the arXiv research.
- Generate sub-questions: The LLM takes the original query and decomposes it into several sub-questions that can be answered independently.
- Retrieve per sub-question: For each sub-question, passages are retrieved separately to gather evidence.
- Merge, rerank, and synthesize: The gathered candidate documents are combined and reranked to reduce noise, then a comprehensive answer is generated.
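The three stages above can be sketched as a single function. This is a minimal sketch, not the NVIDIA or arXiv implementation: `decompose`, `retrieve`, `score`, and `synthesize` are hypothetical callables you would back with an LLM, a retriever, and a reranker (for example a cross-encoder). Exact-duplicate removal stands in for the merge step, and reranking here scores each passage against the original query; scoring against the sub-question that retrieved it is an equally plausible design.

```python
from typing import Callable

# Hypothetical interfaces; back them with your own LLM, retriever, and reranker.
Decompose = Callable[[str], list[str]]        # query -> sub-questions
Retrieve = Callable[[str], list[str]]         # sub-question -> passages
Score = Callable[[str, str], float]           # (query, passage) -> relevance
Synthesize = Callable[[str, list[str]], str]  # (query, passages) -> answer


def answer_with_decomposition(
    query: str,
    decompose: Decompose,
    retrieve: Retrieve,
    score: Score,
    synthesize: Synthesize,
    top_n: int = 5,
) -> str:
    # Stage 1: generate sub-questions from the original query.
    sub_questions = decompose(query)
    # Stage 2: retrieve evidence separately for each sub-question.
    candidates: list[str] = []
    for sub_question in sub_questions:
        candidates.extend(retrieve(sub_question))
    # Stage 3a: merge, dropping exact duplicates but keeping first-seen order.
    merged = list(dict.fromkeys(candidates))
    # Stage 3b: rerank against the original query and keep the top_n passages.
    reranked = sorted(merged, key=lambda p: score(query, p), reverse=True)[:top_n]
    # Stage 3c: synthesize a final answer from the surviving evidence.
    return synthesize(query, reranked)


# Toy stubs standing in for real components (illustrative only).
PASSAGES = {
    "What is Web Voyager?": ["Web Voyager is a web agent.", "Agents use tools."],
    "What are reflection agents?": [
        "Reflection agents critique their own output.",
        "Agents use tools.",
    ],
}


def overlap_score(query: str, passage: str) -> float:
    """Toy relevance: number of shared lowercase whitespace tokens."""
    return float(len(set(query.lower().split()) & set(passage.lower().split())))


result = answer_with_decomposition(
    "How do Web Voyager and reflection agents differ?",
    decompose=lambda q: list(PASSAGES),
    retrieve=lambda sq: PASSAGES[sq],
    score=overlap_score,
    synthesize=lambda q, ps: " | ".join(ps),
    top_n=2,
)
print(result)
# Web Voyager is a web agent. | Reflection agents critique their own output.
```

The duplicate "Agents use tools." passage, retrieved by both sub-questions, survives only once after the merge and is then cut by the reranker.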
The structure of the decomposition involves a trade-off. Solving sub-questions sequentially passes each step's result to the next and maximizes information flow, but it can create an "error cascade" in which early mistakes accumulate downstream. Solving them in parallel, by contrast, isolates errors between steps but cannot exploit the dependencies among them. So when the parts of a question are independent, as in a comparison, parallel decomposition is preferable; when the question requires chained reasoning, sequential decomposition wins.
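The two structures differ mainly in how sub-questions are scheduled. The sketch below assumes a hypothetical `answer` callable that retrieves evidence for and answers one sub-question, plus a hypothetical `{prev}` placeholder convention for chained templates; in practice an LLM would usually rewrite the next sub-question using the previous answer rather than filling a template.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical: retrieves evidence for one sub-question and answers it.
Answer = Callable[[str], str]


def solve_parallel(sub_questions: list[str], answer: Answer) -> list[str]:
    """Independent sub-questions run concurrently; one wrong answer cannot leak into another."""
    with ThreadPoolExecutor() as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(answer, sub_questions))


def solve_sequential(templates: list[str], answer: Answer) -> list[str]:
    """Chained sub-questions: each template may use the previous answer via {prev}."""
    answers: list[str] = []
    prev = ""
    for template in templates:
        question = template.format(prev=prev)  # assumed placeholder convention
        prev = answer(question)
        answers.append(prev)
    return answers


# Toy knowledge source standing in for retrieval plus generation.
FACTS = {
    "Who directed Inception?": "Christopher Nolan",
    "Where was Christopher Nolan born?": "London",
}


def lookup(question: str) -> str:
    return FACTS.get(question, "unknown")


chain = ["Who directed Inception?", "Where was {prev} born?"]
print(solve_sequential(chain, lookup))
# ['Christopher Nolan', 'London']
print(solve_parallel(["Where was Christopher Nolan born?", "Who directed Inception?"], lookup))
# ['London', 'Christopher Nolan']
```

The chain shows the error cascade directly: if the first step returned the wrong director, the second question would be asked about the wrong person, whereas each parallel call sees only its own sub-question.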
Evidence and Examples
The effectiveness of query decomposition is backed by academic research. Ammann, Golde, and Akbik's "Question Decomposition for Retrieval-Augmented Generation" (arXiv:2507.00355, 2025) evaluated the three-stage pipeline above (decompose → retrieve per sub-question → merge and rerank) on the MultiHop-RAG and HotpotQA benchmarks. They report a +36.7% improvement in retrieval quality (MRR@10) and a +11.6% improvement in answer accuracy (F1) over standard RAG. The authors describe the technique as a "drop-in" improvement that "can be applied immediately without additional training or special indexing" and that "closes the retrieval gap on multi-hop questions."
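For readers reproducing such comparisons, the two metrics named above can be computed roughly as follows. This is a simplified sketch: MRR@10 averages, over queries, the reciprocal rank of the first relevant passage within the top 10 (0 if none appears), and F1 is token overlap between predicted and gold answers. Standard evaluation scripts (such as SQuAD-style F1) also strip punctuation and articles, which this sketch omits, so its numbers will not exactly match published figures.

```python
from collections import Counter


def mrr_at_k(ranked_lists: list[list[str]], relevant_sets: list[set[str]], k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant id within the top k (0 if none)."""
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_lists)


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall (no punctuation/article normalization)."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Query 1 finds its relevant passage at rank 2; query 2 finds nothing in the top 10.
print(mrr_at_k([["p3", "p1", "p7"], ["p4", "p5"]], [{"p1"}, {"p9"}]))  # 0.25
print(token_f1("the capital is Paris", "Paris"))  # 0.4
```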
That decomposition can work without hand-labeled training data has been emphasized since early research. Perez et al.'s "Unsupervised Question Decomposition for Question Answering" (arXiv:2002.09758) proposed an approach (ONUS) that maps a single hard multi-hop question to several simple single-hop sub-questions without supervision, showing that decomposition can improve performance on complex question answering. On the practical side, the NVIDIA RAG documentation classifies it as a technique suited to "multi-step, compound queries" and provides an example implemented in three stages: sub-question generation → iterative processing → response synthesis.
Implementation Checklist
- First determine whether a question is multi-hop (evidence scattered across documents), comparative, or requires multi-step reasoning; these are the cases worth decomposing. Simple single-fact questions do not need decomposition.
- In the LLM decomposition prompt, include an instruction to "split into sub-questions that can be answered independently," along with a cap on the number of sub-questions to prevent over-splitting.
- Handle comparative questions with parallel decomposition and chained-reasoning questions with sequential decomposition, balancing the error cascade against the loss of dependencies.
- After merging the per-sub-question retrieval results, run reranking to remove duplicates and noise.
- Measure retrieval metrics (MRR, Recall) and answer accuracy (F1, EM) both before and after decomposition to verify that it actually improves things.
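The prompt and cap from the checklist could look like the sketch below. The prompt wording, the `MAX_SUB_QUESTIONS` value of 4, and the helper names are illustrative assumptions, not taken from any of the referenced frameworks. Parsing is deliberately defensive: it strips list markers, drops blank and duplicate lines, enforces the cap even if the model ignores it, and falls back to the original query when nothing usable comes back.

```python
import re

MAX_SUB_QUESTIONS = 4  # illustrative cap; tune per workload

PROMPT_TEMPLATE = (
    "Split the question below into at most {max_n} sub-questions that can be "
    "answered independently. If it is a simple single-fact question, return it "
    "unchanged. Write one sub-question per line.\n\n"
    "Question: {question}"
)

# Leading list markers such as "-", "*", "1." or "2)" that models often add.
LIST_MARKER = re.compile(r"^\s*(?:[-*]|\d+[.)])\s*")


def build_prompt(question: str, max_n: int = MAX_SUB_QUESTIONS) -> str:
    return PROMPT_TEMPLATE.format(max_n=max_n, question=question)


def parse_sub_questions(llm_output: str, original: str, max_n: int = MAX_SUB_QUESTIONS) -> list[str]:
    sub_questions: list[str] = []
    for line in llm_output.splitlines():
        text = LIST_MARKER.sub("", line).strip()
        if text and text not in sub_questions:  # skip blanks and duplicates
            sub_questions.append(text)
    # Enforce the cap even if the model ignored it; fall back to the original query.
    return sub_questions[:max_n] or [original]


raw = (
    "1. How much did Google earn last year?\n"
    "2. How much did Microsoft earn last year?\n"
    "\n"
    "3. How much did Microsoft earn last year?"
)
print(parse_sub_questions(raw, "Who earned more last year, Microsoft or Google?"))
# ['How much did Google earn last year?', 'How much did Microsoft earn last year?']
```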
References
- Ammann, Golde, Akbik — Question Decomposition for Retrieval-Augmented Generation (arXiv:2507.00355, 2025)
- Perez et al. — Unsupervised Question Decomposition for Question Answering (arXiv:2002.09758)
- NVIDIA — Query Decomposition for NVIDIA RAG Blueprint
- LangChain Blog — Query Transformations
- Haystack (deepset) — Advanced RAG: Query Decomposition & Reasoning