Semantic Chunking
Semantic chunking splits a document along meaning boundaries rather than fixed units such as character count. It is a RAG chunking strategy that locates points where the embedding similarity between adjacent sentences drops sharply and breaks there, yielding chunks that are semantically cohesive.
- Semantic chunking is a chunking strategy that divides a document by meaning boundaries instead of character count, breaking at the points where the embedding similarity between adjacent sentences falls off sharply.
- Chunking itself is the broader concept; semantic chunking is the meaning-based variant that decides where to split by measuring embedding similarity.
- LangChain's SemanticChunker and LlamaIndex's SemanticSplitterNodeParser are the canonical implementations, setting split points with thresholds such as percentile, standard deviation, or interquartile range.
- Semantically cohesive chunks can improve retrieval accuracy, but producing them costs extra embedding calls.
- A 2024 evaluation study (arXiv:2410.13070) concluded that the gains from semantic chunking are inconsistent and often fail to justify its additional computational cost.
What Semantic Chunking Is
Semantic chunking divides a document along its meaning boundaries. It embeds sentences (or smaller units) in sequence, measures the similarity between adjacent embeddings (usually expressed as cosine distance), and treats the points where similarity drops abruptly, that is, where the distance is large, as split points. As a result, semantically related sentences cluster within a single chunk, and boundaries form wherever the topic shifts.
Semantic chunking builds on the broader notion of chunking. A RAG pipeline cannot embed or retrieve a long document as a whole, so it breaks the text into small units, and the central question for any chunking strategy is where to cut. The simplest approach is fixed-size chunking, which slices mechanically at a set number of characters or tokens. Semantic chunking replaces that length criterion with a meaning-change criterion. In other words, semantic chunking is one kind of chunking, distinguished by the fact that it decides split points from embedding similarity.
This matters because chunk boundaries govern retrieval quality. When a single unit of meaning is split across two chunks, context is severed at retrieval time; when unrelated material is mixed into one chunk, the embedding signal is diluted. Semantic chunking is an attempt to align boundaries with meaning so that each chunk is highly cohesive.
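The procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any library's implementation: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model, and `semantic_chunks` and its `percentile` parameter are names chosen for this example.

```python
import re

import numpy as np


def toy_embed(sentence: str, dim: int = 64) -> np.ndarray:
    """Hypothetical stand-in for an embedding model: hashed bag of words."""
    vec = np.zeros(dim)
    for word in re.findall(r"[a-z]+", sentence.lower()):
        # Deterministic bucket per word (avoids Python's randomized hash()).
        vec[sum(ord(c) for c in word) % dim] += 1.0
    return vec


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 1.0 if denom == 0 else 1.0 - float(a @ b) / denom


def semantic_chunks(sentences, embed=toy_embed, percentile=95):
    """Group sentences, splitting where adjacent distance exceeds the percentile."""
    if len(sentences) < 2:
        return [list(sentences)]
    vectors = [embed(s) for s in sentences]
    # distances[i] is the distance between sentence i and sentence i + 1.
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    threshold = np.percentile(distances, percentile)
    chunks, current = [], [sentences[0]]
    for sentence, dist in zip(sentences[1:], distances):
        if dist > threshold:  # similarity dropped sharply: start a new chunk
            chunks.append(current)
            current = []
        current.append(sentence)
    chunks.append(current)
    return chunks
```

In practice `embed` would wrap a real embedding model; the toy version only keeps the sketch runnable without network access.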
Fixed-Size Chunking vs. Semantic Chunking
| Aspect | Fixed-Size Chunking | Semantic Chunking |
|---|---|---|
| Split criterion | A set length such as character or token count (often with overlap) | Change in embedding similarity between adjacent sentences (meaning boundary) |
| Boundary location | Mechanical cut at fixed positions | Points where similarity drops sharply; topic transitions |
| Chunk length | Uniform | Variable (depends on content) |
| Semantic cohesion | Boundaries may cut through the middle of a sentence or topic | More likely to group closely related sentences together |
| Computational cost | Low (no embeddings required) | High (extra embedding calls for splitting) |
| Representative implementation | RecursiveCharacterTextSplitter, etc. | LangChain SemanticChunker, LlamaIndex SemanticSplitterNodeParser |
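For contrast, the fixed-size column of the table can be sketched as a character-window splitter with overlap. The function name and default sizes are assumptions for illustration; it is deliberately simpler than RecursiveCharacterTextSplitter, which also tries to respect separators.

```python
def fixed_size_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the window reaches the end, so no chunk is a pure repeat
        # of the previous chunk's tail.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

Note that the cut positions depend only on length, which is why a boundary can land in the middle of a sentence or topic.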
How It Works and Threshold Types
Semantic chunking sets its split points through a threshold mechanism. The arXiv:2410.13070 paper and the LangChain documentation describe the following relative thresholds.
- Percentile: split at points that exceed the nth percentile of the adjacent-distance distribution. The 95th percentile is common, and lowering the value produces more frequent splits.
- Standard deviation: split at distances that exceed the mean by more than a given multiple of the standard deviation.
- Interquartile: split at distances that qualify as outliers relative to the interquartile range (IQR).
- Gradient: compute the gradient (rate of change) of the distance sequence and apply a percentile threshold to it to locate boundaries.
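The threshold types listed above can be compared side by side in a short sketch. It is an illustration under assumptions rather than a copy of any library's code: `split_indices` and `DEFAULT_AMOUNTS` are hypothetical names, the default amounts are illustrative, the interquartile branch uses Tukey's upper fence (Q3 + k × IQR), and the gradient branch uses NumPy's `np.gradient`. A given library may define each rule somewhat differently.

```python
import numpy as np

# Illustrative defaults (assumptions for this sketch).
DEFAULT_AMOUNTS = {
    "percentile": 95,
    "standard_deviation": 3,
    "interquartile": 1.5,
    "gradient": 95,
}


def split_indices(distances, kind="percentile", amount=None):
    """Return indices i where a break falls between unit i and unit i + 1."""
    if kind not in DEFAULT_AMOUNTS:
        raise ValueError(f"unknown threshold type: {kind}")
    d = np.asarray(distances, dtype=float)
    amount = DEFAULT_AMOUNTS[kind] if amount is None else amount
    scores = d  # values compared against the threshold
    if kind == "percentile":
        threshold = np.percentile(d, amount)
    elif kind == "standard_deviation":
        threshold = d.mean() + amount * d.std()
    elif kind == "interquartile":
        q1, q3 = np.percentile(d, [25, 75])
        threshold = q3 + amount * (q3 - q1)  # Tukey's upper fence
    else:  # gradient: threshold the rate of change, not the raw distances
        scores = np.gradient(d)
        threshold = np.percentile(scores, amount)
    return [int(i) for i in np.nonzero(scores > threshold)[0]]
```

All four rules are relative: the threshold is derived from the document's own distance distribution, so the same settings yield different absolute cut-offs on different texts.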
LangChain's SemanticChunker supports all four breakpoint_threshold_type options, with percentile as the default. LlamaIndex's SemanticSplitterNodeParser exposes buffer_size (the number of sentences grouped together when evaluating similarity) and breakpoint_percentile_threshold as parameters, using values of 1 and 95 respectively in the official example. Raising the percentile threshold makes the splitter break only at major topic transitions, yielding more conservative (longer) chunks.
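```python
# LlamaIndex example
# Assumes `documents` is a list of LlamaIndex Document objects loaded earlier
# and that credentials for OpenAIEmbedding are configured.
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding

splitter = SemanticSplitterNodeParser(
    buffer_size=1,  # sentences grouped together when evaluating similarity
    breakpoint_percentile_threshold=95,  # break only at the largest distances
    embed_model=OpenAIEmbedding(),
)
nodes = splitter.get_nodes_from_documents(documents)
```
Evidence and Cases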
Semantic chunking is intuitively appealing, but its effectiveness varies with the data and the situation. The 2024 paper "Is Semantic Chunking Worth the Computational Cost?" (Qu, Tu, Bao, arXiv:2410.13070), from researchers at Vectara and the University of Wisconsin–Madison, ran a large-scale comparison of semantic chunking against fixed-size chunking across three tasks: document retrieval, evidence retrieval, and answer generation. It used 10 datasets for document retrieval and 5 RAGBench-based datasets for evidence retrieval. The conclusion was that the gains from semantic chunking are "inconsistent and often fail to justify its additional computational cost," and that fixed-size chunking remained the more efficient and reliable choice, particularly on non-synthetic documents that reflect real document structure.
In the same vein, several follow-up evaluations report that semantic chunking is not always superior. In some enterprise document retrieval settings, naive chunking paired with a particular embedding delivered the highest performance, an outcome that contradicts the common assumption that semantic chunking is inherently better. It is therefore safest to adopt semantic chunking only after validating its cost-effectiveness on your own data.
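One way to approach that validation is a small harness that scores each chunking strategy on a labelled query set and sets the result against a rough count of embedding calls. Everything here is a sketch under assumptions: `hit_rate_at_k`, `token_overlap`, and `embedding_calls` are hypothetical names, the token-overlap scorer stands in for a real embedding-based retriever, and the cost model assumes one embedding call per sentence with no batching or caching.

```python
def token_overlap(query: str, chunk: str) -> int:
    """Hypothetical stand-in scorer: number of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def hit_rate_at_k(chunks, queries, score=token_overlap, k=3):
    """Fraction of (question, evidence) pairs whose evidence string appears
    inside one of the k highest-scoring chunks."""
    if not queries:
        return 0.0
    hits = 0
    for question, evidence in queries:
        ranked = sorted(chunks, key=lambda c: score(question, c), reverse=True)
        if any(evidence in chunk for chunk in ranked[:k]):
            hits += 1
    return hits / len(queries)


def embedding_calls(num_sentences: int, num_chunks: int, semantic: bool) -> int:
    """Rough cost model: every chunk is embedded for indexing; semantic
    splitting additionally embeds each sentence to locate boundaries."""
    return num_chunks + (num_sentences if semantic else 0)
```

Running both strategies through the same harness on a sample of real documents gives a hit rate and a call count per strategy, which makes the trade-off explicit before committing to either.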