Back to Glossary
GEO & AI Search

Context Engineering

Context engineering is the discipline of designing and managing the context an LLM receives (instructions, retrieval results, tools, memory, and conversation history) to elicit the best possible output. It is a broader concept that subsumes prompt engineering's focus on wording, addressing what to place in context, in what format, and when.

  • Context engineering is the discipline of designing and managing the tokens that fill an LLM's context window at inference time (instructions, retrieval results, tools, memory, and conversation history).
  • Prompt engineering, the craft of refining the prompt wording itself, is a subset of context engineering, since the prompt is only one part of the full context.
  • Anthropic defines it as "the set of strategies for curating and maintaining the optimal set of tokens during inference" (September 2025).
  • LangChain frames four strategies for handling context: write, select, compress, and isolate.
  • If the context window is the "capacity (RAM)," context engineering focuses on the practice of designing and managing what goes into that limited space.

Overview

Context engineering is the discipline of designing and managing the entire context an LLM is given when it performs a task, namely the system instructions, retrieval results, tool definitions, long-term memory, and conversation history, in order to draw out the best output. Because the same model can produce dramatically different results depending on what information enters the context window and in what format, context construction shapes output quality as much as model selection does.

Phil Schmid of Google DeepMind defines context engineering as "the discipline of designing and building dynamic systems that provide the right information and tools, in the right format, at the right time, to give an LLM everything it needs to accomplish a task" (June 2025). He stresses that context should be treated as "a system, not a string," and explains that agent failures usually stem not from the model's limits but from the quality of the context.

This concept becomes especially important in AI agent settings, where tools are invoked and memory accumulates across many steps rather than in a single round of question and answer. The longer a task runs, the more tool results and intermediate outputs pile up, making it easy to exceed the context window's limit, drive up cost and latency, or degrade performance.

Prompt Engineering vs. Context Engineering

The two concepts are not opposites but stand in a containment relationship. Anthropic distinguishes prompt engineering as "methods for writing and organizing LLM instructions for optimal results" from context engineering as "the set of strategies for curating and maintaining the optimal set of tokens during inference." In other words, prompt engineering is a subset of context engineering, and while a well-written prompt still matters, in production agents it is only one part of the full context.

AspectPrompt EngineeringContext Engineering
What it handlesThe text sent to the model (instructions and questions)The entire context window (instructions, tools, memory, retrieval results, history)
Core questionWhat wording and how to instructWhat to include, in what format, and when
NatureWriting a static stringDesigning and managing a dynamic system
Primary arenaSingle-shot question and answerMulti-step agents, long-running tasks
RelationshipA subset of context engineeringThe broader concept that encompasses prompt engineering

The Components of Context

Phil Schmid identifies seven components that make up context. Context engineering is the work of selecting and arranging these components.

  • Instructions / system prompt — the initial instructions that set the agent's rules of behavior and examples
  • User prompt — the immediate question or task to handle
  • State / history (short-term memory) — the current conversation exchanged up to this point
  • Long-term memory — knowledge accumulated from past conversations and learned preferences
  • Retrieved information (RAG) — fresh external knowledge pulled from documents, databases, or APIs
  • Available tools — the function definitions the model can call
  • Structured output — response format specifications such as a JSON schema

Core Strategies and Evidence

In a July 2025 blog post, LangChain analyzed a range of agents and papers to distill four common strategies for context engineering. The piece cites Andrej Karpathy's analogy, which likens an LLM to "a new kind of operating system," explaining that "the LLM is the CPU and the context window is the RAM that serves as its working memory." Just as RAM capacity is limited, the crux is deciding what information to load into that limited space.

  • Write — use a scratchpad or memory to store information outside the context window and write it back later.
  • Select — pull only the needed information into the context window through memory, tools, RAG, and the like.
  • Compress — use summarization and trimming to keep "only the tokens needed to perform the task."
  • Isolate — separate context across multiple agents, sandboxes, or state objects and manage each independently.

Anthropic likewise offers concrete techniques for long-running tasks in its September 2025 engineering post. Compaction summarizes the contents of the context and then restarts a fresh context window from that summary, while structured note-taking has the agent leave notes outside the context window and recall them later, maintaining persistent memory with minimal overhead. Beyond these, it recommends a pattern in which sub-agents with clean context handle focused work and return only a compressed summary, along with just-in-time retrieval, which keeps lightweight identifiers rather than loading all data up front and fetches the needed data at execution time.

Execution Checklist

  • Write system prompts in clear, direct language at the right level of detail, neither too specific nor too vague.
  • Curate tools to a minimum, include in tool results only the metadata the model needs to decide its next action, and truncate overly long results.
  • On long tasks, do not let conversation history accumulate indefinitely; use summarization and compression to keep "only the tokens needed."
  • Manage memory by separating state data into structured formats such as JSON and progress notes into unstructured text.
  • Rather than injecting all information up front, keep only identifiers and fetch via just-in-time retrieval at the moment of need.
  • When a task is too much for a single context to handle, split it off to sub-agents and recover only the summary.

References