Prompt Engineering
Prompt engineering is the practice of designing the instructions, examples, format, and role within an input prompt so that a large language model (LLM) reliably produces the output you want. The goal is to improve response quality and consistency through the input text alone, without changing the model's weights.
- Prompt engineering is the practice of designing the prompt itself—its instructions, examples, format, and role—to steer an LLM's output in the direction you want.
- Because it changes only the input text rather than retraining the model, it is far cheaper and faster to iterate on than fine-tuning.
- Core techniques include zero-shot and few-shot prompting, chain-of-thought (CoT), role assignment, output formatting, and task decomposition (prompt chaining).
- OpenAI, Anthropic, and Google all emphasize the same principles: give clear and specific instructions, provide enough context, supply good examples, and refine iteratively.
- It sits next to context engineering, but prompt engineering focuses specifically on the composition and wording of the prompt text sent to the model.
What Prompt Engineering Is
Prompt engineering is the practice of deliberately designing the input prompt so that a large language model (LLM) produces the desired output reliably. Even for the same task, accuracy and consistency can vary dramatically depending on how you phrase the instructions, which examples you show, and what role and output format you assign. Unlike fine-tuning, which alters the model's weights, prompt engineering leaves the model untouched and adjusts its behavior through the input text alone. That makes it nearly free and lets you iterate on edits and validation quickly.
As generative models like ChatGPT, Claude, and Gemini increasingly power search, content, and customer support, the ability to design prompts has become a decisive factor in output quality. OpenAI, Anthropic, and Google each publish official prompting guides for their own models, and while the wording differs, their recommended principles overlap substantially.
How It Differs from Context Engineering
Prompt engineering is often discussed alongside context engineering, but the two have different focuses. Prompt engineering concentrates on the composition and wording of the prompt text itself—how you craft the instruction phrasing, examples, role, and output format. Context engineering, by contrast, is the broader task of designing what information to "load in" for the model—search results, documents, conversation history, tool outputs—and in what order and volume. This article focuses on the design of the prompt itself.
Key Techniques
The following are the representative prompt engineering techniques covered consistently across the official guides from OpenAI, Anthropic, and Google, as well as in academic research.
| Technique | Description | When to use |
|---|---|---|
| Zero-shot prompting | Request a response from the task instruction alone, with no examples | Simple, general tasks the model already handles well |
| Few-shot prompting | Provide 2–5 input/output examples so the model learns the format and pattern | Tasks where a specific output format, tone, or structure matters |
| Chain-of-Thought (CoT) | Elicit intermediate reasoning steps, e.g. "think step by step" | Problems requiring arithmetic, logic, or multi-step reasoning |
| Role / Persona assignment | Fix the tone and expertise by assigning a role, e.g. "You are an experienced tax accountant" | When domain expertise or a consistent voice is needed |
| Output formatting | Specify the output structure—table, JSON, list, length limit | When a downstream system parses the output or a fixed format is required |
| Task decomposition / prompt chaining | Break a complex task into multiple steps and call them sequentially | Compound tasks that are hard to handle in a single pass |
| Delimiters / structuring (XML, Markdown) | Separate instructions, context, and examples with tags or headings | When the prompt is long and has many components |
Example: Zero-shot vs. Few-shot vs. Chain-of-Thought
# Zero-shot: instruction only
Classify the sentiment of the following review as positive, negative, or neutral.
Review: "Delivery was fast, but the packaging was a mess."
# Few-shot: fix the output format with examples
Review: "Great value for the price" -> positive
Review: "Never buying this again" -> negative
Review: "Delivery was fast, but the packaging was a mess" ->
# Chain-of-Thought (CoT): elicit intermediate reasoning
Solve the following problem step by step.
If there are 23 apples, 20 are used at lunch, and 6 more are bought, how many are left?
Work through it step by step, then give the answer on the last line in the format "Answer: N".
Evidence and Cases
Chain-of-Thought has been academically shown to be effective and is a flagship technique. Wei et al. (2022), "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (arXiv:2201.11903) demonstrated that prompting a model to generate intermediate reasoning steps instead of jumping straight to the final answer substantially improves performance across arithmetic, commonsense, and symbolic reasoning tasks. Notably, when a 540B-parameter model was given just 8 chain-of-thought exemplars, it surpassed even a fine-tuned GPT-3 with a verifier on the GSM8K math word-problem benchmark, achieving state-of-the-art (SOTA) at the time. This was a turning-point study, showing that simply changing how a prompt is constructed can unlock a model's reasoning ability.
The recommendations in the official guides are consistent. Google's Gemini prompt design documentation states that you should "always include few-shot examples in your prompts," warning that a prompt without examples is likely to be less effective, and it stresses keeping the structure and format of all examples identical. Anthropic's Claude prompt engineering documentation organizes its core techniques around clarity, examples (multishot), structuring with XML tags, role assignment, thinking, and prompt chaining. OpenAI advises placing global guidance such as tone and role in the system message, with task-specific instructions and examples in the user message, and recommends setting temperature to 0 for accuracy-critical tasks like data extraction or factual responses. All three treat iterative refinement—drafting a prompt and then polishing it while observing the output—as essential.
Execution Checklist
- Confirm that the task, the target audience, and the definition of "done" are stated explicitly within the prompt.
- Specify the output format concretely (table, JSON, list, length).
- For tasks where format or tone matters, provide 2–5 input/output examples in an identical structure.
- For multi-step reasoning, add a chain-of-thought instruction such as "think step by step."
- When expertise or a particular tone is needed, assign a role (persona).
- When instructions, context, and examples get long and mixed together, separate them with XML tags or Markdown headings.
- Break tasks that are hard to handle in one pass into step-by-step prompts (chaining).
- When factual accuracy matters, set a low temperature and refine the prompt iteratively while watching the output.