GEO & AI Search

Tool Use

Tool use is an LLM's ability to call external tools such as search, calculators, code execution, and APIs to extend itself beyond its own limits. It is the core behavior behind AI agents, and in practice it is implemented through the function calling mechanism.

Tool use is the umbrella concept of an LLM extending its capabilities by drawing on external tools like search, calculators, code execution, and APIs, and it is the defining behavior of an AI agent that acts autonomously.
Function calling is the mechanism that actually implements tool use: the model emits a function's name and arguments in a structured format, and the application runs it.
In Anthropic's documentation, "tool use" and "function calling" are treated as synonyms, and the model decides on its own when to use a tool and which one to call by reading the tool descriptions.
The ReAct paper (2022) showed empirically that interleaving reasoning with actions reduces hallucination and pulls in external information to improve answer accuracy.
From a GEO standpoint, AI search engines assemble answers by fetching real-time information with web search and page-fetch tools, so the content those tools reach becomes the pool of citation candidates.

What tool use means

Tool use is when a large language model (LLM) answers not from its internally trained knowledge alone, but by calling tools such as search engines, calculators, code execution environments, and external APIs to extend what it can do. On its own, an LLM is weak at retrieving up-to-date information, performing exact arithmetic, and accessing real-time data; attaching tools closes those gaps and turns the model into an actor that "interacts with the world." For this reason, tool use is regarded as the core operating principle of an AI agent that reasons and acts on its own.

One distinction matters here. Tool use is the higher-level idea of "extending capability by using tools," while function calling is the mechanism that actually implements it. Anthropic's official documentation states plainly that "tool use is also known as function calling" and uses the two terms interchangeably. In other words, the big picture of putting tools in the model's hands is tool use, and the wiring that expresses each tool as a "function call with a name and arguments" and runs it is function calling.

How it works

In modern LLMs, tool use generally follows the flow below. Framed around Anthropic Claude's tool use procedure, it looks like this.

Define the tools: The developer passes each tool's name, description, and input schema (JSON Schema) to the model. The model reads these descriptions to judge when each tool is appropriate.
Signal a call: When a user request matches a tool's capability and the answer is not already in context, the model does not finish its response as plain text. Instead it emits a signal that it intends to call a tool (stop_reason: "tool_use") along with a structured block containing the name and arguments of the function to call.
Execute: The application actually runs that function (performing the search, calling the API, executing the code, and so on).
Return the result: The execution result (tool_result) is handed back to the model, which incorporates it and composes the final answer.

The key point is that the model never executes external code directly. The model only emits, in structured form, its intent to "call this function with these arguments"; the actual execution and retrieval of results are handled by the application (or the provider's infrastructure). Below is an example of the function call block the model emits.

{
  "type": "tool_use",
  "id": "toolu_01A09q90qw90lq917835lq9",
  "name": "get_weather",
  "input": { "location": "Seoul, KR", "unit": "celsius" }
}

Tools fall into two kinds depending on where they run. Client tools run inside the developer's own application (user-defined functions, bash, text_editor, and the like): when the model emits a tool_use block, the developer's code executes it and returns a tool_result. Server tools run on the provider's infrastructure (for example web_search, code_execution, web_fetch, and tool_search): the developer never handles execution and simply receives the results. In addition, under the default tool_choice: auto, the model decides for itself on each turn whether to call a tool or answer directly, and settings such as any or tool can force a tool call.

Relationship to function calling (umbrella concept vs. implementation)

The two are adjacent but sit at different levels. To avoid confusion, here is how they line up.

Aspect	Tool Use	Function Calling
Level	Umbrella concept and capability	Implementation mechanism
Core question	"What can the model do with external tools?"	"How is that tool called and executed?"
Scope	Everything from search, computation, and code execution to APIs and interaction with external environments	Expressed as a function call format with a name and arguments
Output	A final answer or action extended by using tools	A structured call block (name + arguments JSON)
Relationship	Function calling is the standard wiring that realizes tool use — Anthropic's documentation uses the two as synonyms

Evidence and case studies

ReAct (Reasoning + Acting). Proposed by Yao et al. (arXiv:2210.03629, submitted October 2022, ICLR 2023), ReAct is a paradigm that generates reasoning traces and actions in an interleaved manner. By letting the model think and, at the same time, fetch information through external tools, it reduces the hallucination and error propagation that are common with chain-of-thought reasoning alone. In practice, on the HotpotQA and Fever tasks it interacted with a simple Wikipedia API to mitigate hallucination, and on decision-making benchmarks it reported absolute success-rate gains of +34 points on ALFWorld and +10 points on WebShop over imitation-learning and reinforcement-learning baselines. This shows that "the act of calling a tool" is not mere assistance but a core factor that raises answer accuracy.

Toolformer. Toolformer, from Schick et al. (arXiv:2302.04761, Meta AI, February 2023), demonstrated that a model can learn on its own which API to call, when, with what arguments, and how to fold the result back into the next-token prediction. It trains in a self-supervised way from only a handful of demonstrations per API, and by integrating tools such as a calculator, a question-answering system, a search engine, a translator, and a calendar, it achieved performance rivaling far larger models on tasks like arithmetic and fact lookup. The work opened up the possibility that a model can learn tool use itself.

Practical value. Anthropic's documentation describes tool access as one of the "highest-leverage primitives" you can give an agent. On benchmarks such as scientific figure interpretation (LAB-Bench FigQA) and real-world software engineering (SWE-bench), it explains that simply attaching basic tools substantially improves capability and at times even surpasses human-expert baselines.

What it means for GEO and AI search

Generative search engines and AI answer features often work internally by calling web search tools and page-fetch tools to pull in real-time information before composing an answer. In other words, "the web content those tools reach" becomes the candidate set of citations and evidence for the AI's answer. Structuring your content so that search and fetch tools can read and cite it well, with clear sourcing, therefore connects directly to a GEO strategy aimed at being included in AI answers.