LLM
A large language model (LLM) is an AI model pre-trained on vast amounts of text that interprets and generates natural language by probabilistically predicting the next word (token). LLMs are the engines behind services such as ChatGPT, Gemini, and Claude, as well as Google's AI Overviews, and they sit at the center of search's shift from a list of links to AI-synthesized answers.
- An LLM generates text by predicting the next token after learning from internet-scale data, and it is the engine that produces the answers in ChatGPT, Gemini, Claude, and Google AI Overviews.
- The 2017 Transformer architecture made large-scale parallel training possible and now underpins nearly every major LLM.
- Because an LLM does not inherently know recent or specialized information absent from its training data, it is often paired with retrieval-augmented generation (RAG), which pulls in search results and external documents to answer.
- As search shifts toward LLM-generated answers, being cited or mentioned inside an LLM's response, not just ranking in results, has become a new measure of visibility and the focus of generative engine optimization (GEO).
- Google's own documentation states that existing SEO best practices still apply to AI Overviews and AI Mode, so crawlability, structured data, and trust signals are the foundation of LLM visibility.
What is an LLM?
A large language model (LLM) is an AI model pre-trained on vast amounts of text, such as books, web pages, and code, that probabilistically predicts the next word (more precisely, the next "token") given the preceding context. Repeating this simple next-token prediction across enormous volumes of data and very large numbers of parameters gives rise to emergent abilities the model was never explicitly designed for, including translation, summarization, question answering, and writing. ChatGPT, Google Gemini, and Anthropic Claude are all services that run on top of such LLMs.
The starting point for the modern LLM is a 2017 paper from Google researchers, "Attention Is All You Need" (Vaswani et al., arXiv:1706.03762). It discarded recurrent neural networks (RNNs) and proposed the Transformer architecture, which processes context using only self-attention. Because the Transformer trains in parallel rather than sequentially over a sentence, it opened the path to scaling models and data far larger than before. Today, nearly every major LLM is built on the Transformer.
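To make self-attention more concrete, here is a minimal sketch in plain Python. It is a simplification rather than the paper's full architecture: it assumes the queries, keys, and values are all the raw input vectors, whereas a real Transformer first passes them through learned projection matrices and uses multiple attention heads. The two-dimensional "embeddings" are made-up numbers.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def self_attention(vectors):
    """Scaled dot-product self-attention with Q = K = V = vectors.

    Assumption: no learned projections and a single head, purely to
    show the mechanism.
    """
    scale = math.sqrt(len(vectors[0]))
    outputs = []
    for query in vectors:
        # How strongly this token attends to every token (itself included).
        weights = softmax([dot(query, key) / scale for key in vectors])
        # New representation: weighted average of all token vectors.
        mixed = [
            sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(len(query))
        ]
        outputs.append(mixed)
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up 2-d embeddings
contextual = self_attention(tokens)
for before, after in zip(tokens, contextual):
    print(before, "->", [round(x, 3) for x in after])
```

Each token's output depends only on the input vectors, not on the output of the previous token, which is why all positions can be computed at the same time instead of one after another as in an RNN.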
The next inflection point was OpenAI's 2020 paper "Language Models are Few-Shot Learners" (Brown et al., arXiv:2005.14165) — the GPT-3 paper. It showed that a model scaled to 175 billion parameters could perform tasks like translation and question answering with no task-specific fine-tuning, simply by being shown a few examples in the prompt. The scaling paradigm — "bigger models are more capable" — and prompt-based usage went mainstream here, becoming the direct foundation for the generative AI boom that followed.
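As an illustration of what "a few examples in the prompt" looks like, the sketch below assembles a few-shot translation prompt as a plain string. The helper name `build_few_shot_prompt` and the exact layout are assumptions made for this example, not a format required by any particular model or API.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the new input.

    Hypothetical helper; the "English:/French:" layout is one of many
    formats that would work.
    """
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"English: {source}")
        lines.append(f"French: {target}")
        lines.append("")
    # The prompt ends where the model is expected to continue.
    lines.append(f"English: {query}")
    lines.append("French:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "bread",
)
print(prompt)
```

No model weights change here. The examples only shape the context, and the model continues the pattern using the same next-token prediction it always performs.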
How an LLM works (a summary for marketers)
Stripped of the math, an LLM works in two stages.
- Pre-training: The model reads internet-scale text and endlessly guesses "which word is most likely to come next in this context." Through this process, the patterns, facts, styles, and reasoning of language are compressed into the model's parameters (weights).
- Inference: When a user enters a prompt, the model generates its answer by appending one token at a time according to the probability distribution it learned. In other words, its default behavior is not to "look up and retrieve" an answer but to produce "the most plausible next words."
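The two stages above can be sketched with a toy model that only counts which word follows which. This is a deliberately tiny stand-in: a real LLM learns a neural network over subword tokens rather than a lookup table of word pairs, and the three-sentence corpus and all function names are invented for the example.

```python
import random
from collections import Counter, defaultdict

def pretrain(corpus):
    """'Pre-training': count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts

def next_token_distribution(counts, token):
    """Probability of each possible next word after `token`."""
    followers = counts.get(token)
    if not followers:
        return {}
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()}

def generate(counts, start, max_tokens=5, seed=0):
    """'Inference': append one sampled token at a time."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    output = [start]
    for _ in range(max_tokens):
        dist = next_token_distribution(counts, output[-1])
        if not dist:  # no known continuation, so stop
            break
        words = list(dist)
        weights = [dist[w] for w in words]
        output.append(rng.choices(words, weights=weights)[0])
    return " ".join(output)

corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
counts = pretrain(corpus)
print(next_token_distribution(counts, "the"))
print(generate(counts, "the"))
```

Nothing in `generate` looks anything up. It only samples whatever is statistically plausible after the previous word, which is the behavior the Inference step describes.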
This design brings two limitations. First, the model does not inherently know information that postdates its training data (its knowledge cutoff), nor specialized knowledge that was never in that data. Second, it can confidently state things that are not true, which is known as hallucination. To address both, retrieval-augmented generation (RAG) is widely used: relevant search results and external documents are retrieved at query time and supplied to the model as the basis for its answer. AI search experiences such as Google AI Overviews and Perplexity, which "answer by referencing the web," are this combination in practice.
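A minimal sketch of the RAG pattern follows. It assumes a naive keyword-overlap retriever and stops at building the prompt; production systems typically retrieve with a search index or embedding similarity and then send the prompt to an LLM. The documents, the example.com URLs, and the function names are all hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=2):
    """Rank documents by how many question words they share (naive retriever)."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(doc["text"])), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only the best matches that share at least one word.
    return [doc for score, doc in scored[:top_k] if score > 0]

def build_grounded_prompt(question, documents):
    """Put the retrieved passages in front of the question, with numbered sources."""
    sources = retrieve(question, documents)
    context = "\n".join(
        f"[{i}] {doc['text']} (source: {doc['url']})"
        for i, doc in enumerate(sources, start=1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical mini-corpus standing in for the web.
documents = [
    {"url": "https://example.com/geo",
     "text": "GEO means optimizing content so generative engines cite it."},
    {"url": "https://example.com/coffee",
     "text": "Espresso is brewed under pressure."},
    {"url": "https://example.com/rag",
     "text": "RAG retrieves documents and passes them to the model as context."},
]
question = "What does GEO mean for content?"
print(build_grounded_prompt(question, documents))
```

The model can only cite what the retriever hands it, so a page that cannot be found and read at the retrieval step never reaches the answer.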
Traditional search engine vs. LLM-based answers
As LLMs entered search, the very way information reaches users changed. The differences can be summarized as follows.
| Dimension | Traditional search engine | LLM-based answer (AI search) |
|---|---|---|
| Result format | A list of links (ten blue links) | A synthesized natural-language answer plus reference links |
| User behavior | Clicking through and comparing multiple pages | Reading the answer directly (a rising share of zero-click searches) |
| Unit of exposure | Page-level ranking | Sentence- and paragraph-level citation or mention |
| Optimization goal | Climbing the search rankings (SEO) | Getting cited or recommended in AI answers (GEO) |
| Representative examples | Google standard search, Naver integrated search | Google AI Overviews and AI Mode, ChatGPT, Perplexity |
Why LLMs matter for SEO and GEO
As the results surface shifts toward "answers generated by an LLM," the marketer's objective has expanded too. The core task is now not only ranking near the top of search results but getting an LLM to cite or mention your content as a basis when it generates an answer. The discipline that addresses this is GEO (generative engine optimization).
This shift is also documented academically. "GEO: Generative Engine Optimization" (Aggarwal et al., arXiv:2311.09735), accepted to KDD 2024, observes that the advent of large language models (LLMs) has opened a new search paradigm in which generative models gather and summarize information to answer user queries. It frames visibility inside AI answers, which content creators have little control over, as the optimization target. In the paper's experiments, strategies such as adding source citations, statistics, and quotations raised visibility within generative-engine responses by up to 40%.
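To show how "visibility inside an answer" can be turned into a number, here is a rough sketch that measures each source's share of the cited words in a generated answer. It is loosely inspired by word-count style measures and is not the paper's exact metric. It assumes citations appear as bracketed numbers such as [1] placed before the sentence-ending punctuation, and the sample answer is invented.

```python
import re
from collections import Counter

def citation_word_share(answer):
    """Share of cited words credited to each source number.

    Assumptions: citations look like [1] and sit before the period;
    a sentence citing several sources splits its words evenly.
    """
    words_per_source = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        cited = set(re.findall(r"\[(\d+)\]", sentence))
        if not cited:
            continue  # uncited sentences give no source any credit
        text_only = re.sub(r"\[\d+\]", "", sentence)
        n_words = len(re.findall(r"\w+", text_only))
        for source in cited:
            words_per_source[source] += n_words / len(cited)
    total = sum(words_per_source.values())
    if not total:
        return {}
    return {source: words / total for source, words in words_per_source.items()}

answer = (
    "LLMs generate text by predicting the next token [1]. "
    "RAG adds retrieved documents as context [2]. "
    "Both are used in AI search [1][2]."
)
share = citation_word_share(answer)
print({source: round(value, 2) for source, value in sorted(share.items())})
# prints {'1': 0.55, '2': 0.45}
```

Tracking a figure like this across many prompts is one way to check whether a content change moved a source's presence in answers, independent of where the page ranks.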
That said, LLM exposure is not a separate game wholly divorced from SEO. Google's official documentation (Search Central, "AI features and your website") states that existing SEO best practices remain valid for AI features such as AI Overviews and AI Mode, and that appearing in them carries no additional requirements. It also explains that these AI features present links to the source websites so users can explore further. In other words, Google's AI features draw on the same crawling, indexing, and structured-data infrastructure when they build an answer, so the foundation for LLM citation is a site that bots can read easily and content that has a clear structure and strong trust signals.
Practical checklist: getting cited in LLM answers
- Ensure rendering and crawlability so AI crawlers can read the body of your content (audit JavaScript-dependent content).
- State the answer to the question up front in one or two clear sentences, making it easy for an LLM to excerpt and cite.
- Back your claims with sources, statistics, and quotations to strengthen trust signals (a strategy the GEO paper validated as effective).
- Use structured data and a clear heading hierarchy so machines can grasp the meaning of your content.
- Cover a single topic in depth so you become an authoritative source that is cited repeatedly in that area.
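For the structured-data item in the checklist, the sketch below generates a schema.org `Article` JSON-LD snippet that could be placed in a page's HTML. The author name and publication date are placeholders, and which properties a given page should include is a judgment call. Structured data describes the content to machines; it does not by itself guarantee a citation.

```python
import json

def article_json_ld(headline, description, author, date_published, citations):
    """Build a <script> tag carrying schema.org Article markup as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "description": description,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
        "citation": citations,  # the sources the article relies on
    }
    # Escape "</" so a value can never close the script tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"

snippet = article_json_ld(
    headline="What is an LLM?",
    description="A large language model predicts the next token to generate text.",
    author="Example Author",        # placeholder
    date_published="2025-01-01",    # placeholder
    citations=["arXiv:1706.03762", "arXiv:2005.14165", "arXiv:2311.09735"],
)
print(snippet)
```

The markup should describe what is actually visible on the page, and the one- or two-sentence answer from the checklist makes a natural `description`.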
References
- Vaswani et al., "Attention Is All You Need", arXiv:1706.03762 (2017) — the Transformer architecture
- Brown et al., "Language Models are Few-Shot Learners" (GPT-3), arXiv:2005.14165 (2020) — 175-billion-parameter scaling and few-shot learning
- Aggarwal et al., "GEO: Generative Engine Optimization", arXiv:2311.09735 (KDD 2024) — visibility up to 40% higher
- Google Search Central — AI features and your website (AI Overviews, AI Mode, and SEO)