Instead of relying on what the AI memorized during training, RAG (retrieval-augmented generation) lets it look things up first.
Imagine taking a test. Without RAG, you're working from memory - you answer based on what you learned while studying (training). With RAG, you get to bring your notes to the exam. Before answering each question, you flip to the relevant page and use that information to craft a better response.
RAG works in two steps. First, a retrieval system finds relevant documents, code snippets, or data based on the current question. Then, the generation step feeds those retrieved results into the AI model alongside the question, so the model can use up-to-date, specific information rather than relying solely on its training data.
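The two steps can be sketched in a few lines of Python. This is a minimal illustration, not a real system: the word-overlap scoring stands in for a proper retriever (embeddings, BM25, etc.), and the function names and sample documents are invented for the example.

```python
# Minimal sketch of the two RAG steps. Word-overlap scoring is a
# stand-in for a real retriever; all names here are illustrative.

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Step 1: rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question: str, retrieved: list[str]) -> str:
    """Step 2: feed the retrieved results to the model alongside the question."""
    context = "\n---\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

docs = [
    "Auth middleware checks the session token on every request.",
    "The build pipeline runs lint, tests, then packaging.",
    "Config values are loaded from environment variables at startup.",
]
question = "How does the auth middleware validate requests?"
prompt = build_prompt(question, retrieve(question, docs))
```

The resulting prompt contains the auth-related document but not the unrelated ones, so the model answers from retrieved context rather than memory alone.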
For code, this means when you ask an agent about your project's authentication system, a RAG pipeline can automatically find and retrieve the relevant auth files, middleware, and configuration - giving the agent accurate, project-specific context instead of generic knowledge.
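For code, retrieval can operate at the file level. The sketch below scores files by how often the query's terms appear in their paths and contents; the file paths and snippets are hypothetical stand-ins for a real project, and a production pipeline would use smarter chunking and scoring.

```python
# Sketch of file-level retrieval over a codebase. The paths and
# contents are hypothetical; real pipelines read from disk and use
# embedding- or keyword-index-based scoring.

codebase = {
    "src/auth/middleware.py": "def require_login(request): ...  # checks session token",
    "src/auth/config.py": "AUTH_SECRET = env('AUTH_SECRET')",
    "src/billing/invoice.py": "def create_invoice(order): ...",
}

def find_relevant_files(query: str, files: dict[str, str], k: int = 2) -> list[str]:
    """Rank files by query-term hits in their path and contents."""
    terms = query.lower().split()

    def score(item: tuple[str, str]) -> int:
        path, text = item
        haystack = (path + " " + text).lower()
        return sum(haystack.count(term) for term in terms)

    ranked = sorted(files.items(), key=score, reverse=True)
    return [path for path, _ in ranked[:k]]

relevant = find_relevant_files("auth middleware session", codebase)
```

Here the query pulls in the auth middleware and its configuration while leaving the billing code out - exactly the project-specific context the agent needs.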
Coding agents need to understand your codebase, not just code in general. RAG bridges this gap. Without it, agents rely on whatever fits in the context window. With RAG, agents can work with codebases that are far larger than any context window by selectively pulling in exactly the files and documentation they need.
This is especially important for large projects where only a tiny fraction of the codebase is relevant to any given task. RAG helps the agent find the needle in the haystack without loading the entire haystack into memory.
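Selective loading boils down to filling a fixed context budget with only the best-scoring material. A minimal sketch, assuming pre-scored chunks and a rough 4-characters-per-token estimate (both illustrative assumptions):

```python
# Sketch: greedily pack the highest-scoring chunks into a fixed token
# budget instead of loading the whole codebase. Scores and the
# 4-chars-per-token estimate are illustrative assumptions.

def pack_context(scored_chunks: list[tuple[float, str]], max_tokens: int = 100) -> list[str]:
    """Take the best chunks, in score order, that fit within the budget."""
    est_tokens = lambda text: len(text) // 4  # rough heuristic, not a real tokenizer
    selected, used = [], 0
    for score, chunk in sorted(scored_chunks, reverse=True):
        cost = est_tokens(chunk)
        if used + cost <= max_tokens:
            selected.append(chunk)
            used += cost
    return selected

chunks = [
    (0.9, "auth middleware source " * 10),
    (0.2, "unrelated billing code " * 10),
    (0.8, "auth config " * 10),
]
selected = pack_context(chunks, max_tokens=100)
```

With this budget, the two high-scoring auth chunks fit and the low-scoring billing chunk is dropped: the needle comes along, the rest of the haystack stays behind.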