Instead of relying on what the AI memorized during training, RAG (retrieval-augmented generation) lets it look things up first.
Imagine taking a test. Without RAG, you're working from memory - you answer based on what you learned while studying (training). With RAG, you get to bring your notes to the exam. Before answering each question, you flip to the relevant page and use that information to craft a better response.
RAG works in two steps. First, a retrieval system finds relevant documents, code snippets, or data based on the current question. Then, the generation step feeds those retrieved results into the AI model alongside the question, so the model can use up-to-date, specific information rather than relying solely on its training data.
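The two steps can be sketched in a few lines of Python. This is a minimal illustration, not a real system: the word-overlap scoring stands in for a proper retriever (embeddings, BM25, etc.), and the function names and sample documents are invented for the example.

```python
# Minimal sketch of the two RAG steps. Word-overlap scoring is a
# stand-in for a real retriever; all names here are illustrative.

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Step 1: rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question: str, retrieved: list[str]) -> str:
    """Step 2: feed the retrieved results to the model alongside the question."""
    context = "\n---\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

docs = [
    "Auth middleware checks the session token on every request.",
    "The build pipeline runs lint, tests, then packaging.",
    "Config values are loaded from environment variables at startup.",
]
question = "How does the auth middleware validate requests?"
prompt = build_prompt(question, retrieve(question, docs))
```

The resulting prompt contains the auth-related document but not the unrelated ones, so the model answers from retrieved context rather than memory alone.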
For code, this means when you ask an agent about your project's authentication system, a RAG pipeline can automatically find and retrieve the relevant auth files, middleware, and configuration - giving the agent accurate, project-specific context instead of generic knowledge.
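For code, retrieval can operate at the file level. The sketch below scores files by how often the query's terms appear in their paths and contents; the file paths and snippets are hypothetical stand-ins for a real project, and a production pipeline would use smarter chunking and scoring.

```python
# Sketch of file-level retrieval over a codebase. The paths and
# contents are hypothetical; real pipelines read from disk and use
# embedding- or keyword-index-based scoring.

codebase = {
    "src/auth/middleware.py": "def require_login(request): ...  # checks session token",
    "src/auth/config.py": "AUTH_SECRET = env('AUTH_SECRET')",
    "src/billing/invoice.py": "def create_invoice(order): ...",
}

def find_relevant_files(query: str, files: dict[str, str], k: int = 2) -> list[str]:
    """Rank files by query-term hits in their path and contents."""
    terms = query.lower().split()

    def score(item: tuple[str, str]) -> int:
        path, text = item
        haystack = (path + " " + text).lower()
        return sum(haystack.count(term) for term in terms)

    ranked = sorted(files.items(), key=score, reverse=True)
    return [path for path, _ in ranked[:k]]

relevant = find_relevant_files("auth middleware session", codebase)
```

Here the query pulls in the auth middleware and its configuration while leaving the billing code out - exactly the project-specific context the agent needs.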
Coding agents need to understand your codebase, not just code in general. RAG bridges this gap. Without it, agents rely on whatever fits in the context window. With RAG, agents can work with codebases that are far larger than any context window by selectively pulling in exactly the files and documentation they need.
This is especially important for large projects where only a tiny fraction of the codebase is relevant to any given task. RAG helps the agent find the needle in the haystack without loading the entire haystack into memory.
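Selective loading boils down to filling a fixed context budget with only the best-scoring material. A minimal sketch, assuming pre-scored chunks and a rough 4-characters-per-token estimate (both illustrative assumptions):

```python
# Sketch: greedily pack the highest-scoring chunks into a fixed token
# budget instead of loading the whole codebase. Scores and the
# 4-chars-per-token estimate are illustrative assumptions.

def pack_context(scored_chunks: list[tuple[float, str]], max_tokens: int = 100) -> list[str]:
    """Take the best chunks, in score order, that fit within the budget."""
    est_tokens = lambda text: len(text) // 4  # rough heuristic, not a real tokenizer
    selected, used = [], 0
    for score, chunk in sorted(scored_chunks, reverse=True):
        cost = est_tokens(chunk)
        if used + cost <= max_tokens:
            selected.append(chunk)
            used += cost
    return selected

chunks = [
    (0.9, "auth middleware source " * 10),
    (0.2, "unrelated billing code " * 10),
    (0.8, "auth config " * 10),
]
selected = pack_context(chunks, max_tokens=100)
```

With this budget, the two high-scoring auth chunks fit and the low-scoring billing chunk is dropped: the needle comes along, the rest of the haystack stays behind.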