The AI's working memory - how much code and conversation it can "see" at once.
Imagine reading a book, but you can only keep a certain number of pages in front of you at any time. Once you exceed that limit, the earliest pages get pushed off the table. That's essentially what a context window is for an AI model.
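The "pages pushed off the table" behavior can be sketched as simple token-budget truncation. This is a minimal illustration, not any particular model's implementation; `estimate_tokens` is a hypothetical helper using a crude 4-characters-per-token heuristic rather than a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token. Real tokenizers vary.
    return max(1, len(text) // 4)

def fit_to_window(messages: list[str], budget: int) -> list[str]:
    """Drop the earliest messages until the conversation fits the budget."""
    kept = list(messages)
    while kept and sum(estimate_tokens(m) for m in kept) > budget:
        kept.pop(0)  # the earliest "page" falls off the table
    return kept

history = ["msg one " * 10, "msg two " * 10, "msg three " * 10]
trimmed = fit_to_window(history, budget=50)
print(len(trimmed))  # the oldest message was dropped to fit
```

Production systems use smarter strategies (summarizing old turns, pinning system prompts), but the core constraint is the same: something has to go when the budget is exceeded.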
The context window is measured in tokens (roughly ¾ of a word each). A model with a 200K-token context window can process around 150,000 words at once - enough for a significant chunk of a codebase. But for large projects, it's still a constraint. You can't dump your entire monorepo into the window and expect good results.
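The arithmetic above can be made explicit with a back-of-envelope check, using the rough "1 token ≈ ¾ of a word" ratio (a heuristic only; real tokenizers vary by language and content):

```python
WORDS_PER_TOKEN = 0.75  # rough rule of thumb, not a tokenizer

def tokens_needed(word_count: int) -> int:
    """Estimate how many tokens a body of text will consume."""
    return round(word_count / WORDS_PER_TOKEN)

def fits(word_count: int, window_tokens: int) -> bool:
    return tokens_needed(word_count) <= window_tokens

# ~150,000 words sits right at the edge of a 200K-token window:
print(tokens_needed(150_000))    # 200000
print(fits(150_000, 200_000))    # True
print(fits(1_000_000, 200_000))  # False - a large monorepo won't fit
```

For accurate counts against a specific model, use that model's actual tokenizer; this estimate is only for quick feasibility checks.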
Context windows have grown dramatically - from 4K tokens in early GPT models to 200K+ in current models like Claude. But bigger isn't always better. Models tend to pay less attention to information in the middle of very long contexts (sometimes called "lost in the middle"), so strategic context management still matters.
Context window size directly affects what an agent can accomplish. A small context window means the agent can only work with a few files at a time - it might make changes that conflict with code it can't see. A larger context window lets the agent understand more of the system at once, leading to more coherent changes.
This is why good agentic engineering practices - keeping files focused, writing clear AGENTS.md documentation, and using spec-driven development - are so valuable: they help the agent extract maximum value from its limited context instead of wasting tokens on irrelevant information.