The units AI models think in - roughly ¾ of a word each.
Tokens are the basic units that AI models use to process text. They're not exactly words - they're subword chunks, somewhere between a syllable and a word. The word "engineering" might split into two tokens ("engine" + "ering"), while common short words like "the" are a single token. Code has its own tokenization patterns - a variable name like getUserById might be three or four tokens.
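Exact token counts depend on the model's tokenizer, but a widely used rule of thumb for English text is roughly 4 characters per token. A minimal sketch of that heuristic (the function name and the 4-character constant are assumptions, not any provider's API):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the common ~4-characters-per-token
    rule of thumb for English. Real tokenizers (BPE variants) differ
    by model, so treat this as a ballpark, not a bill."""
    return max(1, round(len(text) / 4))

# "engineering" has 11 characters, so the heuristic guesses ~3 tokens;
# an actual tokenizer may merge it into fewer.
print(estimate_tokens("engineering"))
```

For precise counts, use the tokenizer library published for your specific model rather than a heuristic.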
Think of tokens like the atoms of AI language. Everything the model reads (input tokens) and everything it writes (output tokens) is measured in tokens. When someone says a model has a "200K context window," they mean it can process 200,000 tokens at once.
Tokens directly translate to two things you care about: cost and capability.
Cost: API pricing is per-token. Input tokens (what you send to the model) are cheaper than output tokens (what the model generates). A coding agent that reads 50 files and generates a large implementation can burn through millions of tokens in a session. Understanding token economics helps you design cost-efficient workflows - like using smaller models for simple tasks and reserving large models for complex reasoning.
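The cost arithmetic is simple enough to sketch. The prices below are placeholders chosen only to illustrate the input/output asymmetry - check your provider's actual pricing:

```python
# Hypothetical per-million-token prices (assumptions for illustration).
PRICE_PER_M_INPUT = 3.00    # USD per 1M input tokens
PRICE_PER_M_OUTPUT = 15.00  # USD per 1M output tokens

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Total cost of one session: input and output billed at different rates."""
    return (input_tokens / 1_000_000) * PRICE_PER_M_INPUT + \
           (output_tokens / 1_000_000) * PRICE_PER_M_OUTPUT

# An agent session that read 2M input tokens (many files, many turns)
# and generated 100K output tokens: $6.00 + $1.50 = $7.50.
print(f"${session_cost(2_000_000, 100_000):.2f}")
```

Note how input volume dominates agent sessions even at the lower rate - rereading 50 files every turn adds up faster than the code the model writes.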
Capability: The token budget determines how much the agent can see and do in one turn. More tokens in the context means the agent can reason over more code at once. But there's a quality tradeoff - very long contexts can degrade attention and accuracy.
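One practical consequence: input and reserved output must both fit inside the window, so a budget check looks like this (function name and the 200K default are illustrative assumptions):

```python
def fits_context(prompt_tokens: int, max_output_tokens: int,
                 context_window: int = 200_000) -> bool:
    """True if the prompt plus the output budget fit in the window.
    The window bounds input and output together, so a huge prompt
    leaves less room for the model's response."""
    return prompt_tokens + max_output_tokens <= context_window

print(fits_context(150_000, 8_000))   # room to spare
print(fits_context(195_000, 8_000))   # prompt crowds out the output budget
```

Agents typically handle the failing case by summarizing or dropping older context before the next turn rather than truncating blindly.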