Your parallel agent limit

April 7, 2026

Tip: Figure out your personal ceiling for running multiple agents in parallel.

More agents running doesn’t mean more of you available. The narrative around agentic engineering right now is still mostly about throughput and parallelism, but almost nobody’s talking about what it actually costs the human in the loop. You’re holding multiple problem contexts in your head at once, making judgment calls continuously, and absorbing the anxiety of not knowing what any one agent might be quietly getting wrong.

That’s a new kind of cognitive labor we don’t have good language for yet.

This builds on something Simon Willison said in a recent conversation with Lenny Rachitsky: fire up four agents in parallel, work until 11am, and you’re wiped out for the day. He’s right. We have a century of intuitions about compute scaling. We have almost nothing for what it feels like to hold five concurrent problem contexts in your head while making continuous judgment calls across all of them. It creeps up until you realize you’re exhausted by noon and can’t tell which thread got away from you.

I’ve been writing about the management model for orchestrating agents - treating parallel sessions the way you’d run an async engineering team, with briefs, delegation patterns, and verification loops. But that framing mostly addresses the organizational overhead. What I want to get into here is the personal overhead. The part that lives entirely inside you.

What parallel agents actually ask of you

When you pair with a single agent, the cognitive model is relatively clean. One active context, one thread of decisions, one stream of output to evaluate. You can go deep. You can notice drift.

Run four simultaneously and you’re doing something qualitatively different. You’re not doing four instances of the same thing - you’re managing four distinct mental models at once. Four different codebases, four different problem framings, four different trust calibrations, four different approximations of “how wrong could this quietly be going?”

That last one is the part we underestimate most. A lot of what makes parallel agentic work tiring isn’t active cognition. It’s background vigilance. The part of your mind that can’t fully relax because it knows something might be silently going sideways in a thread you haven’t checked in twenty minutes. I think of this as the ambient anxiety tax. It doesn’t show up in your task list, but it’s drawing from the same reservoir as everything else.

This connects to something I wrote about with comprehension debt: when we let agents generate faster than we can understand, debt accumulates. In a single session, that debt is bounded. In a parallel session, it compounds across threads simultaneously, and you can’t always tell which thread is running up the biggest tab.

Where the load actually lives

There’s the context switching, which is more expensive than it looks. Moving between agents means reloading a mental model: where this task started, what approach the agent settled on, what decisions you made thirty minutes ago that now constrain what’s acceptable. Research on task switching consistently shows the cost isn’t in the switch itself - it’s in the recovery time. With four parallel threads, you’re paying that tax constantly, and the recoveries never fully complete before the next switch.

There are the judgment calls, which can’t be batched or deferred. This one surprised me when I started pushing on higher agent counts. The expectation was that parallel agents would free up my bandwidth by doing work in the background, and they do - but they also generate a continuous stream of judgment-requiring moments that interrupt. Does this approach match our architecture principles? Should I stop this thread and redirect? Is this error recoverable or a sign of fundamental drift? These can’t queue up like email.

And there’s the trust calibration overhead, which is invisible until it isn’t. Every agent session carries a dynamic, implicit calibration. Early on, I’m watching closely. Midway through a session going well, I give more rope. But that calibration has to be maintained separately for each thread - and it degrades when I’m not watching. The moment I realize I’ve lost track of where a thread’s trust sits, I’m back to re-reading everything it produced. That’s often the thing that turns a productive morning into an exhausting one.

Why adding agents doesn’t scale linearly

There’s a seductive math to running agents in parallel. If one agent can ship a feature in forty minutes and you can run four, you should have four features done in the same window. That math works for throughput. It doesn’t work for the human.

Your cognitive bandwidth doesn’t parallelize. The agent does the generating. You still do all the evaluating, deciding, trusting, and integrating. Those are all single-threaded on your side of the loop.
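The gap between the naive math and the human reality can be sketched with a toy model. All the numbers here are hypothetical and the model is my own illustration, not a measurement: agent work parallelizes, but review is serialized through one person, and each switch between threads carries a recovery cost.

```python
# Toy model of the "seductive math". All numbers are hypothetical:
# agent work parallelizes; review and switch recovery do not.

def features_shipped(agents, session_min=160, agent_min=40,
                     review_min=20, switch_min=5):
    # Naive view: throughput scales linearly with agent count.
    naive = agents * (session_min // agent_min)
    # Human view: every feature still needs single-threaded review,
    # plus a context-switch recovery cost per thread visit.
    reviewable = session_min // (review_min + switch_min)
    return naive, min(naive, reviewable)

for n in (1, 2, 4, 8):
    naive, real = features_shipped(n)
    print(f"{n} agents: {naive} shipped on paper, {real} actually reviewed")
```

On paper, throughput grows linearly with agent count; the number you can actually review and merge plateaus almost immediately, because the denominator on the human side never changes.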

What scales is your throughput of supervision, not your throughput of understanding. You can supervise more agents than you can deeply understand. But supervision without understanding is exactly where comprehension debt lives. The code ships, but your mental model of what you’ve built falls further behind with each additional thread.

This is why the conductor metaphor from the Code Agent Orchestra framing holds up. A conductor doesn’t play every instrument - they hold the whole piece. But holding the whole piece is exhausting precisely because it’s whole-system awareness. You can’t extend that by trying harder.

Finding your ceiling is itself a skill

Most of us are going to find it the hard way: by blowing past it. A session starts well with two agents, feels good enough to add a third, then a fourth, and by noon you’re not reviewing anything carefully anymore - you’re just accepting output. It doesn’t feel like failure. It feels like productivity.

Your ceiling isn’t a fixed number. It shifts with the complexity of each thread. Two agents working on genuinely novel architectural problems will exhaust you faster than four agents doing well-bounded migrations. The scope per thread is a bigger variable than the count.

It shifts with how well you specced the work upfront. A vague brief creates mid-flight ambiguity that pulls you back into the thread repeatedly. A crisp brief with clear acceptance criteria makes the thread more self-contained. Writing a proper brief before spawning isn’t overhead - it’s what lets you actually step away.

It also shifts with how long the session runs. Short, time-boxed sessions feel fundamentally different from open-ended ones. Without a defined end state, you’re holding open loops indefinitely. And your ceiling on a high-sleep, low-meeting day is meaningfully different from your ceiling on a Thursday afternoon after six hours of back-to-backs. Using the same agent count across both is a miscalibration most people don’t notice until they’re already burned out.

What I’ve actually changed

I’ve started treating long agentic sessions the way I treat deep focus work: time-boxed, with explicit scopes per thread.

Concretely: I now define a session duration before I spawn anything, and I size the task scope to fit. If I want to run a two-hour session with three threads, I scope each thread to something resolvable in that window with output I can review in thirty minutes per thread. If the task doesn’t fit that shape, I either break it down or drop the thread count.
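As a minimal sketch, that sizing rule reduces to a single inequality: total review time across threads has to fit inside the time box. The function and thresholds here are my own illustration of the rule, not a tool from the workflow above.

```python
# Minimal sketch of the sizing rule: a session plan only fits
# if reviewing every thread's output fits inside the time box.

def plan_fits(session_min, threads, review_min_per_thread):
    return threads * review_min_per_thread <= session_min

print(plan_fits(120, 3, 30))  # → True: the two-hour, three-thread example
print(plan_fits(120, 6, 30))  # → False: review alone overflows the session
```

When the check fails, the two levers are the same ones described above: break the task down so each thread needs less review, or drop the thread count.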

Time-boxing creates natural checkpoints that prevent quiet drift - you’re not checking in because you’re anxious, you’re checking in because the session ended. It also changes the ambient anxiety load. Open-ended threads carry indefinite vigilance. Time-boxed threads carry bounded vigilance. That difference compounds across a week.

The second change: I’ve accepted that three focused, well-reviewed threads produce more usable output than six threads I’m half-supervising. Throughput looks better with six. Code I’m willing to actually merge without a second pass looks better with three. My ceiling for a typical session is somewhere around three to four threads depending on complexity. Knowing that number means I stop treating fewer threads as a failure.

How to calibrate, if you want to be intentional about it

Start with one thread fewer than feels right. The temptation is to push agent count up until something breaks. Inverting this keeps you calibrated rather than reactive.

Watch your review quality, not your agent count. When your confidence in what you’re accepting starts dropping before the session ends, you’ve found your ceiling for that session. That signal is more honest than any rule of thumb.

Notice the anxiety signal early. The feeling of not quite knowing what’s happening in a thread is information. When it appears across more than one thread simultaneously, you’re at capacity.

And try scope reduction before count reduction. Often the lever isn’t “run fewer agents” - it’s “give each agent a tighter task.” Tighter scope per thread reduces the mental overhead each thread carries, which effectively raises how many you can run while staying genuinely on top of the work.

The part we’re not talking about enough

The conversation about AI coding agents is almost entirely about what agents can do. It needs to also be about what humans can sustain.

The ceiling isn’t a personal failure. We have bounded working memory, real context switching costs, and finite vigilance. The engineers who get the most from parallel agents in the long run won’t be the ones running the most simultaneously - they’ll be the ones who know the difference between throughput and understanding, and who’ve learned to treat that difference as a constraint to design around, not a weakness to push through.

Finding your personal ceiling with these tools is a skill. Most of us are going to learn it the hard way before we learn it intentionally. The goal is to shorten that gap.