Lesson 6: planning and reasoning - how agents tackle complex tasks

What you will learn

Prerequisites


ELI5: Planning is like packing for a trip

Imagine you are packing for a two-week trip. You could just start grabbing stuff and shoving it in a suitcase. Maybe you will get lucky and have everything you need. More likely, you will forget your toothbrush and pack three jackets you never wear.

A better approach: think about where you are going, what the weather will be like, what activities you have planned, and then make a packing list. Check items off as you pack them. If you find out halfway through that you do not have enough room, you re-prioritize - drop the “just in case” items and keep the essentials.

That is what planning does for an AI agent. Instead of reacting to each step blindly, the agent thinks ahead, makes a plan, and works through it systematically. And when the plan hits a snag, a good agent adjusts rather than plowing forward with a broken approach.


Why planning matters

Simple tasks do not need planning. If someone asks “What time is it?”, the agent just checks the clock. No plan required.

But consider a request like: “Research the top 5 competitors in the cloud database market, compare their pricing and features, and write a recommendation for which one we should use for our new project.”

This task requires:

Without planning, the agent might start researching one competitor in extreme detail, run out of context space, and forget to cover the other four. Or it might jump straight to a recommendation without doing thorough research.

Planning gives the agent a roadmap. It knows where it is going, what steps come next, and how to allocate its time and resources.

What planning buys you

Benefit Without planning With planning
Task completion May miss steps or get stuck Systematic coverage of all steps
Resource efficiency Wastes tokens on tangents Allocates effort where it matters
Transparency Hard to see what the agent is doing Clear progress tracking
Error recovery Gets lost when things go wrong Can identify where the plan broke and adjust
Parallelism Everything is sequential Independent steps can run concurrently

The agentic problem-solving loop

At the heart of every planning agent is a problem-solving loop. Different frameworks describe this differently, but the core steps are consistent:

Get mission -> scan scene -> think -> act -> observe

+-------------------------------------------------------------------+
|                                                                   |
|   +------------+     +------------+     +---------+               |
|   | Get Mission| --> | Scan Scene | --> |  Think  |               |
|   +------------+     +------------+     +----+----+               |
|                                              |                    |
|                                              v                    |
|                        +---------+     +-----+-----+              |
|                        | Observe | <-- |    Act     |             |
|                        +----+----+     +-----------+              |
|                             |                                     |
|                             v                                     |
|                      +------+------+                              |
|                      | Goal met?   |                              |
|                      | Yes -> Done |                              |
|                      | No -> Think |                              |
|                      +-------------+                              |
|                                                                   |
+-------------------------------------------------------------------+

Step by step

1. Get Mission

The agent receives its task. This could be a user request, a triggered event, or a delegated subtask from another agent.

Mission: "Create a summary report of our team's GitHub activity this week."

2. Scan Scene

The agent gathers context about its current situation. What tools are available? What information does it already have? What constraints exist?

Available tools: github_api, document_writer, calendar_api
Known context: Team = "platform-eng", Current date = March 15
Constraints: Report should be under 2 pages

3. Think

The agent reasons about what to do next. This is where planning happens - the agent considers its options and decides on a course of action.

Thought: "I need to:
  1. Get the list of team members
  2. For each member, fetch their commits, PRs, and reviews this week
  3. Aggregate the data
  4. Write a summary highlighting key contributions
  I will start by getting the team member list."

4. Act

The agent executes an action - calling a tool, generating text, or making a decision.

Action: github_api.get_team_members(team="platform-eng")

5. Observe

The agent examines the result of its action. Did it succeed? What information did it provide? Does the plan need to change?

Observation: Team has 8 members: [alice, bob, carol, dave, eve, frank, grace, hank]

The loop then returns to Think, where the agent decides the next step based on what it observed. This continues until the mission is complete.


Plan-then-execute vs. reactive approaches

There are two fundamental philosophies for how agents tackle tasks. Most real agents blend both, but understanding the pure forms helps you make design decisions.

Plan-then-execute

How it works: The agent creates a complete (or near-complete) plan before taking any action. Then it executes the plan step by step.

User: "Migrate our application from Python 2 to Python 3."

Planning phase:
  Step 1: Audit codebase for Python 2-specific syntax
  Step 2: Identify incompatible dependencies
  Step 3: Update dependencies to Python 3-compatible versions
  Step 4: Convert print statements to print functions
  Step 5: Fix string/unicode handling
  Step 6: Update integer division operators
  Step 7: Run test suite and fix failures
  Step 8: Update CI/CD pipeline to use Python 3

Execution phase:
  [Execute steps 1-8 in order]

Strengths:

Weaknesses:

Reactive approach

How it works: The agent takes things one step at a time. It looks at the current situation, decides the best next action, takes it, and reassesses. This is essentially the ReAct pattern from Lesson 4.

User: "Migrate our application from Python 2 to Python 3."

Step 1:
  Think: "First, I should understand the codebase. Let me scan for Python 2 syntax."
  Act: Search for Python 2 patterns
  Observe: Found 47 print statements, 12 unicode issues, 3 deprecated imports

Step 2:
  Think: "The print statements are the most common issue. Let me start there."
  Act: Convert print statements
  Observe: 47 print statements converted

Step 3:
  Think: "Now let me tackle the unicode issues..."
  [continues step by step]

Strengths:

Weaknesses:

The pragmatic middle ground

Most production agents use a hybrid approach:

  1. Light upfront planning: Create a rough, high-level plan (3-7 steps)
  2. Reactive execution: Use ReAct-style reasoning within each step
  3. Periodic re-planning: After major steps, reassess and adjust the plan
High-level plan (light):
  1. Audit the codebase
  2. Fix syntax issues
  3. Update dependencies
  4. Run tests and fix failures

Executing step 1 (reactive):
  Think -> Act -> Observe -> Think -> Act -> Observe -> Step complete

Re-plan after step 1:
  "The audit revealed more issues than expected. Adding step 2b for
   database migration code that uses a deprecated library."

This gives you the structure of planning with the adaptability of reactive execution.

When to favor each approach

Situation Favor planning Favor reactive
Task is well-understood Yes  
Task is exploratory   Yes
Multiple independent subtasks Yes  
High uncertainty about what you will find   Yes
Need to communicate progress Yes  
Speed matters more than thoroughness   Yes
Failure is costly Yes  
Task is simple (fewer than 3 steps)   Yes

Hierarchical planning

Breaking big tasks into smaller ones

Some tasks are too complex to plan in a single flat list. Hierarchical planning breaks a big goal into sub-goals, which are further broken into sub-sub-goals - like a work breakdown structure in project management.

The project management analogy

Think about how a software project is organized:

Epic: Launch new payment system
  |
  +-- Story: Design payment API
  |     +-- Task: Define endpoints
  |     +-- Task: Design data models
  |     +-- Task: Write API specification
  |
  +-- Story: Implement payment processing
  |     +-- Task: Integrate with payment gateway
  |     +-- Task: Handle error cases
  |     +-- Task: Add retry logic
  |
  +-- Story: Add monitoring and alerts
        +-- Task: Set up logging
        +-- Task: Define SLOs
        +-- Task: Configure alert rules

An agent uses the same structure. For example, “Write a blog post about Kubernetes networking” breaks into:

Sub-goal 1: Research
  Task 1.1: Identify key networking concepts
  Task 1.2: Find recent developments and best practices

Sub-goal 2: Outline and Write
  Task 2.1: Create section structure
  Task 2.2: Write each section with code examples

Sub-goal 3: Review
  Task 3.1: Check technical accuracy
  Task 3.2: Verify code examples work

Benefits of hierarchical planning

Levels of abstraction

Level What it describes Example
Goal What success looks like “Deploy the application to production”
Sub-goal Major phases of work “Prepare the environment”
Task Specific actions “Create the Cloud Run service”
Step Atomic operations “Run gcloud run deploy

Agents typically plan at the sub-goal and task level. Steps are handled by ReAct-style execution within each task.


Re-planning: adapting when things go wrong

No plan survives first contact with reality. Good agents detect when a plan is failing and adjust.

When to re-plan

Trigger Example Response
Task failure API returns an error Try alternative approach or skip and revisit
New information Discovered the database uses a different schema than expected Update the plan to account for the actual schema
Changed requirements User adds a new requirement mid-task Incorporate the new requirement into the plan
Resource constraints Running low on context space or API quota Simplify remaining steps
Blocked dependency A required service is down Reorder tasks to work on unblocked items first

Re-planning strategies

1. Local adjustment

Fix the immediate problem without changing the overall plan.

Original plan step: "Query the users table for active accounts"
Failure: "Table 'users' does not exist"
Adjustment: "Query the 'accounts' table instead (it has the same data)"

2. Step insertion

Add new steps to handle an unexpected situation.

Original plan:
  1. Fetch data
  2. Process data
  3. Generate report

After discovering data needs cleaning:
  1. Fetch data
  1b. Clean and validate data    <-- inserted
  2. Process data
  3. Generate report

3. Plan revision

Significantly restructure the remaining plan based on what you have learned.

Original plan: "Migrate the monolith to microservices"
After analysis: "The codebase is too tightly coupled for a direct migration."
Revised plan: "First, introduce module boundaries within the monolith
               (strangler fig pattern), then gradually extract services."

4. Goal modification

In extreme cases, the agent recognizes the original goal is not achievable and proposes a revised goal. For example, if the target was “reduce latency to under 50ms” but the database query alone takes 80ms, the agent might propose targeting 100ms with caching for frequently accessed endpoints.

The key principle: try a local fix first, and only restructure the remaining plan if the local fix is not possible.


Reasoning techniques

Planning requires reasoning - the ability to think through problems logically. Several techniques help agents reason more effectively.

Chain-of-Thought (CoT)

What it is: The agent works through a problem step by step, showing its reasoning at each stage. Instead of jumping to an answer, it “thinks out loud.”

Analogy: Showing your work on a math test. Instead of writing just the answer, you write each step of the calculation.

Without Chain-of-Thought:
  Q: "If a server handles 100 requests/second and we need to handle 10,000
      requests/second, how many servers do we need?"
  A: "100 servers"

With Chain-of-Thought:
  Q: [same question]
  A: "Let me work through this:
      - One server handles 100 requests/second
      - We need 10,000 requests/second total
      - 10,000 / 100 = 100 servers for the raw capacity
      - But we should account for redundancy. With N+1 redundancy, we need 101.
      - And servers do not run at 100% efficiency. At 80% target utilization:
        10,000 / (100 * 0.8) = 125 servers
      - With N+1 redundancy: 126 servers
      Answer: 126 servers for production, assuming 80% target utilization
              and N+1 redundancy."

The step-by-step reasoning catches nuances (redundancy, utilization) that a snap answer misses.

When to use CoT:

Tree-of-Thoughts (ToT)

What it is: Instead of following a single chain of reasoning, the agent explores multiple possible paths and evaluates which one is most promising. Think of it as brainstorming several approaches before committing to one.

Analogy: A chess player considering several possible moves, thinking a few steps ahead for each one, and then choosing the best path.

Problem: "Our API is timing out under load. How should we fix it?"

Branch 1: Add caching
  -> Evaluate: "Would reduce database load by ~60%. Implementation
     takes 2 days. Risk: cache invalidation complexity."
  Score: 7/10

Branch 2: Optimize database queries
  -> Evaluate: "Could improve query time by ~40%. Implementation
     takes 3 days. Risk: low, well-understood approach."
  Score: 6/10

Branch 3: Add horizontal scaling
  -> Evaluate: "Handles any load level. Implementation takes 1 day
     with Kubernetes. Risk: increases infrastructure cost."
  Score: 8/10

Decision: Start with horizontal scaling (fastest win), then add
          caching for long-term efficiency.

When to use ToT:

Self-consistency

What it is: The agent solves the same problem multiple times using different reasoning paths, then checks if the answers agree. If most paths lead to the same conclusion, confidence is high. If they disagree, more investigation is needed.

Analogy: Asking three different mechanics about a car problem. If they all say “bad alternator,” you can be confident. If they each say something different, you need more diagnostics.

Problem: "Is this code change safe to deploy on a Friday?"

Reasoning path 1 (risk analysis):
  "The change modifies the payment flow. Payment changes on Friday
   mean weekend incidents. Verdict: No."

Reasoning path 2 (scope analysis):
  "The change is 5 lines and has 95% test coverage. Small, well-tested
   changes are low risk. Verdict: Yes, with monitoring."

Reasoning path 3 (historical analysis):
  "Similar changes in the past have caused issues 15% of the time.
   That is above our 10% threshold. Verdict: No."

Consensus: 2 out of 3 say No. Recommendation: Wait until Monday.

When to use self-consistency:

Comparing reasoning techniques

Technique Approach Strength Cost Best for
Chain-of-Thought Step-by-step linear Thoroughness 1x (single pass) Most reasoning tasks
Tree-of-Thoughts Explore multiple paths Finds best approach 3-5x (multiple branches) Strategic decisions
Self-consistency Multiple independent attempts Confidence calibration 3-5x (multiple passes) High-stakes validation

The orchestration layer

The orchestration layer is the control system that manages how an agent plans, reasons, and executes. Think of it as the conductor of an orchestra - it does not play any instruments, but it decides who plays what and when.

What the orchestration layer does

+----------------------------------------------------------+
|                   ORCHESTRATION LAYER                    |
|                                                          |
|  +----------+  +-----------+  +----------+  +--------+  |
|  | Plan     |  | Execute   |  | Monitor  |  | Re-plan|  |
|  | Manager  |  | Engine    |  | & Eval   |  | Logic  |  |
|  +----------+  +-----------+  +----------+  +--------+  |
|                                                          |
|  Responsibilities:                                       |
|  - Break goals into executable steps                     |
|  - Decide execution order and parallelism                |
|  - Route steps to the right tools or sub-agents          |
|  - Track progress and state                              |
|  - Detect failures and trigger re-planning               |
|  - Manage context window usage                           |
|  - Enforce guardrails and safety checks                  |
+----------------------------------------------------------+

Orchestration in practice

In Google Cloud’s Vertex AI Agent Engine, the orchestration layer handles:

The Agent Development Kit (ADK) gives you building blocks for customizing orchestration behavior. You define the agent’s tools, instructions, and behavior - the framework handles the execution loop.

Key orchestration decisions

Decision Options Trade-off
How much to plan upfront Full plan vs. next step only Thoroughness vs. flexibility
When to re-plan After every step vs. only on failure Adaptability vs. overhead
How to handle failure Retry, skip, abort, or re-plan Resilience vs. cost
Sequential vs. parallel One step at a time vs. concurrent Simplicity vs. speed
How much reasoning to show Internal only vs. exposed to user Transparency vs. noise

Common failure modes

Even well-designed planning agents can fail in predictable ways. Knowing these failure modes helps you build defenses against them.

1. Infinite loops

What happens: The agent gets stuck repeating the same action or re-planning endlessly without making progress.

Example:

Think: "I need to find the user's email. Let me search the database."
Act: search_database(query="user email")
Observe: No results found.
Think: "I need to find the user's email. Let me search the database."
Act: search_database(query="user email")
Observe: No results found.
[repeats forever]

Prevention:

2. Plan drift

What happens: The agent gradually wanders away from the original goal, following interesting tangents instead of staying on track.

Example:

Original goal: "Write a summary of Q4 sales performance"
Step 1: Fetch Q4 sales data [on track]
Step 2: Notice an anomaly in November data [slightly off track]
Step 3: Deep dive into November anomaly [drifting]
Step 4: Research industry trends that might explain the anomaly [lost]
Step 5: Write a report about industry trends [completely off track]

Prevention:

3. Over-planning simple tasks

What happens: The agent spends more time planning than it would take to just do the task.

Example:

User: "What is 2 + 2?"

Agent Plan:
  Step 1: Identify the mathematical operation (addition)
  Step 2: Identify the operands (2 and 2)
  Step 3: Verify the operands are valid numbers
  Step 4: Perform the addition
  Step 5: Verify the result
  Step 6: Format the response

[Just say "4"!]

Prevention:

4. Cascading failures

What happens: One failed step causes subsequent steps to fail in a chain reaction, and the agent does not detect the root cause.

Example:

Step 1: Fetch user profile -> Returns error (auth expired)
Step 2: Process user preferences -> Fails (no profile data)
Step 3: Generate recommendations -> Fails (no preferences)
Step 4: Format output -> Fails (no recommendations)
Agent: "I was unable to complete the task due to formatting errors."
  [Wrong! The real problem was expired authentication in step 1]

Prevention:

5. Context exhaustion

What happens: The plan and its execution history fill up the context window, leaving no room for the agent to reason about the current step.

Prevention:

Failure mode summary

Failure mode Symptom Key prevention
Infinite loops Same action repeated Max iteration count + repetition detection
Plan drift Agent wanders off topic Periodic goal alignment check
Over-planning Excessive planning for simple tasks Complexity estimation before planning
Cascading failures Wrong root cause reported Trace back on failure + fail fast
Context exhaustion Agent loses coherence Context budgeting + summarization

Putting it together: a practical example

Here is a condensed example showing how planning, reasoning, and re-planning combine:

User: "Analyze our error logs from the past week and create Jira tickets
       for the top issues."

PLAN:
  Phase 1: Query error logs (past 7 days, ERROR + FATAL severity)
  Phase 2: Group by type, rank by frequency, identify top 5
  Phase 3: For each issue, check for existing tickets, create if new
  Phase 4: Summarize findings

EXECUTE (ReAct within each phase):
  Act: log_query(severity=["ERROR","FATAL"], days=7)
  Observe: 12,847 entries returned
  Act: log_analyze(group_by="error_message", sort="count_desc")
  Observe: Top issues identified (connection timeouts, null pointer, rate limits)

  Act: jira_search(query="Connection timeout payment-service")
  Observe: No existing ticket -> Create new ticket PLATFORM-1234

RE-PLAN (mid-execution):
  Observe: PLATFORM-999 already exists for NullPointerException
  Adjust: Update existing ticket with new occurrence count instead of creating duplicate

DELIVER:
  "Analyzed 12,847 errors. Created 4 new tickets, updated 1 existing one."

Notice how the agent uses hierarchical planning upfront, ReAct-style execution within each phase, and re-planning when it discovers an existing ticket. All the patterns working together.


Key takeaways

  1. Planning transforms complex tasks into manageable steps. Without it, agents either miss important steps or waste effort on tangents.

  2. The problem-solving loop (Mission, Scene, Think, Act, Observe) is the engine of every planning agent. Understanding this loop helps you design and debug agent behavior.

  3. Plan-then-execute and reactive approaches are two ends of a spectrum. Most practical agents use a hybrid: light upfront planning with reactive execution.

  4. Hierarchical planning handles complex tasks by breaking them into goals, sub-goals, tasks, and steps. This mirrors how project management works.

  5. Re-planning is essential. Plans will break. Good agents detect failures early and adjust. The ability to adapt separates useful agents from fragile ones.

  6. Reasoning techniques (CoT, ToT, self-consistency) make planning more effective. Chain-of-Thought for step-by-step logic, Tree-of-Thoughts for exploring alternatives, self-consistency for validating critical decisions.

  7. Watch out for common failure modes. Infinite loops, plan drift, over-planning, cascading failures, and context exhaustion are predictable problems with known solutions.

  8. The orchestration layer ties it all together. It manages the plan lifecycle, routes execution, handles errors, and keeps the agent on track.


Further reading


Next lesson: Multi-Agent Systems