Lesson 6: planning and reasoning - how agents tackle complex tasks

What you will learn

Why planning is essential for agents handling complex tasks
The agentic problem-solving loop: Get Mission, Scan Scene, Think, Act, Observe
Plan-then-execute vs. reactive approaches and the trade-offs of each
Hierarchical planning for breaking big tasks into manageable pieces
Re-planning: how agents adapt when things go wrong
Reasoning techniques: Chain-of-Thought, Tree-of-Thoughts, and self-consistency
The orchestration layer’s role in managing plans
Common failure modes and how to avoid them

Prerequisites

ELI5: Planning is like packing for a trip

Imagine you are packing for a two-week trip. You could just start grabbing stuff and shoving it in a suitcase. Maybe you will get lucky and have everything you need. More likely, you will forget your toothbrush and pack three jackets you never wear.

A better approach: think about where you are going, what the weather will be like, what activities you have planned, and then make a packing list. Check items off as you pack them. If you find out halfway through that you do not have enough room, you re-prioritize - drop the “just in case” items and keep the essentials.

That is what planning does for an AI agent. Instead of reacting to each step blindly, the agent thinks ahead, makes a plan, and works through it systematically. And when the plan hits a snag, a good agent adjusts rather than plowing forward with a broken approach.

Why planning matters

Simple tasks do not need planning. If someone asks “What time is it?”, the agent just checks the clock. No plan required.

But consider a request like: “Research the top 5 competitors in the cloud database market, compare their pricing and features, and write a recommendation for which one we should use for our new project.”

This task requires:

Identifying who the competitors are
Researching each one (pricing, features, limitations)
Understanding the user’s project requirements
Comparing options against those requirements
Synthesizing everything into a recommendation

Without planning, the agent might start researching one competitor in extreme detail, run out of context space, and forget to cover the other four. Or it might jump straight to a recommendation without doing thorough research.

Planning gives the agent a roadmap. It knows where it is going, what steps come next, and how to allocate its time and resources.

What planning buys you

Benefit	Without planning	With planning
Task completion	May miss steps or get stuck	Systematic coverage of all steps
Resource efficiency	Wastes tokens on tangents	Allocates effort where it matters
Transparency	Hard to see what the agent is doing	Clear progress tracking
Error recovery	Gets lost when things go wrong	Can identify where the plan broke and adjust
Parallelism	Everything is sequential	Independent steps can run concurrently

The agentic problem-solving loop

At the heart of every planning agent is a problem-solving loop. Different frameworks describe this differently, but the core steps are consistent:

Get mission -> scan scene -> think -> act -> observe

+-------------------------------------------------------------------+
|                                                                   |
|   +------------+     +------------+     +---------+               |
|   | Get Mission| --> | Scan Scene | --> |  Think  |               |
|   +------------+     +------------+     +----+----+               |
|                                              |                    |
|                                              v                    |
|                        +---------+     +-----+-----+              |
|                        | Observe | <-- |    Act     |             |
|                        +----+----+     +-----------+              |
|                             |                                     |
|                             v                                     |
|                      +------+------+                              |
|                      | Goal met?   |                              |
|                      | Yes -> Done |                              |
|                      | No -> Think |                              |
|                      +-------------+                              |
|                                                                   |
+-------------------------------------------------------------------+

Step by step

1. Get Mission

The agent receives its task. This could be a user request, a triggered event, or a delegated subtask from another agent.

Mission: "Create a summary report of our team's GitHub activity this week."

2. Scan Scene

The agent gathers context about its current situation. What tools are available? What information does it already have? What constraints exist?

Available tools: github_api, document_writer, calendar_api
Known context: Team = "platform-eng", Current date = March 15
Constraints: Report should be under 2 pages

3. Think

The agent reasons about what to do next. This is where planning happens - the agent considers its options and decides on a course of action.

Thought: "I need to:
  1. Get the list of team members
  2. For each member, fetch their commits, PRs, and reviews this week
  3. Aggregate the data
  4. Write a summary highlighting key contributions
  I will start by getting the team member list."

4. Act

The agent executes an action - calling a tool, generating text, or making a decision.

Action: github_api.get_team_members(team="platform-eng")

5. Observe

The agent examines the result of its action. Did it succeed? What information did it provide? Does the plan need to change?

Observation: Team has 8 members: [alice, bob, carol, dave, eve, frank, grace, hank]

The loop then returns to Think, where the agent decides the next step based on what it observed. This continues until the mission is complete.

Plan-then-execute vs. reactive approaches

There are two fundamental philosophies for how agents tackle tasks. Most real agents blend both, but understanding the pure forms helps you make design decisions.

Plan-then-execute

How it works: The agent creates a complete (or near-complete) plan before taking any action. Then it executes the plan step by step.

User: "Migrate our application from Python 2 to Python 3."

Planning phase:
  Step 1: Audit codebase for Python 2-specific syntax
  Step 2: Identify incompatible dependencies
  Step 3: Update dependencies to Python 3-compatible versions
  Step 4: Convert print statements to print functions
  Step 5: Fix string/unicode handling
  Step 6: Update integer division operators
  Step 7: Run test suite and fix failures
  Step 8: Update CI/CD pipeline to use Python 3

Execution phase:
  [Execute steps 1-8 in order]

Strengths:

Clear structure and progress tracking
Can identify dependencies between steps upfront
Enables parallel execution of independent steps
Easy to estimate total effort and communicate progress

Weaknesses:

Plan may be wrong or incomplete (you do not know what you do not know)
Rigid - hard to accommodate surprises
Upfront planning takes time and tokens
Can lead to over-planning for simple tasks

Reactive approach

How it works: The agent takes things one step at a time. It looks at the current situation, decides the best next action, takes it, and reassesses. This is essentially the ReAct pattern from Lesson 4.

User: "Migrate our application from Python 2 to Python 3."

Step 1:
  Think: "First, I should understand the codebase. Let me scan for Python 2 syntax."
  Act: Search for Python 2 patterns
  Observe: Found 47 print statements, 12 unicode issues, 3 deprecated imports

Step 2:
  Think: "The print statements are the most common issue. Let me start there."
  Act: Convert print statements
  Observe: 47 print statements converted

Step 3:
  Think: "Now let me tackle the unicode issues..."
  [continues step by step]

Strengths:

Highly adaptive - responds to what it actually finds
No wasted planning effort
Works well for exploratory tasks
Naturally handles surprises

Weaknesses:

No big-picture view of progress
May miss important steps that upfront planning would catch
Hard to parallelize - each step depends on the previous observation
Can wander off track without a guiding plan

The pragmatic middle ground

Most production agents use a hybrid approach:

Light upfront planning: Create a rough, high-level plan (3-7 steps)
Reactive execution: Use ReAct-style reasoning within each step
Periodic re-planning: After major steps, reassess and adjust the plan

High-level plan (light):
  1. Audit the codebase
  2. Fix syntax issues
  3. Update dependencies
  4. Run tests and fix failures

Executing step 1 (reactive):
  Think -> Act -> Observe -> Think -> Act -> Observe -> Step complete

Re-plan after step 1:
  "The audit revealed more issues than expected. Adding step 2b for
   database migration code that uses a deprecated library."

This gives you the structure of planning with the adaptability of reactive execution.

When to favor each approach

Situation	Favor planning	Favor reactive
Task is well-understood	Yes
Task is exploratory		Yes
Multiple independent subtasks	Yes
High uncertainty about what you will find		Yes
Need to communicate progress	Yes
Speed matters more than thoroughness		Yes
Failure is costly	Yes
Task is simple (fewer than 3 steps)		Yes

Hierarchical planning

Breaking big tasks into smaller ones

Some tasks are too complex to plan in a single flat list. Hierarchical planning breaks a big goal into sub-goals, which are further broken into sub-sub-goals - like a work breakdown structure in project management.

The project management analogy

Think about how a software project is organized:

Epic: Launch new payment system
  |
  +-- Story: Design payment API
  |     +-- Task: Define endpoints
  |     +-- Task: Design data models
  |     +-- Task: Write API specification
  |
  +-- Story: Implement payment processing
  |     +-- Task: Integrate with payment gateway
  |     +-- Task: Handle error cases
  |     +-- Task: Add retry logic
  |
  +-- Story: Add monitoring and alerts
        +-- Task: Set up logging
        +-- Task: Define SLOs
        +-- Task: Configure alert rules

An agent uses the same structure. For example, “Write a blog post about Kubernetes networking” breaks into:

Sub-goal 1: Research
  Task 1.1: Identify key networking concepts
  Task 1.2: Find recent developments and best practices

Sub-goal 2: Outline and Write
  Task 2.1: Create section structure
  Task 2.2: Write each section with code examples

Sub-goal 3: Review
  Task 3.1: Check technical accuracy
  Task 3.2: Verify code examples work

Benefits of hierarchical planning

Manageable chunks. Each task is small enough to execute without losing focus.
Clear dependencies. You can see which tasks depend on others and which can run in parallel.
Progress tracking. You know exactly where you are in the overall plan.
Delegation. In multi-agent systems, different sub-goals can be assigned to specialized agents.

Levels of abstraction

Level	What it describes	Example
Goal	What success looks like	”Deploy the application to production”
Sub-goal	Major phases of work	”Prepare the environment”
Task	Specific actions	”Create the Cloud Run service”
Step	Atomic operations	”Run `gcloud run deploy`”

Agents typically plan at the sub-goal and task level. Steps are handled by ReAct-style execution within each task.

Re-planning: adapting when things go wrong

No plan survives first contact with reality. Good agents detect when a plan is failing and adjust.

When to re-plan

Trigger	Example	Response
Task failure	API returns an error	Try alternative approach or skip and revisit
New information	Discovered the database uses a different schema than expected	Update the plan to account for the actual schema
Changed requirements	User adds a new requirement mid-task	Incorporate the new requirement into the plan
Resource constraints	Running low on context space or API quota	Simplify remaining steps
Blocked dependency	A required service is down	Reorder tasks to work on unblocked items first

Re-planning strategies

1. Local adjustment

Fix the immediate problem without changing the overall plan.

Original plan step: "Query the users table for active accounts"
Failure: "Table 'users' does not exist"
Adjustment: "Query the 'accounts' table instead (it has the same data)"

2. Step insertion

Add new steps to handle an unexpected situation.

Original plan:
  1. Fetch data
  2. Process data
  3. Generate report

After discovering data needs cleaning:
  1. Fetch data
  1b. Clean and validate data    <-- inserted
  2. Process data
  3. Generate report

3. Plan revision

Significantly restructure the remaining plan based on what you have learned.

Original plan: "Migrate the monolith to microservices"
After analysis: "The codebase is too tightly coupled for a direct migration."
Revised plan: "First, introduce module boundaries within the monolith
               (strangler fig pattern), then gradually extract services."

4. Goal modification

In extreme cases, the agent recognizes the original goal is not achievable and proposes a revised goal. For example, if the target was “reduce latency to under 50ms” but the database query alone takes 80ms, the agent might propose targeting 100ms with caching for frequently accessed endpoints.

The key principle: try a local fix first, and only restructure the remaining plan if the local fix is not possible.

Reasoning techniques

Planning requires reasoning - the ability to think through problems logically. Several techniques help agents reason more effectively.

Chain-of-Thought (CoT)

What it is: The agent works through a problem step by step, showing its reasoning at each stage. Instead of jumping to an answer, it “thinks out loud.”

Analogy: Showing your work on a math test. Instead of writing just the answer, you write each step of the calculation.

Without Chain-of-Thought:
  Q: "If a server handles 100 requests/second and we need to handle 10,000
      requests/second, how many servers do we need?"
  A: "100 servers"

With Chain-of-Thought:
  Q: [same question]
  A: "Let me work through this:
      - One server handles 100 requests/second
      - We need 10,000 requests/second total
      - 10,000 / 100 = 100 servers for the raw capacity
      - But we should account for redundancy. With N+1 redundancy, we need 101.
      - And servers do not run at 100% efficiency. At 80% target utilization:
        10,000 / (100 * 0.8) = 125 servers
      - With N+1 redundancy: 126 servers
      Answer: 126 servers for production, assuming 80% target utilization
              and N+1 redundancy."

The step-by-step reasoning catches nuances (redundancy, utilization) that a snap answer misses.

When to use CoT:

Math and logic problems
Multi-step reasoning
Tasks where intermediate steps matter
Debugging and root cause analysis

Tree-of-Thoughts (ToT)

What it is: Instead of following a single chain of reasoning, the agent explores multiple possible paths and evaluates which one is most promising. Think of it as brainstorming several approaches before committing to one.

Analogy: A chess player considering several possible moves, thinking a few steps ahead for each one, and then choosing the best path.

Problem: "Our API is timing out under load. How should we fix it?"

Branch 1: Add caching
  -> Evaluate: "Would reduce database load by ~60%. Implementation
     takes 2 days. Risk: cache invalidation complexity."
  Score: 7/10

Branch 2: Optimize database queries
  -> Evaluate: "Could improve query time by ~40%. Implementation
     takes 3 days. Risk: low, well-understood approach."
  Score: 6/10

Branch 3: Add horizontal scaling
  -> Evaluate: "Handles any load level. Implementation takes 1 day
     with Kubernetes. Risk: increases infrastructure cost."
  Score: 8/10

Decision: Start with horizontal scaling (fastest win), then add
          caching for long-term efficiency.

When to use ToT:

Problems with multiple valid approaches
Strategic decisions
Tasks where the first idea might not be the best
Architecture and design decisions

Self-consistency

What it is: The agent solves the same problem multiple times using different reasoning paths, then checks if the answers agree. If most paths lead to the same conclusion, confidence is high. If they disagree, more investigation is needed.

Analogy: Asking three different mechanics about a car problem. If they all say “bad alternator,” you can be confident. If they each say something different, you need more diagnostics.

Problem: "Is this code change safe to deploy on a Friday?"

Reasoning path 1 (risk analysis):
  "The change modifies the payment flow. Payment changes on Friday
   mean weekend incidents. Verdict: No."

Reasoning path 2 (scope analysis):
  "The change is 5 lines and has 95% test coverage. Small, well-tested
   changes are low risk. Verdict: Yes, with monitoring."

Reasoning path 3 (historical analysis):
  "Similar changes in the past have caused issues 15% of the time.
   That is above our 10% threshold. Verdict: No."

Consensus: 2 out of 3 say No. Recommendation: Wait until Monday.

When to use self-consistency:

High-stakes decisions
Tasks where errors are costly
Ambiguous problems with no clear single answer
Validation of critical reasoning

Comparing reasoning techniques

Technique	Approach	Strength	Cost	Best for
Chain-of-Thought	Step-by-step linear	Thoroughness	1x (single pass)	Most reasoning tasks
Tree-of-Thoughts	Explore multiple paths	Finds best approach	3-5x (multiple branches)	Strategic decisions
Self-consistency	Multiple independent attempts	Confidence calibration	3-5x (multiple passes)	High-stakes validation

The orchestration layer

The orchestration layer is the control system that manages how an agent plans, reasons, and executes. Think of it as the conductor of an orchestra - it does not play any instruments, but it decides who plays what and when.

What the orchestration layer does

+----------------------------------------------------------+
|                   ORCHESTRATION LAYER                    |
|                                                          |
|  +----------+  +-----------+  +----------+  +--------+  |
|  | Plan     |  | Execute   |  | Monitor  |  | Re-plan|  |
|  | Manager  |  | Engine    |  | & Eval   |  | Logic  |  |
|  +----------+  +-----------+  +----------+  +--------+  |
|                                                          |
|  Responsibilities:                                       |
|  - Break goals into executable steps                     |
|  - Decide execution order and parallelism                |
|  - Route steps to the right tools or sub-agents          |
|  - Track progress and state                              |
|  - Detect failures and trigger re-planning               |
|  - Manage context window usage                           |
|  - Enforce guardrails and safety checks                  |
+----------------------------------------------------------+

Orchestration in practice

In Google Cloud’s Vertex AI Agent Engine, the orchestration layer handles:

Routing: Directing user requests to the right agent or tool
State management: Tracking where the agent is in its plan
Tool execution: Calling tools and processing results
Error handling: Catching failures and deciding how to recover
Context management: Keeping the context window organized

The Agent Development Kit (ADK) gives you building blocks for customizing orchestration behavior. You define the agent’s tools, instructions, and behavior - the framework handles the execution loop.

Key orchestration decisions

Decision	Options	Trade-off
How much to plan upfront	Full plan vs. next step only	Thoroughness vs. flexibility
When to re-plan	After every step vs. only on failure	Adaptability vs. overhead
How to handle failure	Retry, skip, abort, or re-plan	Resilience vs. cost
Sequential vs. parallel	One step at a time vs. concurrent	Simplicity vs. speed
How much reasoning to show	Internal only vs. exposed to user	Transparency vs. noise

Common failure modes

Even well-designed planning agents can fail in predictable ways. Knowing these failure modes helps you build defenses against them.

1. Infinite loops

What happens: The agent gets stuck repeating the same action or re-planning endlessly without making progress.

Example:

Think: "I need to find the user's email. Let me search the database."
Act: search_database(query="user email")
Observe: No results found.
Think: "I need to find the user's email. Let me search the database."
Act: search_database(query="user email")
Observe: No results found.
[repeats forever]

Prevention:

Set a maximum iteration count for any loop
Track actions taken and detect repetition
After N failed attempts at the same action, force the agent to try a different approach
Include a “give up gracefully” option - ask the user for help

2. Plan drift

What happens: The agent gradually wanders away from the original goal, following interesting tangents instead of staying on track.

Example:

Original goal: "Write a summary of Q4 sales performance"
Step 1: Fetch Q4 sales data [on track]
Step 2: Notice an anomaly in November data [slightly off track]
Step 3: Deep dive into November anomaly [drifting]
Step 4: Research industry trends that might explain the anomaly [lost]
Step 5: Write a report about industry trends [completely off track]

Prevention:

Periodically check: “Is what I am doing aligned with the original goal?”
Include the original goal in every reasoning step
Set scope boundaries in the plan
Use a separate evaluation step to check relevance

3. Over-planning simple tasks

What happens: The agent spends more time planning than it would take to just do the task.

Example:

User: "What is 2 + 2?"

Agent Plan:
  Step 1: Identify the mathematical operation (addition)
  Step 2: Identify the operands (2 and 2)
  Step 3: Verify the operands are valid numbers
  Step 4: Perform the addition
  Step 5: Verify the result
  Step 6: Format the response

[Just say "4"!]

Prevention:

Estimate task complexity before deciding to plan
Simple tasks (single-step, clear answer) should skip planning entirely
Set a planning time budget proportional to task complexity

4. Cascading failures

What happens: One failed step causes subsequent steps to fail in a chain reaction, and the agent does not detect the root cause.

Example:

Step 1: Fetch user profile -> Returns error (auth expired)
Step 2: Process user preferences -> Fails (no profile data)
Step 3: Generate recommendations -> Fails (no preferences)
Step 4: Format output -> Fails (no recommendations)
Agent: "I was unable to complete the task due to formatting errors."
  [Wrong! The real problem was expired authentication in step 1]

Prevention:

Treat each step’s output as input validation for the next step
On failure, trace back to find the root cause
Fail fast on critical dependencies rather than trying to proceed without them
Report the actual root cause, not just the latest symptom

5. Context exhaustion

What happens: The plan and its execution history fill up the context window, leaving no room for the agent to reason about the current step.

Prevention:

Summarize completed steps instead of keeping full details
Store intermediate results externally and reference them by ID
Budget context space: reserve a fixed amount for reasoning about the current step
See Lesson 5: Memory and Context for detailed strategies

Failure mode summary

Failure mode	Symptom	Key prevention
Infinite loops	Same action repeated	Max iteration count + repetition detection
Plan drift	Agent wanders off topic	Periodic goal alignment check
Over-planning	Excessive planning for simple tasks	Complexity estimation before planning
Cascading failures	Wrong root cause reported	Trace back on failure + fail fast
Context exhaustion	Agent loses coherence	Context budgeting + summarization

Putting it together: a practical example

Here is a condensed example showing how planning, reasoning, and re-planning combine:

User: "Analyze our error logs from the past week and create Jira tickets
       for the top issues."

PLAN:
  Phase 1: Query error logs (past 7 days, ERROR + FATAL severity)
  Phase 2: Group by type, rank by frequency, identify top 5
  Phase 3: For each issue, check for existing tickets, create if new
  Phase 4: Summarize findings

EXECUTE (ReAct within each phase):
  Act: log_query(severity=["ERROR","FATAL"], days=7)
  Observe: 12,847 entries returned
  Act: log_analyze(group_by="error_message", sort="count_desc")
  Observe: Top issues identified (connection timeouts, null pointer, rate limits)

  Act: jira_search(query="Connection timeout payment-service")
  Observe: No existing ticket -> Create new ticket PLATFORM-1234

RE-PLAN (mid-execution):
  Observe: PLATFORM-999 already exists for NullPointerException
  Adjust: Update existing ticket with new occurrence count instead of creating duplicate

DELIVER:
  "Analyzed 12,847 errors. Created 4 new tickets, updated 1 existing one."

Notice how the agent uses hierarchical planning upfront, ReAct-style execution within each phase, and re-planning when it discovers an existing ticket. All the patterns working together.

Key takeaways

Planning transforms complex tasks into manageable steps. Without it, agents either miss important steps or waste effort on tangents.
The problem-solving loop (Mission, Scene, Think, Act, Observe) is the engine of every planning agent. Understanding this loop helps you design and debug agent behavior.
Plan-then-execute and reactive approaches are two ends of a spectrum. Most practical agents use a hybrid: light upfront planning with reactive execution.
Hierarchical planning handles complex tasks by breaking them into goals, sub-goals, tasks, and steps. This mirrors how project management works.
Re-planning is essential. Plans will break. Good agents detect failures early and adjust. The ability to adapt separates useful agents from fragile ones.
Reasoning techniques (CoT, ToT, self-consistency) make planning more effective. Chain-of-Thought for step-by-step logic, Tree-of-Thoughts for exploring alternatives, self-consistency for validating critical decisions.
Watch out for common failure modes. Infinite loops, plan drift, over-planning, cascading failures, and context exhaustion are predictable problems with known solutions.
The orchestration layer ties it all together. It manages the plan lifecycle, routes execution, handles errors, and keeps the agent on track.

Lesson 6: planning and reasoning - how agents tackle complex tasks

What you will learn

Prerequisites

ELI5: Planning is like packing for a trip

Why planning matters

What planning buys you

The agentic problem-solving loop

Get mission -> scan scene -> think -> act -> observe

Step by step

Plan-then-execute vs. reactive approaches

Plan-then-execute

Reactive approach

The pragmatic middle ground

When to favor each approach

Hierarchical planning

Breaking big tasks into smaller ones

The project management analogy

Benefits of hierarchical planning

Levels of abstraction

Re-planning: adapting when things go wrong

When to re-plan

Re-planning strategies

Reasoning techniques

Chain-of-Thought (CoT)

Tree-of-Thoughts (ToT)

Self-consistency

Comparing reasoning techniques

The orchestration layer

What the orchestration layer does

Orchestration in practice

Key orchestration decisions

Common failure modes

1. Infinite loops

2. Plan drift

3. Over-planning simple tasks

4. Cascading failures

5. Context exhaustion

Failure mode summary

Putting it together: a practical example

Key takeaways

Further reading