Skip to content

Lesson 6: planning and reasoning - how agents tackle complex tasks

  • Why planning is essential for agents handling complex tasks
  • The agentic problem-solving loop: Get Mission, Scan Scene, Think, Act, Observe
  • Plan-then-execute vs. reactive approaches and the trade-offs of each
  • Hierarchical planning for breaking big tasks into manageable pieces
  • Re-planning: how agents adapt when things go wrong
  • Reasoning techniques: Chain-of-Thought, Tree-of-Thoughts, and self-consistency
  • The orchestration layer’s role in managing plans
  • Common failure modes and how to avoid them

Imagine you are packing for a two-week trip. You could just start grabbing stuff and shoving it in a suitcase. Maybe you will get lucky and have everything you need. More likely, you will forget your toothbrush and pack three jackets you never wear.

A better approach: think about where you are going, what the weather will be like, what activities you have planned, and then make a packing list. Check items off as you pack them. If you find out halfway through that you do not have enough room, you re-prioritize - drop the “just in case” items and keep the essentials.

That is what planning does for an AI agent. Instead of reacting to each step blindly, the agent thinks ahead, makes a plan, and works through it systematically. And when the plan hits a snag, a good agent adjusts rather than plowing forward with a broken approach.


Simple tasks do not need planning. If someone asks “What time is it?”, the agent just checks the clock. No plan required.

But consider a request like: “Research the top 5 competitors in the cloud database market, compare their pricing and features, and write a recommendation for which one we should use for our new project.”

This task requires:

  • Identifying who the competitors are
  • Researching each one (pricing, features, limitations)
  • Understanding the user’s project requirements
  • Comparing options against those requirements
  • Synthesizing everything into a recommendation

Without planning, the agent might start researching one competitor in extreme detail, run out of context space, and forget to cover the other four. Or it might jump straight to a recommendation without doing thorough research.

Planning gives the agent a roadmap. It knows where it is going, what steps come next, and how to allocate its time and resources.

BenefitWithout planningWith planning
Task completionMay miss steps or get stuckSystematic coverage of all steps
Resource efficiencyWastes tokens on tangentsAllocates effort where it matters
TransparencyHard to see what the agent is doingClear progress tracking
Error recoveryGets lost when things go wrongCan identify where the plan broke and adjust
ParallelismEverything is sequentialIndependent steps can run concurrently

At the heart of every planning agent is a problem-solving loop. Different frameworks describe this differently, but the core steps are consistent:

Get mission -> scan scene -> think -> act -> observe

Section titled “Get mission -> scan scene -> think -> act -> observe”
+-------------------------------------------------------------------+
| |
| +------------+ +------------+ +---------+ |
| | Get Mission| --> | Scan Scene | --> | Think | |
| +------------+ +------------+ +----+----+ |
| | |
| v |
| +---------+ +-----+-----+ |
| | Observe | <-- | Act | |
| +----+----+ +-----------+ |
| | |
| v |
| +------+------+ |
| | Goal met? | |
| | Yes -> Done | |
| | No -> Think | |
| +-------------+ |
| |
+-------------------------------------------------------------------+

1. Get Mission

The agent receives its task. This could be a user request, a triggered event, or a delegated subtask from another agent.

Mission: "Create a summary report of our team's GitHub activity this week."

2. Scan Scene

The agent gathers context about its current situation. What tools are available? What information does it already have? What constraints exist?

Available tools: github_api, document_writer, calendar_api
Known context: Team = "platform-eng", Current date = March 15
Constraints: Report should be under 2 pages

3. Think

The agent reasons about what to do next. This is where planning happens - the agent considers its options and decides on a course of action.

Thought: "I need to:
1. Get the list of team members
2. For each member, fetch their commits, PRs, and reviews this week
3. Aggregate the data
4. Write a summary highlighting key contributions
I will start by getting the team member list."

4. Act

The agent executes an action - calling a tool, generating text, or making a decision.

Action: github_api.get_team_members(team="platform-eng")

5. Observe

The agent examines the result of its action. Did it succeed? What information did it provide? Does the plan need to change?

Observation: Team has 8 members: [alice, bob, carol, dave, eve, frank, grace, hank]

The loop then returns to Think, where the agent decides the next step based on what it observed. This continues until the mission is complete.


There are two fundamental philosophies for how agents tackle tasks. Most real agents blend both, but understanding the pure forms helps you make design decisions.

How it works: The agent creates a complete (or near-complete) plan before taking any action. Then it executes the plan step by step.

User: "Migrate our application from Python 2 to Python 3."
Planning phase:
Step 1: Audit codebase for Python 2-specific syntax
Step 2: Identify incompatible dependencies
Step 3: Update dependencies to Python 3-compatible versions
Step 4: Convert print statements to print functions
Step 5: Fix string/unicode handling
Step 6: Update integer division operators
Step 7: Run test suite and fix failures
Step 8: Update CI/CD pipeline to use Python 3
Execution phase:
[Execute steps 1-8 in order]

Strengths:

  • Clear structure and progress tracking
  • Can identify dependencies between steps upfront
  • Enables parallel execution of independent steps
  • Easy to estimate total effort and communicate progress

Weaknesses:

  • Plan may be wrong or incomplete (you do not know what you do not know)
  • Rigid - hard to accommodate surprises
  • Upfront planning takes time and tokens
  • Can lead to over-planning for simple tasks

How it works: The agent takes things one step at a time. It looks at the current situation, decides the best next action, takes it, and reassesses. This is essentially the ReAct pattern from Lesson 4.

User: "Migrate our application from Python 2 to Python 3."
Step 1:
Think: "First, I should understand the codebase. Let me scan for Python 2 syntax."
Act: Search for Python 2 patterns
Observe: Found 47 print statements, 12 unicode issues, 3 deprecated imports
Step 2:
Think: "The print statements are the most common issue. Let me start there."
Act: Convert print statements
Observe: 47 print statements converted
Step 3:
Think: "Now let me tackle the unicode issues..."
[continues step by step]

Strengths:

  • Highly adaptive - responds to what it actually finds
  • No wasted planning effort
  • Works well for exploratory tasks
  • Naturally handles surprises

Weaknesses:

  • No big-picture view of progress
  • May miss important steps that upfront planning would catch
  • Hard to parallelize - each step depends on the previous observation
  • Can wander off track without a guiding plan

Most production agents use a hybrid approach:

  1. Light upfront planning: Create a rough, high-level plan (3-7 steps)
  2. Reactive execution: Use ReAct-style reasoning within each step
  3. Periodic re-planning: After major steps, reassess and adjust the plan
High-level plan (light):
1. Audit the codebase
2. Fix syntax issues
3. Update dependencies
4. Run tests and fix failures
Executing step 1 (reactive):
Think -> Act -> Observe -> Think -> Act -> Observe -> Step complete
Re-plan after step 1:
"The audit revealed more issues than expected. Adding step 2b for
database migration code that uses a deprecated library."

This gives you the structure of planning with the adaptability of reactive execution.

SituationFavor planningFavor reactive
Task is well-understoodYes
Task is exploratoryYes
Multiple independent subtasksYes
High uncertainty about what you will findYes
Need to communicate progressYes
Speed matters more than thoroughnessYes
Failure is costlyYes
Task is simple (fewer than 3 steps)Yes

Some tasks are too complex to plan in a single flat list. Hierarchical planning breaks a big goal into sub-goals, which are further broken into sub-sub-goals - like a work breakdown structure in project management.

Think about how a software project is organized:

Epic: Launch new payment system
|
+-- Story: Design payment API
| +-- Task: Define endpoints
| +-- Task: Design data models
| +-- Task: Write API specification
|
+-- Story: Implement payment processing
| +-- Task: Integrate with payment gateway
| +-- Task: Handle error cases
| +-- Task: Add retry logic
|
+-- Story: Add monitoring and alerts
+-- Task: Set up logging
+-- Task: Define SLOs
+-- Task: Configure alert rules

An agent uses the same structure. For example, “Write a blog post about Kubernetes networking” breaks into:

Sub-goal 1: Research
Task 1.1: Identify key networking concepts
Task 1.2: Find recent developments and best practices
Sub-goal 2: Outline and Write
Task 2.1: Create section structure
Task 2.2: Write each section with code examples
Sub-goal 3: Review
Task 3.1: Check technical accuracy
Task 3.2: Verify code examples work
  • Manageable chunks. Each task is small enough to execute without losing focus.
  • Clear dependencies. You can see which tasks depend on others and which can run in parallel.
  • Progress tracking. You know exactly where you are in the overall plan.
  • Delegation. In multi-agent systems, different sub-goals can be assigned to specialized agents.
LevelWhat it describesExample
GoalWhat success looks like”Deploy the application to production”
Sub-goalMajor phases of work”Prepare the environment”
TaskSpecific actions”Create the Cloud Run service”
StepAtomic operations”Run gcloud run deploy

Agents typically plan at the sub-goal and task level. Steps are handled by ReAct-style execution within each task.


Re-planning: adapting when things go wrong

Section titled “Re-planning: adapting when things go wrong”

No plan survives first contact with reality. Good agents detect when a plan is failing and adjust.

TriggerExampleResponse
Task failureAPI returns an errorTry alternative approach or skip and revisit
New informationDiscovered the database uses a different schema than expectedUpdate the plan to account for the actual schema
Changed requirementsUser adds a new requirement mid-taskIncorporate the new requirement into the plan
Resource constraintsRunning low on context space or API quotaSimplify remaining steps
Blocked dependencyA required service is downReorder tasks to work on unblocked items first

1. Local adjustment

Fix the immediate problem without changing the overall plan.

Original plan step: "Query the users table for active accounts"
Failure: "Table 'users' does not exist"
Adjustment: "Query the 'accounts' table instead (it has the same data)"

2. Step insertion

Add new steps to handle an unexpected situation.

Original plan:
1. Fetch data
2. Process data
3. Generate report
After discovering data needs cleaning:
1. Fetch data
1b. Clean and validate data <-- inserted
2. Process data
3. Generate report

3. Plan revision

Significantly restructure the remaining plan based on what you have learned.

Original plan: "Migrate the monolith to microservices"
After analysis: "The codebase is too tightly coupled for a direct migration."
Revised plan: "First, introduce module boundaries within the monolith
(strangler fig pattern), then gradually extract services."

4. Goal modification

In extreme cases, the agent recognizes the original goal is not achievable and proposes a revised goal. For example, if the target was “reduce latency to under 50ms” but the database query alone takes 80ms, the agent might propose targeting 100ms with caching for frequently accessed endpoints.

The key principle: try a local fix first, and only restructure the remaining plan if the local fix is not possible.


Planning requires reasoning - the ability to think through problems logically. Several techniques help agents reason more effectively.

What it is: The agent works through a problem step by step, showing its reasoning at each stage. Instead of jumping to an answer, it “thinks out loud.”

Analogy: Showing your work on a math test. Instead of writing just the answer, you write each step of the calculation.

Without Chain-of-Thought:
Q: "If a server handles 100 requests/second and we need to handle 10,000
requests/second, how many servers do we need?"
A: "100 servers"
With Chain-of-Thought:
Q: [same question]
A: "Let me work through this:
- One server handles 100 requests/second
- We need 10,000 requests/second total
- 10,000 / 100 = 100 servers for the raw capacity
- But we should account for redundancy. With N+1 redundancy, we need 101.
- And servers do not run at 100% efficiency. At 80% target utilization:
10,000 / (100 * 0.8) = 125 servers
- With N+1 redundancy: 126 servers
Answer: 126 servers for production, assuming 80% target utilization
and N+1 redundancy."

The step-by-step reasoning catches nuances (redundancy, utilization) that a snap answer misses.

When to use CoT:

  • Math and logic problems
  • Multi-step reasoning
  • Tasks where intermediate steps matter
  • Debugging and root cause analysis

What it is: Instead of following a single chain of reasoning, the agent explores multiple possible paths and evaluates which one is most promising. Think of it as brainstorming several approaches before committing to one.

Analogy: A chess player considering several possible moves, thinking a few steps ahead for each one, and then choosing the best path.

Problem: "Our API is timing out under load. How should we fix it?"
Branch 1: Add caching
-> Evaluate: "Would reduce database load by ~60%. Implementation
takes 2 days. Risk: cache invalidation complexity."
Score: 7/10
Branch 2: Optimize database queries
-> Evaluate: "Could improve query time by ~40%. Implementation
takes 3 days. Risk: low, well-understood approach."
Score: 6/10
Branch 3: Add horizontal scaling
-> Evaluate: "Handles any load level. Implementation takes 1 day
with Kubernetes. Risk: increases infrastructure cost."
Score: 8/10
Decision: Start with horizontal scaling (fastest win), then add
caching for long-term efficiency.

When to use ToT:

  • Problems with multiple valid approaches
  • Strategic decisions
  • Tasks where the first idea might not be the best
  • Architecture and design decisions

What it is: The agent solves the same problem multiple times using different reasoning paths, then checks if the answers agree. If most paths lead to the same conclusion, confidence is high. If they disagree, more investigation is needed.

Analogy: Asking three different mechanics about a car problem. If they all say “bad alternator,” you can be confident. If they each say something different, you need more diagnostics.

Problem: "Is this code change safe to deploy on a Friday?"
Reasoning path 1 (risk analysis):
"The change modifies the payment flow. Payment changes on Friday
mean weekend incidents. Verdict: No."
Reasoning path 2 (scope analysis):
"The change is 5 lines and has 95% test coverage. Small, well-tested
changes are low risk. Verdict: Yes, with monitoring."
Reasoning path 3 (historical analysis):
"Similar changes in the past have caused issues 15% of the time.
That is above our 10% threshold. Verdict: No."
Consensus: 2 out of 3 say No. Recommendation: Wait until Monday.

When to use self-consistency:

  • High-stakes decisions
  • Tasks where errors are costly
  • Ambiguous problems with no clear single answer
  • Validation of critical reasoning
TechniqueApproachStrengthCostBest for
Chain-of-ThoughtStep-by-step linearThoroughness1x (single pass)Most reasoning tasks
Tree-of-ThoughtsExplore multiple pathsFinds best approach3-5x (multiple branches)Strategic decisions
Self-consistencyMultiple independent attemptsConfidence calibration3-5x (multiple passes)High-stakes validation

The orchestration layer is the control system that manages how an agent plans, reasons, and executes. Think of it as the conductor of an orchestra - it does not play any instruments, but it decides who plays what and when.

+----------------------------------------------------------+
| ORCHESTRATION LAYER |
| |
| +----------+ +-----------+ +----------+ +--------+ |
| | Plan | | Execute | | Monitor | | Re-plan| |
| | Manager | | Engine | | & Eval | | Logic | |
| +----------+ +-----------+ +----------+ +--------+ |
| |
| Responsibilities: |
| - Break goals into executable steps |
| - Decide execution order and parallelism |
| - Route steps to the right tools or sub-agents |
| - Track progress and state |
| - Detect failures and trigger re-planning |
| - Manage context window usage |
| - Enforce guardrails and safety checks |
+----------------------------------------------------------+

In Google Cloud’s Vertex AI Agent Engine, the orchestration layer handles:

  • Routing: Directing user requests to the right agent or tool
  • State management: Tracking where the agent is in its plan
  • Tool execution: Calling tools and processing results
  • Error handling: Catching failures and deciding how to recover
  • Context management: Keeping the context window organized

The Agent Development Kit (ADK) gives you building blocks for customizing orchestration behavior. You define the agent’s tools, instructions, and behavior - the framework handles the execution loop.

DecisionOptionsTrade-off
How much to plan upfrontFull plan vs. next step onlyThoroughness vs. flexibility
When to re-planAfter every step vs. only on failureAdaptability vs. overhead
How to handle failureRetry, skip, abort, or re-planResilience vs. cost
Sequential vs. parallelOne step at a time vs. concurrentSimplicity vs. speed
How much reasoning to showInternal only vs. exposed to userTransparency vs. noise

Even well-designed planning agents can fail in predictable ways. Knowing these failure modes helps you build defenses against them.

What happens: The agent gets stuck repeating the same action or re-planning endlessly without making progress.

Example:

Think: "I need to find the user's email. Let me search the database."
Act: search_database(query="user email")
Observe: No results found.
Think: "I need to find the user's email. Let me search the database."
Act: search_database(query="user email")
Observe: No results found.
[repeats forever]

Prevention:

  • Set a maximum iteration count for any loop
  • Track actions taken and detect repetition
  • After N failed attempts at the same action, force the agent to try a different approach
  • Include a “give up gracefully” option - ask the user for help

What happens: The agent gradually wanders away from the original goal, following interesting tangents instead of staying on track.

Example:

Original goal: "Write a summary of Q4 sales performance"
Step 1: Fetch Q4 sales data [on track]
Step 2: Notice an anomaly in November data [slightly off track]
Step 3: Deep dive into November anomaly [drifting]
Step 4: Research industry trends that might explain the anomaly [lost]
Step 5: Write a report about industry trends [completely off track]

Prevention:

  • Periodically check: “Is what I am doing aligned with the original goal?”
  • Include the original goal in every reasoning step
  • Set scope boundaries in the plan
  • Use a separate evaluation step to check relevance

What happens: The agent spends more time planning than it would take to just do the task.

Example:

User: "What is 2 + 2?"
Agent Plan:
Step 1: Identify the mathematical operation (addition)
Step 2: Identify the operands (2 and 2)
Step 3: Verify the operands are valid numbers
Step 4: Perform the addition
Step 5: Verify the result
Step 6: Format the response
[Just say "4"!]

Prevention:

  • Estimate task complexity before deciding to plan
  • Simple tasks (single-step, clear answer) should skip planning entirely
  • Set a planning time budget proportional to task complexity

What happens: One failed step causes subsequent steps to fail in a chain reaction, and the agent does not detect the root cause.

Example:

Step 1: Fetch user profile -> Returns error (auth expired)
Step 2: Process user preferences -> Fails (no profile data)
Step 3: Generate recommendations -> Fails (no preferences)
Step 4: Format output -> Fails (no recommendations)
Agent: "I was unable to complete the task due to formatting errors."
[Wrong! The real problem was expired authentication in step 1]

Prevention:

  • Treat each step’s output as input validation for the next step
  • On failure, trace back to find the root cause
  • Fail fast on critical dependencies rather than trying to proceed without them
  • Report the actual root cause, not just the latest symptom

What happens: The plan and its execution history fill up the context window, leaving no room for the agent to reason about the current step.

Prevention:

  • Summarize completed steps instead of keeping full details
  • Store intermediate results externally and reference them by ID
  • Budget context space: reserve a fixed amount for reasoning about the current step
  • See Lesson 5: Memory and Context for detailed strategies
Failure modeSymptomKey prevention
Infinite loopsSame action repeatedMax iteration count + repetition detection
Plan driftAgent wanders off topicPeriodic goal alignment check
Over-planningExcessive planning for simple tasksComplexity estimation before planning
Cascading failuresWrong root cause reportedTrace back on failure + fail fast
Context exhaustionAgent loses coherenceContext budgeting + summarization

Here is a condensed example showing how planning, reasoning, and re-planning combine:

User: "Analyze our error logs from the past week and create Jira tickets
for the top issues."
PLAN:
Phase 1: Query error logs (past 7 days, ERROR + FATAL severity)
Phase 2: Group by type, rank by frequency, identify top 5
Phase 3: For each issue, check for existing tickets, create if new
Phase 4: Summarize findings
EXECUTE (ReAct within each phase):
Act: log_query(severity=["ERROR","FATAL"], days=7)
Observe: 12,847 entries returned
Act: log_analyze(group_by="error_message", sort="count_desc")
Observe: Top issues identified (connection timeouts, null pointer, rate limits)
Act: jira_search(query="Connection timeout payment-service")
Observe: No existing ticket -> Create new ticket PLATFORM-1234
RE-PLAN (mid-execution):
Observe: PLATFORM-999 already exists for NullPointerException
Adjust: Update existing ticket with new occurrence count instead of creating duplicate
DELIVER:
"Analyzed 12,847 errors. Created 4 new tickets, updated 1 existing one."

Notice how the agent uses hierarchical planning upfront, ReAct-style execution within each phase, and re-planning when it discovers an existing ticket. All the patterns working together.


  1. Planning transforms complex tasks into manageable steps. Without it, agents either miss important steps or waste effort on tangents.

  2. The problem-solving loop (Mission, Scene, Think, Act, Observe) is the engine of every planning agent. Understanding this loop helps you design and debug agent behavior.

  3. Plan-then-execute and reactive approaches are two ends of a spectrum. Most practical agents use a hybrid: light upfront planning with reactive execution.

  4. Hierarchical planning handles complex tasks by breaking them into goals, sub-goals, tasks, and steps. This mirrors how project management works.

  5. Re-planning is essential. Plans will break. Good agents detect failures early and adjust. The ability to adapt separates useful agents from fragile ones.

  6. Reasoning techniques (CoT, ToT, self-consistency) make planning more effective. Chain-of-Thought for step-by-step logic, Tree-of-Thoughts for exploring alternatives, self-consistency for validating critical decisions.

  7. Watch out for common failure modes. Infinite loops, plan drift, over-planning, cascading failures, and context exhaustion are predictable problems with known solutions.

  8. The orchestration layer ties it all together. It manages the plan lifecycle, routes execution, handles errors, and keeps the agent on track.



Next lesson: Multi-Agent Systems