Most people's understanding of AI Agent 'planning' stops at a vague notion: Agents can 'autonomously plan' complex tasks. That description hides the practical questions: how does LLM planning actually work, when is it effective, and when does it fail?
Understanding LLM planning mechanisms matters especially for Onchain Agent design: when a DeFi strategy Agent's planning fails, the result is not just 'task not completed' but potentially an incorrect on-chain operation. This article analyzes four Agent planning strategies, the applicable scenarios and failure modes of each, and how to make planning more reliable at the system design level.
Planning in AI Agents is defined as: given a goal and current state, the Agent generates 'a sequence of actions to reach the goal.' In concrete Onchain Agent scenarios, the planning problem is: the user says 'find the current best USDC yield strategy and execute it,' and the Agent needs to plan the action sequence 'query APY across protocols → compare spreads → judge whether rebalancing is worthwhile → calculate Gas fee payback period → decide execute or wait.'
Planning's difficulty: when a plan is generated, the Agent doesn't know the execution results of each step (before querying APY, it doesn't know the spread); and the environment is dynamic (APY may change during plan execution). The core challenge of LLM planning: how to generate reasonable action sequences under uncertainty, and how to adjust plans based on new information during execution.
The complexity of the planning problem determines which planning strategy is needed. For simple linear tasks (do B after A completes, do C after B completes), a basic ReAct loop is sufficient; for complex non-linear tasks (multiple sub-goals, conditional branches, possible backtracking), more structured planning mechanisms are needed.
Strategy 1: Reactive Planning (ReAct)
Reactive planning is the most basic Agent planning approach: no complete plan is generated in advance; instead, at each step the LLM decides the next action based on the current state. Each cycle runs Thought (reasoning: what is the current state, what should the next step be) → Action (execute a tool call) → Observation (read the tool's return value) → Thought again. Strengths: strong adaptability to environmental changes, since each action is based on the latest observation rather than a fixed plan. Suitable scenarios: tasks with uncertain step counts or hard-to-predict intermediate results (e.g., 'analyze this protocol's contract and find potential security risks'). Limitation: it lacks a global perspective and is prone to 'myopic decisions,' where each step seems reasonable but the overall sequence goes wrong. For strategies that need global optimization (e.g., 'execute multi-step rebalancing at the lowest Gas fee time'), pure reactive planning is a poor fit because no step ever considers the sequence as a whole.
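The Thought → Action → Observation cycle can be sketched as a short loop. This is a minimal illustration under stated assumptions, not a production agent: `scripted_policy` stands in for the LLM call, and the `get_apy` tool and its APY values are hypothetical sample data.

```python
from typing import Callable, Optional

# Hypothetical tool; the protocols and APY figures are made-up sample data.
def get_apy(protocol: str) -> float:
    return {"aave": 4.1, "morpho": 5.3}[protocol]

TOOLS: dict[str, Callable[[str], float]] = {"get_apy": get_apy}

def scripted_policy(history: list[dict]) -> dict:
    """Stand-in for the LLM: chooses the next step from the observations so far."""
    seen = {h["arg"] for h in history}
    for protocol in ("aave", "morpho"):
        if protocol not in seen:
            return {"thought": f"need APY for {protocol}", "tool": "get_apy", "arg": protocol}
    best = max(history, key=lambda h: h["observation"])
    return {"thought": "all APYs known", "final": best["arg"]}

def react_loop(policy, max_steps: int = 10) -> Optional[str]:
    history: list[dict] = []
    for _ in range(max_steps):
        step = policy(history)                                  # Thought
        if "final" in step:
            return step["final"]
        observation = TOOLS[step["tool"]](step["arg"])          # Action
        history.append({**step, "observation": observation})    # Observation
    return None  # step budget exhausted without an answer

result = react_loop(scripted_policy)
```

Note that no plan exists anywhere in this loop: the only state is the history of observations, which is what makes the approach adaptive and also what makes it myopic.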
Strategy 2: Plan-then-Execute
Have the LLM generate a complete plan (task decomposition + step sequence) before executing any tool calls, then execute each planned step sequentially. System Prompt design: have the LLM output a 'plan checklist' in the first Thought step, explicitly listing all subsequent steps, then begin execution. Strengths: a better global perspective, with room to optimize at the plan level (e.g., schedule steps requiring Gas fee lows at the end of the plan) and to detect plan contradictions in advance. Limitation: the plan reflects the state at the moment reasoning starts; if the environment changes mid-execution (e.g., APY fluctuates sharply), the fixed plan may no longer apply. Suitable scenarios: tasks with relatively fixed step counts and order and infrequent environment changes (e.g., one-time multi-step protocol migrations).
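A minimal sketch of the plan-first pattern follows. It assumes the 'plan checklist' has already been parsed into structured steps; `make_plan` is a hard-coded stand-in for the LLM, and the tools and their return values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Step:
    tool: str
    arg: str

def make_plan(goal: str) -> list[Step]:
    """Stand-in for the LLM's first-turn 'plan checklist' (hard-coded here)."""
    return [
        Step("get_apy", "aave"),
        Step("get_apy", "morpho"),
        Step("estimate_gas", "rebalance"),
    ]

# Hypothetical tools returning made-up sample values.
TOOLS = {
    "get_apy": lambda protocol: {"aave": 4.1, "morpho": 5.3}[protocol],
    "estimate_gas": lambda action: 6.0,
}

def execute(plan: list[Step]) -> list[tuple[Step, float]]:
    """Run every planned step in order; the plan is never revised mid-run."""
    results = []
    for step in plan:
        results.append((step, TOOLS[step.tool](step.arg)))
    return results

plan = make_plan("find the best USDC yield")
results = execute(plan)
```

Because `execute` never looks back at the plan's assumptions, this sketch also exhibits the limitation described above: a sharp APY move between steps would go unnoticed.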
Strategy 3: Hierarchical Planning
Decompose complex tasks into a 'high-level plan' layer and a 'low-level plan' layer. A high-level Planner (usually a more capable LLM) handles task decomposition; a low-level Executor (a cheaper LLM or deterministic code) executes the concrete steps for each sub-goal. Core advantages: sub-goals that do not depend on each other can run in parallel; the high-level Planner receives only each sub-goal's 'complete/fail/result' feedback, not the low-level execution details, which keeps the high-level LLM's Context simpler. This is the architectural foundation of multi-Agent systems (Orchestrator + Sub-agents).
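The Planner/Executor split can be sketched as below. This is an illustration only: `plan_subgoals` is a hard-coded stand-in for the Planner LLM, the executor is deterministic code, the `apy:<protocol>` sub-goal format and the sample APY values are hypothetical, and the sub-goals are assumed to be independent so they can safely run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional

@dataclass
class SubGoalReport:
    goal: str
    ok: bool
    result: Optional[float] = None  # the only detail the Planner sees

def plan_subgoals(task: str) -> list[str]:
    """Stand-in for the high-level Planner LLM (hard-coded decomposition)."""
    return ["apy:aave", "apy:morpho", "apy:unknown"]

def run_subgoal(goal: str) -> SubGoalReport:
    """Low-level Executor: its internal steps stay hidden from the Planner."""
    sample_apy = {"aave": 4.1, "morpho": 5.3}  # made-up sample data
    protocol = goal.split(":", 1)[1]
    if protocol not in sample_apy:
        return SubGoalReport(goal, ok=False)
    return SubGoalReport(goal, ok=True, result=sample_apy[protocol])

def orchestrate(task: str) -> list[SubGoalReport]:
    goals = plan_subgoals(task)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(run_subgoal, goals))

reports = orchestrate("compare USDC yields")
best = max((r for r in reports if r.ok), key=lambda r: r.result)
```

The `SubGoalReport` type is the design point: it is the entire interface between the two layers, so the Planner's context grows with the number of sub-goals rather than with the number of low-level tool calls.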
Strategy 4: Tree-of-Thought Planning (Exploratory Planning)
Have the LLM explore multiple possible plan branches, evaluate each branch's feasibility, and select the best plan path for execution. Analogy: in chess, a player considers several candidate moves, predicts the consequences of each, and selects the best. For DeFi Agent applications: given the goal 'adjust DeFi portfolio,' Tree-of-Thought has the LLM first generate 3-5 different strategy options, evaluate each option's expected yield and risk under current market conditions, and select the best option to execute. Limitation: high computational cost (multiple LLM inferences). Suitable scenarios: high-value decisions (large capital, infrequent decisions).
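The branch, evaluate, select pattern can be sketched as follows. Note that this is a single-level simplification: a fuller Tree-of-Thought search would expand promising branches further before choosing. `propose` is a stand-in for the LLM, the candidate figures are made-up, and the scoring rule (yield minus a linear risk penalty) is a hypothetical choice for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    expected_yield: float  # annual percentage, as estimated by the evaluator
    risk: float            # 0 (safe) to 1 (risky)

def propose(goal: str) -> list[Candidate]:
    """Stand-in for the LLM generating several strategy branches (sample data)."""
    return [
        Candidate("stay in Aave", 4.1, 0.10),
        Candidate("move to Morpho", 5.3, 0.20),
        Candidate("leveraged loop", 9.0, 0.80),
    ]

def score(c: Candidate, risk_penalty: float = 8.0) -> float:
    """Hypothetical scoring rule: expected yield minus a linear risk penalty."""
    return c.expected_yield - risk_penalty * c.risk

def select_best(goal: str) -> Candidate:
    return max(propose(goal), key=score)

choice = select_best("adjust DeFi portfolio")
```

In a real system both `propose` and the evaluation step would typically be LLM calls, which is where the extra inference cost mentioned above comes from.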
Understanding how planning fails lets you prevent failures at the design level:
Failure 1: Plan Drift. The LLM's reasoning gradually deviates from the original plan objective as execution cycles increase. Manifestation: the operation the Agent executes at step 3 is inconsistent with the step-1 plan, but each step individually looks 'reasonable.' Cause: in long-Context Thought history, early plan details get diluted by later conversation content. Defense: before each reasoning cycle, inject 'current plan and completed steps' as structured Prompt into Context.
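The plan-injection defense against plan drift could look like the sketch below. The text format (the `GOAL`/`PLAN` labels and the `[x]`, `[>]`, `[ ]` markers) is an assumption chosen for illustration, not a required convention.

```python
def render_plan_context(goal: str, plan: list[str], completed: int) -> str:
    """Build the structured block re-injected before every reasoning cycle."""
    lines = [f"GOAL: {goal}", "PLAN:"]
    for i, step in enumerate(plan):
        if i < completed:
            mark = "[x]"   # already done
        elif i == completed:
            mark = "[>]"   # current step
        else:
            mark = "[ ]"   # still pending
        lines.append(f"{mark} {i + 1}. {step}")
    return "\n".join(lines)

context = render_plan_context(
    "maximize USDC yield",
    ["query APYs", "compare spreads", "estimate gas payback", "decide"],
    completed=2,
)
```

Because the block is regenerated from structured state on every cycle, the plan stays at a fixed, recent position in the Context instead of being buried under later Thought history.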
Failure 2: Plan Rigidity. The LLM strictly executes the initial plan even when intermediate Observations show the plan needs adjustment. Manifestation: tool returns show the target protocol's APY has dropped to an unprofitable level, but the Agent still executes the rebalancing per the original plan. Defense: design explicit 'replanning trigger conditions,' so that when an Observation deviates from the prerequisites the plan assumed by more than a set threshold, the Agent is forced to replan instead of continuing.
Failure 3: Subgoal Conflict. In hierarchical planning, sub-goals decomposed by the high-level Planner have implicit conflicts only discovered at low-level execution. Manifestation: Planner simultaneously assigns 'rebalance to Morpho' and 'maintain ETH collateral in Aave' sub-goals, but available capital is insufficient to complete both. Defense: add a 'sub-goal feasibility validation' step after Planner generates the plan but before assigning sub-goals.
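A feasibility check for the capital conflict described in Failure 3 might be sketched as follows. It only covers one kind of conflict (total capital demand against available capital); the `SubGoal` structure and the dollar amounts are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SubGoal:
    name: str
    capital_usd: float  # capital the sub-goal would lock up

def validate_capital(subgoals: list[SubGoal], available_usd: float) -> tuple[bool, float]:
    """Check that the combined capital demand fits the available capital.
    Returns (feasible, shortfall_usd)."""
    required = sum(g.capital_usd for g in subgoals)
    shortfall = max(0.0, required - available_usd)
    return shortfall == 0.0, shortfall

subgoals = [
    SubGoal("rebalance to Morpho", 8_000.0),
    SubGoal("maintain ETH collateral in Aave", 5_000.0),
]
feasible, shortfall = validate_capital(subgoals, available_usd=10_000.0)
```

Running this check between plan generation and sub-goal assignment surfaces the conflict before any Executor has acted, at which point the Planner can be asked to revise the decomposition.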
Failure 4: Planning Hallucination. The LLM cites non-existent tool capabilities or assumes invalid prerequisites when generating a plan. Manifestation: the plan includes 'call get_compound_v3_vault_apy tool,' but this tool doesn't exist in the actual tool list. Defense: after plan generation, have the code layer validate that every tool name in the plan is on the tool whitelist; also include the tool list explicitly in the System Prompt.
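The code-layer whitelist check for planning hallucination can be a few lines. In this sketch the plan is assumed to have been parsed into a list of dicts with a `tool` key, and the names in `ALLOWED_TOOLS` are hypothetical.

```python
# Hypothetical registered tool list; a real system would derive it from its tool registry.
ALLOWED_TOOLS = frozenset({"get_apy", "estimate_gas", "rebalance"})

def unknown_tools(plan: list[dict], allowed: frozenset = ALLOWED_TOOLS) -> list[str]:
    """Return tool names referenced by the plan that are not registered."""
    return [step["tool"] for step in plan if step["tool"] not in allowed]

plan = [
    {"tool": "get_apy", "args": {"protocol": "morpho"}},
    {"tool": "get_compound_v3_vault_apy", "args": {}},  # the hallucinated tool from the text
]
rejected = unknown_tools(plan)
```

A non-empty result would reject the plan before execution, typically by returning the offending names to the LLM and asking for a corrected plan.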
Dynamic replanning is the Agent's ability to update its plan based on new Observations during execution — the key mechanism for keeping Agents effective in dynamic environments (DeFi markets).
Dynamic replanning requires design at three levels: trigger conditions (what situations require replanning), replanning scope (local adjustment or full replan), and replanning cost control (avoiding frequent replanning causing LLM call cost explosions).
Trigger condition design: set 'plan prerequisite conditions.' At planning time, have the LLM also output a 'list of prerequisites for this plan to hold' (e.g., 'prerequisite: Morpho APY >5% and Gas fee <$10'). During execution, check automatically after each Observation whether these prerequisites still hold; when any prerequisite fails, trigger local replanning (replan only the affected subsequent steps, not the already-completed parts).
Replanning scope: preserve the results of already-completed steps and replan only 'from current step to completion,' using 'completed progress + current state + updated constraints' as replanning input.
Cost control: set a maximum replanning count (once it is exceeded, alert, pause, and await manual confirmation) so that excessive market volatility cannot push the Agent into an infinite replanning loop.
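The three levels above (trigger, scope, cost control) can be combined in one small simulation. This is a sketch under assumptions: the prerequisites from the example are hand-encoded as checks rather than parsed from LLM output, `replan_tail` is a stand-in for the LLM replanning call, and the step names, state keys, and observation values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Prerequisite:
    description: str
    holds: Callable[[dict], bool]  # evaluated against the latest observed state

# Prerequisites from the example in the text (state keys are hypothetical).
PREREQS = [
    Prerequisite("Morpho APY > 5%", lambda s: s["morpho_apy"] > 5.0),
    Prerequisite("Gas fee < $10", lambda s: s["gas_usd"] < 10.0),
]

def replan_tail(remaining: list[str], broken: list[str]) -> list[str]:
    """Stand-in for the LLM replanning call: defers the tail behind a re-query."""
    tail = [s for s in remaining if s != "requery_market"]
    return ["requery_market"] + tail

def run(steps: list[str], observations: list[dict], max_replans: int = 2) -> dict:
    completed: list[str] = []
    remaining = list(steps)
    replans = 0
    for state in observations:  # one fresh observation per cycle
        if not remaining:
            break
        broken = [p.description for p in PREREQS if not p.holds(state)]
        if broken:
            if replans >= max_replans:  # budget spent: pause for manual confirmation
                return {"status": "paused", "completed": completed, "replans": replans}
            replans += 1
            remaining = replan_tail(remaining, broken)  # completed steps are untouched
            continue
        completed.append(remaining.pop(0))
    status = "done" if not remaining else "incomplete"
    return {"status": status, "completed": completed, "replans": replans}

observations = [
    {"morpho_apy": 5.3, "gas_usd": 6.0},  # prerequisites hold
    {"morpho_apy": 4.2, "gas_usd": 6.0},  # APY prerequisite broken
    {"morpho_apy": 4.0, "gas_usd": 6.0},
    {"morpho_apy": 3.8, "gas_usd": 6.0},
]
outcome = run(["approve", "withdraw", "deposit"], observations)
```

With these sample observations the first step completes, two replans are spent while the APY prerequisite stays broken, and the third violation pauses the run instead of triggering another LLM call.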
Planning strategy selection should match your Agent's task type, not default to the most complex strategy: for Agents executing fixed yield scanning strategies, ReAct is sufficient; for Agents needing cross-protocol multi-step allocation optimization, Plan-then-Execute with replanning triggers is more appropriate; only for 'high-value, complex decision, worth more LLM cost' scenarios should Tree-of-Thought or hierarchical planning be considered.
The most easily overlooked planning design detail is recording and validating plan prerequisites. At planning time, have the LLM also output the 'assumptions this plan depends on,' then validate those assumptions continuously during execution. This turns 'plan rigidity' from 'silently executing the wrong plan' into 'correctly triggering replanning after detecting assumption failure.'