What does each of ReAct's three steps handle, and what breaks if one is missing?
The division of labor is precise. Thought is pure text reasoning by the LLM — no tool calls, just thinking: what is my goal right now, what tools do I have, what should I do next, and why. This step makes the decision process visible and traceable.
Action is the concrete execution following Thought — calling an API, reading on-chain data, searching the web, or in crypto contexts, signing a transaction. This is where the Agent actually interacts with the external world and generates real results.
Observation reads the Action's result back into the LLM ('what did the API return,' 'what did the tool say') as input for the next Thought round. Without Observation, the Agent never sees the actual tool output and keeps reasoning from whatever its training data suggests, which is where hallucination takes over.
Consequences of a missing step: no Thought means the Agent becomes a blind execution script that can't handle unexpected situations. No Action means it's just a thinking machine that never does anything. No Observation means each Thought round stacks hallucination on hallucination, drifting further off course. All three together form the closed loop.
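The closed loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: scripted_llm is a hypothetical stand-in for a real LLM call, and get_price is a stubbed tool returning made-up values.

```python
from typing import Callable, Dict, List, Tuple


def get_price(token: str) -> str:
    # Stubbed data for illustration only; a real tool would query an API.
    return {"ETH": "3000", "BTC": "60000"}.get(token, "unknown")


# Hypothetical tool registry: tool name -> function taking one string argument.
TOOLS: Dict[str, Callable[[str], str]] = {"get_price": get_price}


def scripted_llm(history: List[str]) -> Tuple[str, str, str]:
    """Stand-in for an LLM call. Returns (thought, action_name, action_arg).

    An action_name of "finish" means action_arg holds the final answer.
    """
    observations = [h for h in history if h.startswith("Observation:")]
    if not observations:
        return ("I need the current ETH price.", "get_price", "ETH")
    price = observations[-1].split(": ", 1)[1]
    return ("I have the price and can answer.", "finish", f"ETH is at {price}")


def react_loop(llm, tools, max_rounds: int = 5) -> Tuple[str, List[str]]:
    history: List[str] = []
    for _ in range(max_rounds):
        thought, action, arg = llm(history)        # Thought: decide what to do
        history.append(f"Thought: {thought}")
        if action == "finish":
            return arg, history
        result = tools[action](arg)                # Action: touch the outside world
        history.append(f"Action: {action}({arg})")
        history.append(f"Observation: {result}")   # Observation: feed the result back
    return "stopped: max rounds reached", history


answer, trace = react_loop(scripted_llm, TOOLS)
print(answer)  # ETH is at 3000
```

Each step maps to one line of the loop body: drop the llm call and the loop is a blind script, drop the tools call and nothing ever happens, drop the Observation append and the next Thought never sees what the tool returned.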
What's the difference between ReAct and Chain-of-Thought (CoT)? Why isn't CoT alone enough?
Chain-of-Thought (CoT) prompts the LLM to write out its reasoning step-by-step before giving a final answer ('let me think through this…'). It significantly improves answer quality on complex problems, but has a fundamental limitation: CoT can only reason from knowledge in the training data plus whatever is already in the prompt. It cannot fetch any real-time information on its own.
A concrete comparison: you ask 'is on-chain capital flowing into ETH positive or negative right now?' CoT can only answer from pre-training-cutoff knowledge (and will guess — often confidently and wrongly). A ReAct Agent decides in Thought that 'I need to check real-time on-chain data,' actually checks it in Action, reads the real result in Observation, then makes a judgment.
Simply put: CoT makes the LLM think more clearly; ReAct lets the Agent think clearly and then go verify and execute. In crypto, almost every valuable judgment requires real-time data — market prices, on-chain state, protocol rates — which CoT alone simply can't provide. ReAct extends CoT; it doesn't replace it.
What are the common causes of ReAct loops going off the rails, and how do you prevent them architecturally?
Three failure scenarios come up most often.
First, infinite loops. Thought repeatedly decides 'I still need more information' and keeps firing Actions until the token budget runs out. This usually happens when the task goal is vague ('analyze the market' triggers it more than 'check ETH's 24-hour price change and give a one-sentence conclusion') or when tool outputs are in formats the Agent can't parse.
Second, tool contamination. The tool called in Action returns wrong or maliciously injected data; Observation reads it in; Thought continues reasoning from bad information and drifts further. This is the core mechanism of malicious MCP Server attacks.
Third, compounding first-step errors. Thought makes a wrong assumption in round one; every subsequent Action and Observation reinforces that error; the final output looks logically rigorous but is completely wrong.
Architectural prevention: set a maximum loop count (typically 5–15) with forced termination; validate Observation data for plausibility (trigger alerts when values are outside historical ranges); add a human-confirmation gate for critical Actions (especially transaction signing); only authorize tools from an audited whitelist.
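The four safeguards above can be sketched as two small checks that wrap the loop. Everything here is an illustrative assumption: the tool names, the round limit of 10, and the plausibility bounds are placeholders, and confirm stands for whatever human-approval mechanism a real deployment uses.

```python
from typing import Callable, Dict, Tuple

# All names and numbers below are illustrative assumptions.
ALLOWED_TOOLS = {"get_price_change", "sign_transaction"}  # audited whitelist
CRITICAL_TOOLS = {"sign_transaction"}                      # need human sign-off
MAX_ROUNDS = 10                                            # inside the 5-15 range
PLAUSIBLE_RANGE: Dict[str, Tuple[float, float]] = {
    "get_price_change": (-50.0, 50.0),  # percent; placeholder for historical bounds
}


class GuardrailError(Exception):
    """Raised when a step must not proceed."""


def check_action(tool: str, round_no: int, confirm: Callable[[str], bool]) -> None:
    """Run before every Action; raises GuardrailError to stop the loop."""
    if round_no > MAX_ROUNDS:
        raise GuardrailError("maximum loop count reached")
    if tool not in ALLOWED_TOOLS:
        raise GuardrailError(f"tool not on whitelist: {tool}")
    if tool in CRITICAL_TOOLS and not confirm(tool):
        raise GuardrailError(f"human confirmation denied for {tool}")


def observation_is_plausible(tool: str, value: float) -> bool:
    """Run on every numeric Observation; False should trigger an alert."""
    low, high = PLAUSIBLE_RANGE.get(tool, (float("-inf"), float("inf")))
    return low <= value <= high
```

Keeping these checks outside the LLM matters: a contaminated Observation can change what Thought decides, but it cannot change a hard-coded whitelist or round limit.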
When deploying a ReAct Agent in production, what are the critical elements of the Thought-step prompt design?
The quality of the Thought step almost entirely determines the quality of the whole Agent's reasoning, and that quality depends heavily on how the System Prompt is designed. Four elements matter most.
First, give the Agent a clear tool list with descriptions. Each tool's name, purpose, and parameter format must be spelled out. The Thought step reads these to decide which tool to call, so vague descriptions lead to wrong tool choices.
Second, give the Agent an explicit termination condition. Tell it 'when you have enough information to answer the question, output Final Answer.' Otherwise the Agent may forever decide 'I should check one more time.'
Third, set action boundaries. Explicitly state what the Agent can do (query data, analyze markets) and what it cannot (sign transactions above $X without confirmation).
Fourth, provide a Thought/Action/Observation example in the System Prompt so the Agent knows the correct output format. Inconsistent formatting breaks the backend parsing logic.
These design details determine whether a ReAct Agent actually works in real scenarios, not just in demos.
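As a rough sketch of how those four elements fit together, the snippet below builds a System Prompt and parses the model's reply. The prompt wording, the function names, and the regex-based parser are all assumptions made for illustration; max_usd is left as a parameter because the $X threshold is deployment-specific.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical tool catalogue: name -> description with parameter format.
TOOLS: Dict[str, str] = {
    "get_price_change": (
        "get_price_change(token, period): percent price change of a token, "
        "e.g. token='ETH', period='24h'"
    ),
}


def build_system_prompt(tools: Dict[str, str], max_usd: int) -> str:
    tool_lines = "\n".join(f"- {desc}" for desc in tools.values())
    return (
        "You can use these tools:\n"                                    # 1. tool list
        f"{tool_lines}\n\n"
        "When you have enough information to answer the question, "   # 2. termination
        "output 'Final Answer: ...'.\n"
        "You may query data and analyze markets. Do not sign "         # 3. boundaries
        f"transactions above ${max_usd} without confirmation.\n\n"
        "Format example:\n"                                             # 4. format example
        "Thought: I need the ETH 24h change.\n"
        "Action: get_price_change(token='ETH', period='24h')\n"
        "Observation: ETH 24h change +3.2%\n"
    )


ACTION_RE = re.compile(r"^Action:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.+)$", re.MULTILINE)


def parse_step(text: str) -> Tuple[str, Optional[str], str]:
    """Return ("final", None, answer) or ("action", tool_name, raw_args)."""
    final = FINAL_RE.search(text)
    if final:
        return ("final", None, final.group(1).strip())
    action = ACTION_RE.search(text)
    if action:
        return ("action", action.group(1), action.group(2))
    raise ValueError("output matches neither Action nor Final Answer format")


# 100 is an arbitrary placeholder for the $X threshold.
prompt = build_system_prompt(TOOLS, max_usd=100)
```

The parser raises on anything that matches neither format, which is how inconsistent model output surfaces as an explicit error instead of silently breaking the loop.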
ReAct Loop in Practice (Crypto Scenario)
Task: 'Which gained more today — ETH or BTC? Give me one conclusion.'
Round 1: Thought: I need the 24-hour price change for ETH and BTC separately. Check ETH first. Action: call get_price_change(token='ETH', period='24h'). Observation: ETH 24h change +3.2%.
Round 2: Thought: ETH up 3.2%. Now check BTC. Action: call get_price_change(token='BTC', period='24h'). Observation: BTC 24h change +1.8%.
Round 3: Thought: ETH +3.2%, BTC +1.8%. ETH gained more today. I have enough information to answer. Final Answer: ETH (+3.2%) outperformed BTC (+1.8%) today.
Key feature of the whole process: every reasoning step is explicitly recorded in Thought; every tool call and returned result is logged in Action/Observation and auditable after the fact. If the final answer is wrong (e.g., the API returned bad data), you can trace the problem directly to the Observation. That traceability is the core value ReAct delivers.
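The three rounds above can be replayed as a structured audit log. This sketch scripts the Thought text instead of calling an LLM, and get_price_change is a stub that returns the figures from the example, so it only illustrates what gets recorded and how a bad Observation could be traced.

```python
from typing import Dict, List

# Stubbed values copied from the worked example above; not live data.
FAKE_CHANGES: Dict[str, float] = {"ETH": 3.2, "BTC": 1.8}


def get_price_change(token: str, period: str) -> float:
    # A real tool would call a market-data API here.
    return FAKE_CHANGES[token]


def run_task(audit_log: List[dict]) -> str:
    changes: Dict[str, float] = {}
    for round_no, token in enumerate(["ETH", "BTC"], start=1):
        value = get_price_change(token=token, period="24h")
        changes[token] = value
        # One record per round: what was decided, what was called, what came back.
        audit_log.append({
            "round": round_no,
            "thought": f"I need the 24h change for {token}.",
            "action": f"get_price_change(token='{token}', period='24h')",
            "observation": f"{token} 24h change {value:+.1f}%",
        })
    winner, loser = sorted(changes, key=changes.get, reverse=True)
    return (
        f"{winner} ({changes[winner]:+.1f}%) outperformed "
        f"{loser} ({changes[loser]:+.1f}%) today."
    )


log: List[dict] = []
final_answer = run_task(log)
print(final_answer)  # ETH (+3.2%) outperformed BTC (+1.8%) today.
for entry in log:
    print(entry["round"], entry["action"], "->", entry["observation"])
```

If the final answer looked wrong, the record for the relevant round shows exactly which call produced which value, which is the traceability the example is meant to demonstrate.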
ReAct's core tradeoff is transparency vs. speed. The Thought step makes every decision fully visible and traceable, which is an advantage for scenarios requiring audit (on-chain asset management, compliance). But every Thought round requires a billed LLM call: more loops mean higher cost and slower execution. A direct tool chain (a pure Action sequence without Thought) is faster and cheaper, but it cannot adapt to unexpected situations. ReAct suits complex tasks that need dynamic judgment and explainability. Pure tool chains suit simple, fixed-step automation that needs no reasoning. Most high-value crypto Agent tasks fall into the former; routine small repetitive operations can use the latter to save cost.
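The cost side of that tradeoff can be approximated with simple arithmetic. The sketch below assumes each Thought call re-sends the full accumulated history, counts input tokens only, and uses placeholder token counts; none of these numbers come from a real provider's pricing.

```python
def total_input_tokens(rounds: int, base_prompt_tokens: int,
                       tokens_added_per_round: int) -> int:
    """Sum the input tokens billed across all Thought calls.

    Assumes every call re-reads the whole history, so the context
    grows by tokens_added_per_round after each round.
    """
    total = 0
    context = base_prompt_tokens
    for _ in range(rounds):
        total += context
        context += tokens_added_per_round
    return total


def estimated_cost(tokens: int, price_per_1k_tokens: float) -> float:
    # price_per_1k_tokens is a hypothetical rate supplied by the caller.
    return tokens / 1000 * price_per_1k_tokens


# Placeholder sizes: a 1000-token prompt that grows by 200 tokens per round.
react_tokens = total_input_tokens(rounds=3, base_prompt_tokens=1000,
                                  tokens_added_per_round=200)
single_call_tokens = total_input_tokens(rounds=1, base_prompt_tokens=1000,
                                        tokens_added_per_round=200)
print(react_tokens, single_call_tokens)  # 3600 1000
```

Under these assumptions cost grows faster than linearly with the number of rounds, which is one more reason to cap the loop count and reserve ReAct for tasks that actually need the reasoning.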