
How AI Agents Think: A Complete Breakdown of the ReAct Reasoning Framework and Why It Determines Whether Agents Can Actually Get Things Done

30-Second Version · For the impatient

ReAct grounds an AI Agent's decisions in real tool output instead of hallucination: every Thought, every Action, every Observation leaves a trace. Learning to read those three steps is how you tell whether an Agent is genuinely reasoning or making mistakes that look logical.

Alex Mercer · June 15, 2026

Full Explanation

01 · Why did this happen?

The ReAct framework was only proposed in 2022 — how did AI Agents work before it, and why was ReAct a breakthrough?

Before ReAct, AI Agent design fell into roughly two camps. The first was the "pure reasoning" type: a language model did all the reasoning but could only output text. It could tell you "you should buy ETH" but couldn't actually place the order. The second was the "pure tool" type: driven by rules or scripts, the Agent called tools and executed operations according to preset logic, but it didn't "think" about why it was acting and couldn't adjust flexibly when tools failed or context changed.

ReAct's breakthrough was fusing "language model reasoning capability" and "tool execution capability" in the same loop. The Thought step lets the LLM first think through why to do something and what to do next; the Action step makes it actually execute; the Observation step feeds real execution results back to the LLM for continued reasoning — rather than using hallucination to fill in unknowns.

This design gave Agents the ability to "dynamically adjust based on real-world feedback" for the first time, rather than blindly following a script to the end. For crypto contexts, this means an Agent can re-evaluate its entire plan when market conditions suddenly change (e.g., a large on-chain liquidation appears), rather than continuing to execute a now-inappropriate preset action.

02 · What is the mechanism?

Each ReAct loop iteration costs money (token fees) — how do you control costs in actual crypto Agent deployment?

This question is often overlooked in documentation but is critical in real deployment. Each ReAct Thought step requires an LLM call, billed according to the model and the number of tokens used. A complex crypto task (e.g., "evaluate whether to rebalance now by checking five indicators") might run 10 loop iterations at 2,000–3,000 tokens each, or 20,000–30,000 tokens in total. With a GPT-4 tier model, one task execution could cost roughly $0.30–$0.80, depending on the provider's current pricing. That sounds small, but the bill scales with frequency: an Agent running such a task once an hour would spend a few hundred dollars a month, one running it every minute (43,200 runs a month) would spend well over ten thousand, and a poorly designed task that falls into an infinite loop can burn through a budget faster still.
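To make the arithmetic concrete, here is a rough back-of-the-envelope estimator. The blended price of $0.02 per 1,000 tokens is an assumption picked so the result falls inside the range quoted above; it is not a quoted rate from any provider, and real pricing differs between input and output tokens.

```python
def cost_per_task(iterations: int, tokens_per_iteration: int, usd_per_1k_tokens: float) -> float:
    """Estimated LLM cost of one task run, in USD."""
    return iterations * tokens_per_iteration * usd_per_1k_tokens / 1000


def monthly_cost(task_cost_usd: float, runs_per_day: int, days: int = 30) -> float:
    """Estimated monthly cost if the task runs a fixed number of times per day."""
    return task_cost_usd * runs_per_day * days


# Assumed blended price: $0.02 per 1,000 tokens (hypothetical).
per_task = cost_per_task(iterations=10, tokens_per_iteration=2500, usd_per_1k_tokens=0.02)
hourly = monthly_cost(per_task, runs_per_day=24)
every_minute = monthly_cost(per_task, runs_per_day=24 * 60)

print(f"per task: ${per_task:.2f}")
print(f"hourly schedule: ${hourly:,.0f} per month")
print(f"every-minute schedule: ${every_minute:,.0f} per month")
```

Under these assumptions one run costs about $0.50, an hourly schedule about $360 a month, and an every-minute schedule about $21,600 a month. The run frequency, not the per-task price, dominates the bill.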

Three mainstream methods control costs in practice. First, set a maximum loop count (usually 5–15); when it is exceeded, forcibly terminate the task and report "insufficient information, requires human intervention." Second, use tiered model selection: run routine Thought steps on cheaper models (like Claude Haiku or GPT-4o-mini) and reserve expensive models for the final, high-stakes decision. Third, cache common tool results: if the ETH price was already queried within the last 30 seconds, the Observation uses the cached result rather than re-calling the API. Fewer tool calls indirectly mean fewer loop iterations.
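The third method, caching tool results, could look roughly like the sketch below. `TTLCache` and `fetch_eth_price` are hypothetical names used for illustration; a real price tool would call an external API instead of returning a constant.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Caches tool results for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, object]] = {}

    def get_or_call(self, key: str, tool: Callable[[], object]) -> object:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]            # fresh enough: skip the tool call
        value = tool()               # missing or stale: call the real tool
        self._store[key] = (now, value)
        return value


# Hypothetical price tool that counts how often it is really invoked.
calls = {"count": 0}


def fetch_eth_price() -> float:
    calls["count"] += 1
    return 3420.0


cache = TTLCache(ttl_seconds=30)
first = cache.get_or_call("price:ETH", fetch_eth_price)
second = cache.get_or_call("price:ETH", fetch_eth_price)  # served from the cache
```

Here the second lookup never reaches the tool, so `calls["count"]` stays at 1. The 30-second window is a trade-off: longer windows save more calls but let the Agent reason on staler prices.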

Cost control in crypto Agents is a critical part of system design — not something to think about after going live.

03 · How does it affect me?

If an Agent's Thought step is contaminated by a malicious tool (MCP Server attack), how do I know it's being manipulated?

This is one of the most important security threats to watch for in crypto AI Agents. A malicious MCP (Model Context Protocol) Server attack works by injecting false information into the tool response data that becomes the Observation, so the Agent's Thought step keeps reasoning on incorrect "facts" and ultimately reaches the decision the attacker wants. For example, a compromised on-chain price tool returns "ETH current price $500" (actual: $3,400). The Agent's Thought step reads this, calculates "this is a historical low, strong buy signal," and autonomously signs a large buy transaction.

Signals that your Agent may have been manipulated: decisions that are severely inconsistent with actual market conditions (e.g., the Agent keeps placing buy orders during an obvious downtrend); Observation data that doesn't match what you can look up yourself; and Thought steps that drift toward reasoning unrelated to your stated goals (e.g., your goal is conservative position holding, but the Agent starts building justifications for high-risk operations).

Defense methods: first, only authorize MCP Servers that you have audited and that come from verifiable, trustworthy sources; second, before critical tool calls (such as sign_tx), add an independent validation step that checks the tool's returned value against another data source; third, set alerts for anomalous behavior, so that if an Agent's Action deviates significantly from historical patterns, the Agent pauses immediately and notifies you.
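The second defense, an independent check before a signing call, can be sketched as follows. The 2% tolerance, the function names, and the `sign_tx` callback are assumptions for illustration; a production check would pull the reference price from a genuinely separate provider.

```python
from typing import Callable


def is_plausible(primary: float, reference: float, max_deviation: float = 0.02) -> bool:
    """True if the primary quote is within max_deviation (as a fraction) of the reference."""
    if primary <= 0 or reference <= 0:
        return False
    return abs(primary - reference) / reference <= max_deviation


def guarded_sign(primary_price: float, reference_price: float,
                 sign_tx: Callable[[], str]) -> str:
    """Call sign_tx only when two independent price sources agree."""
    if not is_plausible(primary_price, reference_price):
        return "paused: price sources disagree, notify operator"
    return sign_tx()


# The compromised-tool example from the text: $500 reported, about $3,400 elsewhere.
blocked = guarded_sign(500.0, 3400.0, lambda: "signed")
allowed = guarded_sign(3420.0, 3400.0, lambda: "signed")
```

With these inputs the first call is paused and the second goes through. The check does not prove the data is correct; it only makes an attack require compromising two sources at once.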

04 · What should I do?

What are the differences in ReAct implementation between major crypto Agent frameworks (ElizaOS, LangChain, AutoGen)? How do I choose?

All three frameworks support ReAct-style Thought → Action → Observation loops, but their suitability for crypto contexts differs significantly.

LangChain / LangGraph: the most mature general Agent framework, with a rich tool ecosystem and many ready-made DeFi data connectors (CoinGecko, The Graph, DEX APIs). LangGraph's graph-based workflow design lets you precisely control each step's logical branches, which suits trading strategies that require complex conditional logic. Downside: the framework is relatively heavy, and its steep learning curve is unfriendly to those without an engineering background. Best for: developers with coding ability who need highly customizable crypto trading Agents.

AutoGen: developed by Microsoft, its strength is multi-Agent collaboration — multiple Agents converse, divide work, and cross-validate each other's reasoning. Crypto application: one Agent handles technical analysis, another handles sentiment analysis, a third handles risk assessment, and they debate before outputting a final decision. Best for: complex multi-factor judgment tasks — more redundancy and validation than single-Agent setups.

ElizaOS: crypto-native, developed by the ai16z community, with built-in native support for social platforms (Twitter/Farcaster) and on-chain wallets. Better suited for "social Agent + on-chain operations" hybrid scenarios. Tool ecosystem less rich than LangChain, but crypto-directed integrations are deeper. Best for: deploying crypto social Agents or scenarios requiring integration with the ai16z ecosystem.

Selection principle: if your core need is trading strategy and on-chain operations, look at LangChain first; if you need multi-Agent cross-validation, look at AutoGen; if it's crypto community + social Agent, look at ElizaOS.

Full Content

Most people's first question about AI Agents is 'what can it do?' But the more important question is 'how does it decide what to do?' When a crypto trading Agent detects a market anomaly at 3am and moves from 'noticed some data' to 'signed a transaction,' what happened in between? The answer almost always points to the same framework: ReAct.

ReAct was proposed in 2022 by researchers at Princeton University and Google. The name stands for 'Reasoning + Acting.' It addressed a core flaw in earlier AI systems: some could think but not act (pure reasoning models), while others could act but not think (pure tool-calling scripts). ReAct chains the two together into a 'think → act → observe → think again' loop. That loop now sits at or near the reasoning core of most major AI Agent frameworks, including LangChain, AutoGen, and ElizaOS.

The ReAct Loop: What Each Step Does

Let's use a crypto scenario. Your task for the Agent: 'Is ETH worth buying right now? Check the current price and market sentiment before deciding.'

The Agent doesn't answer immediately. It runs this loop:

Step 1 (Thought): 'I need to know ETH's current price and a sentiment indicator. I have two tools: a price API and a sentiment analysis tool. I'll check price first.' This step is pure text reasoning: the Agent explains to itself what it needs to do next and why.

Step 2 (Action): Calls get_price(token='ETH'). This is where the Agent actually does something: it calls an API, reads on-chain data, or, in crypto contexts, signs a transaction.

Step 3 (Observation): The tool returns a result: 'ETH current price $3,420, up 4.2% in the past 24 hours.' The Agent reads this in as input for the next round of reasoning.

Back to Thought: 'Price is a bit elevated, up 4%. I still need to check sentiment. Calling sentiment tool.' Then Action → Observation → Thought again… until the Agent decides it has enough information to give a final answer.
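The loop above can be sketched in a few lines. This is a minimal illustration, not any framework's actual implementation: `scripted_llm` is a stand-in that returns canned decisions where a real Agent would call a model, and both tools are hypothetical stubs returning fixed strings.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools; real ones would call a price API and a sentiment service.
TOOLS: Dict[str, Callable[[str], str]] = {
    "get_price": lambda token: f"{token} current price $3,420, up 4.2% in the past 24 hours",
    "get_sentiment": lambda token: f"{token} sentiment: neutral",
}


def scripted_llm(history: List[str]) -> Tuple[str, str, str]:
    """Stand-in for an LLM call. Returns (thought, action, argument)."""
    seen = sum(1 for h in history if h.startswith("Observation:"))
    if seen == 0:
        return ("I need the current price first.", "get_price", "ETH")
    if seen == 1:
        return ("I have the price; I still need sentiment.", "get_sentiment", "ETH")
    return ("I have price and sentiment.", "finish",
            "Price is up 4.2% with neutral sentiment.")


def react_loop(task: str, llm=scripted_llm, max_steps: int = 10) -> Tuple[str, List[str]]:
    """Run Thought -> Action -> Observation until the model finishes or steps run out."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        thought, action, arg = llm(history)
        history.append(f"Thought: {thought}")
        if action == "finish":
            return arg, history
        history.append(f"Action: {action}(token={arg!r})")
        observation = TOOLS[action](arg)   # real tool output, not a guess
        history.append(f"Observation: {observation}")
    return "insufficient information, requires human intervention", history


answer, trace = react_loop("Is ETH worth buying right now?")
for line in trace:
    print(line)
```

The `history` list is the trace: every Thought, Action, and Observation is appended in order, which is what makes the reasoning auditable afterwards. The `max_steps` cap is the simplest guard against a loop that never decides it has enough information.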

The critical thing about this loop: every step of reasoning is explicit and traceable. You can see line by line why the Agent made each decision, which step it misjudged, which tool returned garbage. In crypto this matters enormously — if an Agent autonomously executed a trade that lost you money, you need to find exactly which Thought step went wrong.

How ReAct Differs from Just Asking ChatGPT

If you ask ChatGPT 'is ETH worth buying,' it gives you a plausible-sounding answer based on training data — but its 'ETH price' might be months old, and its sentiment analysis is its own guess, not real-time data.

A ReAct Agent is different because it knows what it doesn't know and actively goes to find out. The Thought step makes the Agent plan first — 'what information do I need, what tools can get it' — then Action actually retrieves it, Observation reads in the real result, then reasoning proceeds. The whole process is grounded in real-time data, not training-data hallucination.

In crypto this difference is the difference between making money and losing it. A hallucination-based Agent might buy during a market panic because it 'remembers' ETH is bullish long-term. A ReAct Agent checks the current Fear and Greed Index, on-chain fund flows, then decides.

When ReAct Fails: How It Makes Bad Decisions

Understanding ReAct's failure modes is as important as understanding its capabilities — especially if you're letting an Agent manage on-chain assets.

Tools return garbage; the Agent believes it. ReAct's Observation step assumes tool output is trustworthy. If your DEX price API returns an anomalous price during low liquidity, the Agent may accept it at face value. Malicious MCP Server attacks exploit exactly this — injecting false information into tool responses to corrupt the Thought step.

Too many loop iterations burn your token budget. Each ReAct loop requires an LLM inference call, billed by tokens. A poorly designed task can trap an Agent in an 'I need more information' loop until the token budget runs out. In autonomous fund management scenarios, this can cause operations to stall and costs to spiral.
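One way to keep a runaway loop from draining funds is a hard token cap per task. The sketch below is an assumption about how such a guard might be written; `TokenBudget` and `run_with_budget` are illustrative names, and the per-step token counts are supplied by hand rather than read from a real model response.

```python
from typing import Iterable


class BudgetExceeded(Exception):
    """Raised when a call would push usage past the cap."""


class TokenBudget:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        """Record token usage; refuse the call if it would exceed the cap."""
        if self.used + tokens > self.max_tokens:
            raise BudgetExceeded(
                f"{self.used + tokens} tokens would exceed cap of {self.max_tokens}"
            )
        self.used += tokens


def run_with_budget(step_costs: Iterable[int], budget: TokenBudget) -> str:
    """Simulate a loop in which each iteration costs a known number of tokens."""
    for cost in step_costs:
        try:
            budget.charge(cost)
        except BudgetExceeded:
            return "stopped: token budget exhausted"
    return "completed"


normal = run_with_budget([2500] * 4, TokenBudget(30_000))
stuck_budget = TokenBudget(30_000)
stuck = run_with_budget([2500] * 20, stuck_budget)  # an 'I need more information' loop
```

The stuck task is cut off after twelve iterations, once exactly 30,000 tokens have been spent, instead of running all twenty. Checking before the call rather than after means the cap is never overshot.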

Reasoning errors in Thought compound. If the Agent makes a wrong assumption in the first Thought step — say, interpreting 'ETH up 4%' as a strong buy signal without checking whether the whole market rose equally — every subsequent Action and Observation builds on that bad assumption. The result can look rigorously logical while being completely wrong.

How Crypto Agents Implement ReAct in Practice

In Agent frameworks used for crypto, such as ElizaOS and LangChain, ReAct is typically implemented this way. First, give the Agent a toolbox containing every tool it can call: DEX price queries, on-chain data APIs, wallet balance checks, transaction signing functions. Each tool has a clear description; the Agent's Thought step reads these to decide which to call. Second, set a maximum loop count (say, 10 rounds) to prevent infinite loops. Third, implement a permission layer for tool calls: read-only tools (price queries, on-chain data) can be called freely; write tools (signing transactions, moving funds) require an additional confirmation gate or a transaction size cap.
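A permission layer of this kind might be sketched as below. This is not the API of ElizaOS or LangChain; the `Tool` dataclass, `call_tool`, and the $100 cap are assumptions made for illustration, and this version requires both the cap and the confirmation for write tools.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str          # what the Thought step reads when choosing a tool
    writes: bool              # True if the tool can sign or move funds
    run: Callable[..., str]


def call_tool(tool: Tool, *, amount_usd: float = 0.0, confirmed: bool = False,
              max_write_usd: float = 100.0, **kwargs) -> str:
    """Run read-only tools freely; gate write tools behind a size cap and a confirmation."""
    if tool.writes:
        if amount_usd > max_write_usd:
            return f"blocked: {amount_usd} USD exceeds cap of {max_write_usd} USD"
        if not confirmed:
            return "blocked: confirmation required"
    return tool.run(**kwargs)


# Hypothetical tools for illustration.
get_price = Tool("get_price", "Return the current token price", False,
                 lambda token: f"{token}: $3,420")
sign_tx = Tool("sign_tx", "Sign and submit a transaction", True,
               lambda token: f"signed buy of {token}")

read_result = call_tool(get_price, token="ETH")
unconfirmed = call_tool(sign_tx, amount_usd=50.0, token="ETH")
oversized = call_tool(sign_tx, amount_usd=5000.0, confirmed=True, token="ETH")
approved = call_tool(sign_tx, amount_usd=50.0, confirmed=True, token="ETH")
```

The gate lives outside the model: whatever the Thought step concludes, a write tool cannot run unless the surrounding code allows it. That separation is what keeps a corrupted reasoning chain from turning directly into a signed transaction.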

This architecture lets you balance 'giving the Agent enough information to reason well' against 'not letting the Agent move your funds without guardrails.'

What This Means for Your Money

If you plan to use or deploy any crypto AI Agent, understanding the ReAct framework has three direct implications. First, you can read the Agent's decision log — almost all ReAct-based Agents output Thought/Action/Observation records. Learning to read these lets you judge whether the Agent is genuinely reasoning or confabulating. Second, you know where to audit a bad trade — if the Agent made a losing transaction, the first thing to do is find the Thought step, not blame the model. Was the assumption wrong? Did a tool return bad data? Did the reasoning chain break down somewhere in the middle? Third, you know how to set meaningful tool permissions. How much the Action step can do depends entirely on which tools you gave it and where their limits are. Understanding that is how you build a system that's smart enough to be useful and constrained enough not to blow up.
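Auditing a decision log can start with something as simple as grouping its lines into steps. The sketch below assumes a plain-text log in which each line begins with `Thought:`, `Action:`, or `Observation:`; real frameworks format their traces differently, so treat the format and the sample log as hypothetical.

```python
from typing import Dict, List


def parse_trace(log: str) -> List[Dict[str, str]]:
    """Group 'Thought:/Action:/Observation:' lines into one record per loop iteration."""
    steps: List[Dict[str, str]] = []
    for line in log.strip().splitlines():
        label, _, text = line.partition(":")
        label = label.strip().lower()
        if label == "thought":
            steps.append({"thought": text.strip()})   # a Thought opens a new step
        elif label in ("action", "observation") and steps:
            steps[-1][label] = text.strip()
    return steps


# Sample log modeled on the compromised-price scenario (hypothetical data).
sample_log = """
Thought: I need the current ETH price.
Action: get_price(token='ETH')
Observation: ETH current price $500
Thought: This is a historical low, strong buy signal.
Action: sign_tx(token='ETH')
"""

steps = parse_trace(sample_log)
for number, step in enumerate(steps, start=1):
    print(number, step)
```

Read this way, the bad trade traces back to step 1: the reasoning in step 2 follows from the Observation it was given, and the Observation is where the false price entered.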


Feel free to share. Please credit the source.
