Independent Media
Not affiliated with any project
Deconstructing Autonomous Agents in Crypto
aiagent-bible.com

Context Window

Agent Architecture & Reasoning · Beginner

Full Explanation
01 · What is this?

What is a Token? How many tokens do Chinese, English, and code each 'consume'?

A Token is the basic unit that LLMs use to process text — not exactly a 'character,' not exactly a 'word,' but the segments the model's tokenizer cuts text into. Token efficiency varies dramatically across languages, directly affecting your API costs and Context Window utilization.

English has the highest token efficiency: English averages ~4 characters per token; a common English word is typically 1 token (the, agent, wallet), longer words may be 2-3 tokens (cryptocurrency). English: ~750 words ≈ 1,000 tokens.

Chinese has lower token efficiency: Each Chinese character is typically 1-2 tokens (Claude's tokenizer compresses Chinese less efficiently than English). Traditional Chinese: ~500-600 characters ≈ 1,000 tokens. (The same information written in Chinese consumes roughly 1.3-1.5× the tokens of English.)
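The ratios above can be turned into a rough estimator. This is a minimal sketch, not a real tokenizer: `estimate_tokens` is a hypothetical helper, and the 550 figure is simply the midpoint of the 500-600 range quoted above. For billing or hard limits, count with the provider's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from the ratios quoted in the text (heuristic only)."""
    # Count CJK ideographs separately from all other characters.
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other = len(text) - cjk
    # Assumed ratios: ~4 characters per token for English,
    # ~550 Chinese characters per 1,000 tokens (midpoint of 500-600).
    return round(other / 4 + cjk * 1000 / 550)


english = "a" * 4000   # 4,000 non-CJK characters
chinese = "代" * 550   # 550 Chinese characters
print(estimate_tokens(english), estimate_tokens(chinese))
```

Both sample strings land at about 1,000 tokens under these assumptions, which shows how far fewer Chinese characters fill the same budget.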

Code consumes the most tokens: Indentation, brackets, quotes, and semicolons often each occupy their own token, and long function names are split into several tokens. 100 lines of Python may consume 500-800 tokens, more than English prose with the same character count.

SVG consumes a surprising amount: SVG code (tags like <rect x="30" y="40" width="100" fill="#333">) is very unfriendly to tokenizers, because attribute values, quotes, and coordinates tend to become separate tokens. A moderately complex SVG may consume 2,000-5,000 tokens. If an Agent's tool returns include SVG, the Context Window fills very quickly.

Practical implication: In Agent tool return design, minimize returned token count — return only fields the Agent needs, not the entire API response.
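One way to apply this is a whitelist filter between the tool and the Agent. The sketch below is illustrative: `trim_tool_return` is a hypothetical helper, and the field names in `raw` are invented for the example rather than taken from any real Aave API schema.

```python
import json


def trim_tool_return(response: dict, keep: tuple[str, ...]) -> dict:
    """Return only the whitelisted top-level fields of a tool response."""
    return {key: response[key] for key in keep if key in response}


# Hypothetical raw response; the field names are illustrative only.
raw = {
    "asset": "USDC",
    "supply_apy": 0.0412,
    "utilization": 0.83,
    "reserve_history": [{"block": n, "apy": 0.04} for n in range(500)],
    "debug": {"request_id": "abc123", "latency_ms": 87},
}

trimmed = trim_tool_return(raw, keep=("asset", "supply_apy", "utilization"))
print(len(json.dumps(raw)), "->", len(json.dumps(trimmed)), "characters")
```

A whitelist is used instead of a blacklist so that new fields added upstream do not silently start consuming Context.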

02 · Why does it exist?

What happens when the Context Window fills up? What are the handling strategies?

When the Context Window approaches its limit, an Agent system can respond in several ways, all of them imperfect, which is why 'managing the Context Window' is one of the core challenges of Agent engineering.

Hard Truncation: The simplest but worst approach — directly cut the earliest Context content, keeping only the most recent N tokens for the model. Consequence: the Agent 'forgets' early conversation and decision context, may repeat operations already performed, or forget important constraints set early on ('don't touch Aave today, it's upgrading').
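The failure mode is easy to reproduce. In this minimal sketch, `hard_truncate` and `word_count` are hypothetical helpers, and `word_count` is a crude stand-in for a tokenizer (one token per whitespace-separated word).

```python
def hard_truncate(messages: list[str], max_tokens: int, count_tokens) -> list[str]:
    """Keep the most recent messages whose combined size fits in max_tokens."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break  # everything older than this point is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def word_count(text: str) -> int:
    # Crude stand-in for a tokenizer: one token per word.
    return len(text.split())


history = [
    "don't touch Aave today, it's upgrading",
    "checked Morpho USDC rate",
    "estimated gas on Base",
    "moved 500 USDC to Morpho",
]
window = hard_truncate(history, max_tokens=12, count_tokens=word_count)
print(window)
```

With a 12-token budget only the two newest messages survive, so the early constraint about Aave is the first thing lost.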

Sliding Window Summarization: When Context usage exceeds 70%, automatically trigger summary compression — compress the earliest N rounds of conversation into a high-density summary using an LLM, then replace the original conversation with the summary. Loss: details are compressed, but key decision points are retained. Suitable for: long-running DeFi Agents that only need to remember decision outcomes, not every intermediate step.
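The trigger-and-replace step might look like the sketch below. Everything named here is an assumption for illustration: `compress_if_needed` is hypothetical, `word_count` stands in for a tokenizer, and `fake_summarize` stands in for the LLM call that would produce the real summary.

```python
from typing import Callable


def compress_if_needed(
    turns: list[str],
    window_tokens: int,
    count_tokens: Callable[[str], int],
    summarize: Callable[[list[str]], str],
    threshold: float = 0.70,
    compress_oldest: int = 4,
) -> list[str]:
    """Replace the oldest turns with one summary once usage exceeds the threshold."""
    used = sum(count_tokens(turn) for turn in turns)
    if used <= threshold * window_tokens or len(turns) <= compress_oldest:
        return turns  # below the trigger, or too little history to compress
    summary = summarize(turns[:compress_oldest])
    return [summary] + turns[compress_oldest:]


# Stand-ins for illustration: a real Agent would call a tokenizer and an LLM.
def word_count(text: str) -> int:
    return len(text.split())


def fake_summarize(old_turns: list[str]) -> str:
    return f"[summary of {len(old_turns)} earlier turns]"


turns = [f"turn {i}: checked rates and held position" for i in range(10)]
compressed = compress_if_needed(
    turns, window_tokens=80, count_tokens=word_count, summarize=fake_summarize
)
print(len(turns), "->", len(compressed), "entries")
```

Compressing a fixed number of the oldest turns keeps the most recent turns verbatim, which is where the Agent usually needs full detail.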

RAG Externalization (Retrieval-Augmented Generation): Move long-term information out of the Context Window into a vector database; retrieve only the 'K most relevant chunks' into Context before each inference. Context contains only information relevant to the current task, not full history. Most flexible approach, but introduces vector search latency and engineering complexity.
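The retrieval step can be sketched without any external service. This is a toy illustration only: `embed` uses bag-of-words counts where a real system would call an embedding model, and the in-memory list stands in for a vector database.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words counts, standing in for an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query: str, chunks: list[str], k: int) -> list[str]:
    """Return the k stored chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


memory = [
    "aave usdc supply rate dropped after the upgrade",
    "gas on base spiked during the nft mint",
    "morpho usdc vault rate has been stable this week",
    "user prefers weekly summary emails",
]
context_chunks = retrieve_top_k("current usdc rate on morpho", memory, k=2)
print(context_chunks)
```

Only the two rate-related chunks enter the Context; the unrelated entries stay in storage until a query makes them relevant.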

Task Decomposition: Split a long task requiring extensive Context into multiple independent short tasks, each with its own clean Context. An Orchestrator coordinates the inputs/outputs of sub-tasks, preventing any single task's Context from growing too long. This is one of the design advantages of Multi-Agent systems.
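The Orchestrator pattern can be sketched as follows. The workers here are plain functions standing in for sub-Agent LLM calls, and the task names, the yield-versus-gas rule, and all numbers are hypothetical. The point is only that each sub-task receives a small Context of its own.

```python
from typing import Callable


def run_subtask(name: str, inputs: dict, worker: Callable[[dict], dict]) -> dict:
    """Run one sub-task with a fresh Context containing only its own inputs."""
    context = dict(inputs)  # clean copy: nothing leaks in from other sub-tasks
    return {"task": name, "result": worker(context)}


def orchestrate(rates: dict[str, float], gas_usd: float, position_usd: float) -> dict:
    # Sub-task 1: pick the best venue. It sees only the rate table.
    best = run_subtask(
        "pick_venue",
        {"rates": rates},
        lambda ctx: {"venue": max(ctx["rates"], key=ctx["rates"].get)},
    )
    venue = best["result"]["venue"]
    # Sub-task 2: illustrative rule, one year of yield must exceed gas.
    # It sees only three numbers, not the rate table or any history.
    check = run_subtask(
        "check_cost",
        {"apy": rates[venue], "gas_usd": gas_usd, "position_usd": position_usd},
        lambda ctx: {"worth_it": ctx["apy"] * ctx["position_usd"] > ctx["gas_usd"]},
    )
    return {"venue": venue, "worth_it": check["result"]["worth_it"]}


plan = orchestrate({"aave": 0.041, "morpho": 0.047}, gas_usd=0.30, position_usd=1000.0)
print(plan)
```

The Orchestrator passes only the outputs it needs between sub-tasks, so no single Context has to hold the whole job.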

Practical recommendation for DeFi Agents: The most common approach is 'sliding window summarization + structured DB long-term memory' — short-term Context management via summarization; long-term decision records stored in PostgreSQL; not fully relying on Context Window to retain history.
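The structured-DB half of that recommendation can be sketched as below. The text recommends PostgreSQL; `sqlite3` is used here only so the example runs without a server, and the `decisions` table and helper names are hypothetical.

```python
import sqlite3

# sqlite3 is a stand-in so the sketch is self-contained; the text recommends PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE decisions ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " action TEXT NOT NULL,"
    " reason TEXT NOT NULL)"
)


def record_decision(action: str, reason: str) -> None:
    """Persist one decision outside the Context Window."""
    conn.execute("INSERT INTO decisions (action, reason) VALUES (?, ?)", (action, reason))
    conn.commit()


def recent_decisions(limit: int = 3) -> list[tuple[str, str]]:
    """Fetch the most recent decisions, oldest first, for the next prompt."""
    rows = conn.execute(
        "SELECT action, reason FROM decisions ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    return rows[::-1]


for step in range(5):
    record_decision(f"rebalance #{step}", "rate spread exceeded threshold")

summary_rows = recent_decisions(3)
print(summary_rows)
```

Only the last few rows are loaded into each prompt, so the full decision log can grow without affecting Context size.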

03 · How does it affect your decisions?

How do Context Window sizes differ across Claude, GPT-4o, and Gemini? How should this factor into model selection?

Context Windows for mainstream models (as of mid-2026):

Claude Sonnet (Anthropic): 200K tokens ≈ 150,000 English words or ~100,000 Chinese characters. Sufficient for most DeFi Agent tasks: hours of operation history, the current task, and tool returns can all fit in one Context.

GPT-4o (OpenAI): 128K tokens. Smaller than Claude; long-running Agents hit the limit more easily, requiring more aggressive Context management strategies.

Gemini 1.5 Pro (Google): 1M tokens — the largest among mainstream models. Theoretically can fit an entire DeFi protocol codebase for analysis, but more tokens means higher API costs, and the 'needle in a haystack problem' (finding the most critical few lines in 1M tokens) remains an unsolved engineering challenge.

Context Window considerations for model selection:

Bigger is not always better. If your Agent task uses 20K-50K tokens per Context, upgrading from 200K to 1M makes almost no difference to Agent performance but may significantly increase API costs (typically token-based pricing, with larger context versions costing more per token).
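A quick utilization check makes the point concrete. The window sizes are the ones listed above, and `utilization` is a hypothetical helper.

```python
WINDOWS = {"128K": 128_000, "200K": 200_000, "1M": 1_000_000}


def utilization(tokens_per_context: int, window: int) -> float:
    """Fraction of the window that one Context occupies."""
    return tokens_per_context / window


for label, size in WINDOWS.items():
    low = utilization(20_000, size)
    high = utilization(50_000, size)
    print(f"{label}: {low:.1%} to {high:.1%} of the window used")
```

A 20K-50K task fits in all three windows, so the extra capacity of the 1M window goes unused.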

Scenarios genuinely needing a very large Context: one-time analysis of large codebases or lengthy documents (not a continuously running Agent but a one-time analysis task); cross-document analysis requiring comparison of large volumes of documents in one Context.

For continuously running DeFi Agents: 200K tokens (Claude Sonnet) is usually sufficient; combined with sliding window summarization, it can extend indefinitely. Prioritize the model's reasoning capability and Tool Use support quality over Context Window size.

04 · What should you do?

Is 'larger Context Window = smarter Agent' true? What is the relationship between Context Window size and Agent performance?

This is one of the most common misconceptions. The relationship between Context Window size and Agent performance is more complex than most people think:

What Context Window size actually affects: whether a complete set of task-relevant information can fit in one Context (long documents, long conversation history); the amount of information an Agent 'remembers' within a single conversation; the feasibility of one-time large-scale analysis tasks (analyzing a complete contract codebase).

What Context Window size does NOT affect: the model's reasoning depth (quality of reasoning on complex problems); the model's Tool Use accuracy (precision of function call outputs); the model's resistance to Prompt Injection (larger Contexts may actually increase attack surface, since more external data can carry more malicious instructions). Reasoning quality is primarily determined by the model's training, not its Context Window size: a 200K-token Claude Sonnet may reason better about complex multi-step DeFi strategy problems than a weaker 1M-token model, even though the latter can 'see' more information.

The 'Lost in the Middle' problem: Research finds that when Context Windows contain large amounts of text, models tend to pay attention to information at the beginning and end of Context, neglecting information in the middle — even when critical information is there. Simply increasing Context Window size doesn't guarantee the model 'better utilizes' more information.
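One possible mitigation, which the text above does not prescribe, is to order Context so the most critical items sit at the start and end. The sketch below assumes each chunk already carries a priority score; `place_at_edges` and the sample priorities are hypothetical, and whether this helps a particular model needs testing.

```python
def place_at_edges(chunks: list[tuple[str, int]]) -> list[str]:
    """Order chunks so the highest-priority ones sit at the start and end.

    Each chunk is (text, priority); a larger priority means more critical.
    """
    ranked = sorted(chunks, key=lambda item: item[1], reverse=True)
    front: list[str] = []
    back: list[str] = []
    for index, (text, _priority) in enumerate(ranked):
        # Alternate: 1st goes to the front, 2nd to the back, 3rd to the front...
        (front if index % 2 == 0 else back).append(text)
    return front + back[::-1]


chunks = [
    ("gas estimate", 1),
    ("hard constraint: skip Aave", 5),
    ("old rate log", 0),
    ("current task", 4),
    ("Morpho rate", 2),
]
ordered = place_at_edges(chunks)
print(ordered)
```

The hard constraint and the current task end up at the two edges, while the lowest-priority log lands in the middle, where neglect costs the least.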

Practical recommendation: When selecting models, prioritize reasoning capability, Tool Use support quality, and pricing over Context Window size. Context Window being 'good enough' is sufficient (most Agents need 128K-200K); don't choose a weaker reasoning model just for a larger Context Window.

Real-World Example

Concrete calculation: how many tokens does a DeFi Agent consume in one inference cycle?

Take an Agent executing USDC yield optimization on Base as an example, and calculate the token consumption of one complete inference cycle:

Input Context (what the Agent sees before reasoning):

  • System Prompt (strategy rules + tool descriptions): ~2,000 tokens
  • Summary of past 3 operations (retrieved from PostgreSQL): ~800 tokens
  • Current task instruction: ~100 tokens
  • Tool call 1 return: Aave rate data (trimmed to necessary fields): ~300 tokens
  • Tool call 2 return: Morpho rate data (trimmed): ~300 tokens
  • Tool call 3 return: Gas fee estimate: ~150 tokens
  • Input subtotal: ~3,650 tokens

Output (Agent reasoning + decision output):

  • Reasoning process (Thought steps): ~400 tokens
  • Tool call instruction output: ~150 tokens
  • Output subtotal: ~550 tokens

Single inference cycle total: ~4,200 tokens
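The budget above can be kept as data so the subtotals are recomputed whenever a line item changes. The dictionary keys are hypothetical labels; the numbers are the estimates from the lists above.

```python
INPUT_TOKENS = {
    "system_prompt": 2_000,
    "history_summary": 800,
    "task_instruction": 100,
    "aave_rates": 300,
    "morpho_rates": 300,
    "gas_estimate": 150,
}
OUTPUT_TOKENS = {"reasoning": 400, "tool_call": 150}

input_total = sum(INPUT_TOKENS.values())
output_total = sum(OUTPUT_TOKENS.values())
cycle_total = input_total + output_total
print(input_total, output_total, cycle_total)
```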

Converting to API cost (Claude Sonnet, mid-2026 pricing):

  • Input tokens: ~$0.003/1K × 3.65K = ~$0.011
  • Output tokens: ~$0.015/1K × 0.55K = ~$0.008
  • Each inference ~$0.019 (under 2 cents)
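The same conversion as a small function, using the per-1K rates quoted above as constants. These rates are the article's figures, not values read from a pricing API, and `inference_cost` is a hypothetical helper.

```python
INPUT_PRICE_PER_1K = 0.003   # USD per 1K input tokens, as quoted in the text
OUTPUT_PRICE_PER_1K = 0.015  # USD per 1K output tokens, as quoted in the text


def inference_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one inference at the rates above."""
    return (
        input_tokens / 1000 * INPUT_PRICE_PER_1K
        + output_tokens / 1000 * OUTPUT_PRICE_PER_1K
    )


cost = inference_cost(3_650, 550)
print(f"${cost:.4f} per inference, ${cost * 100:.2f} per 100 inferences")
```

Output tokens cost five times as much as input tokens at these rates, so verbose reasoning output matters more to the bill than its token count suggests.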

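Converting to Context Window utilization: 4,200 / 200,000 = 2.1% per cycle. If one cycle per hour is appended to the same Context, an 8-hour session consumes only 8 × 2.1% = 16.8%. (At 10 inferences per hour the same arithmetic gives 80 × 2.1% = 168%, so a busier Agent must start each cycle from the ~800-token PostgreSQL summary rather than the full transcript.) With effective tool return trimming, a 200K Context Window is more than sufficient for DeFi Agents, without needing frequent summarization compression.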
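The utilization arithmetic can be checked directly. This sketch assumes the worst case, where every cycle is appended to one Context and nothing is summarized or dropped; `accumulated_utilization` is a hypothetical helper.

```python
WINDOW = 200_000
TOKENS_PER_CYCLE = 4_200


def accumulated_utilization(cycles: int) -> float:
    """Share of the window used if every cycle is appended to one Context."""
    return cycles * TOKENS_PER_CYCLE / WINDOW


cycles_that_fit = WINDOW // TOKENS_PER_CYCLE
print(f"1 cycle: {accumulated_utilization(1):.1%}")
print(f"8 cycles: {accumulated_utilization(8):.1%}")
print(f"cycles before the window is full: {cycles_that_fit}")
```

Under this assumption the window holds 47 full cycles, which gives a concrete trigger point for summarization.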

Warning: If tool returns are not trimmed and an entire Aave API response (possibly 5,000-10,000 tokens) is passed to the Agent, the same 8-hour session may consume 40-80% of the Context Window, and API costs rise roughly 4× (about 15,000 tokens and $0.08 per inference instead of about 4,200 tokens and $0.02).

Diagram
Context Window: Token Space Allocation in a DeFi Agent. Using one complete DeFi Agent inference as the example, the diagram shows the share of a 200,000-token Context Window (Claude Sonnet) taken by each component: System Prompt 2,000, History 800, Tools 750, plus Task and Output, for ~4,200 tokens used (2.1%) and ~195,800 free. It then shows three strategies for when the Context Window fills up: sliding window summarization (compress old turns into a summary, keep decision outcomes), RAG externalization (move history to a vector DB, retrieve only the top-K chunks), and task decomposition (split into sub-Agents, each with a clean Context, coordinated by an Orchestrator). Cost impact of tool return trimming: with trimming, ~4,200 tokens/inference, ~$0.02, 100/day = $2/day; without trimming, ~15,000 tokens/inference, ~$0.08, 100/day = $8/day.
Common Misconceptions
Misconception 1: A larger Context Window makes an AI Agent smarter. Context Window size affects how much information the Agent 'can see,' not how deep its reasoning goes. Reasoning quality is determined by model training. A 200K-token Claude Sonnet may reason better about complex strategy problems than a weaker 1M-token model. When selecting models, prioritize reasoning capability; a 'sufficient' Context Window is enough.
Misconception 2: When the Context Window fills up, just switch to a model with a larger Context Window. Upgrading to a larger-Context model solves 'fitting more information in.' It doesn't solve the root cause of 'Agents becoming more incoherent the longer they run,' which is a long-term memory system design problem (requiring PostgreSQL + vector DB), not a Context Window size problem.
The Missing Link
Direct Impact

Larger Context Window → can process more information simultaneously, suitable for one-time large-scale analysis tasks; but models with larger Contexts typically cost more per token, and the 'Lost in the Middle' problem applies (model attention decreases for information in the middle of the Context). Smaller Context Window → forces the Agent to manage information more actively (selecting only the most important information for the Context), which sometimes makes its decisions more focused, at lower cost. For continuously running Agents, a 'sufficient' Context Window is ideal (128K-200K is usually enough); combined with sliding window summarization, it extends indefinitely, so there is no need to switch to a weaker reasoning model just for a larger Context Window.
