What is a Token? How many tokens do Chinese, English, and code each 'consume'?
A Token is the basic unit that LLMs use to process text — not exactly a 'character,' not exactly a 'word,' but the segments the model's tokenizer cuts text into. Token efficiency varies dramatically across languages, directly affecting your API costs and Context Window utilization.
English has the highest token efficiency: English averages ~4 characters per token; a common English word is typically 1 token (the, agent, wallet), longer words may be 2-3 tokens (cryptocurrency). English: ~750 words ≈ 1,000 tokens.
Chinese has lower token efficiency: Each Chinese character is typically 1-2 tokens (Claude's tokenizer compresses Chinese less efficiently than English). Traditional Chinese: ~500-600 characters ≈ 1,000 tokens. (The same information written in Chinese consumes roughly 1.3-1.5× the tokens of English.)
Code consumes the most tokens: Indentation, brackets, quotes, semicolons — each symbol occupies a token; long function names consume more tokens. 100 lines of Python may consume 500-800 tokens, far exceeding the same character count in English prose.
SVG consumes a surprising amount: SVG code (tags like <rect x="30" y="40" width="100" fill="#333">) is very unfriendly to tokenizers — each attribute value, quote, and coordinate is an independent token. A moderately complex SVG may consume 2,000-5,000 tokens. If Agent tool returns include SVG, Context Window fills very quickly.
Practical implication: In Agent tool return design, minimize returned token count — return only fields the Agent needs, not the entire API response.
What happens when the Context Window fills up? What are the handling strategies?
When the Context Window approaches its limit, models have different handling strategies — all imperfect — which is why 'managing the Context Window' is one of the core challenges of Agent engineering.
Hard Truncation: The simplest but worst approach — directly cut the earliest Context content, keeping only the most recent N tokens for the model. Consequence: the Agent 'forgets' early conversation and decision context, may repeat operations already performed, or forget important constraints set early on ('don't touch Aave today, it's upgrading').
Sliding Window Summarization: When Context usage exceeds 70%, automatically trigger summary compression — compress the earliest N rounds of conversation into a high-density summary using an LLM, then replace the original conversation with the summary. Loss: details are compressed, but key decision points are retained. Suitable for: long-running DeFi Agents that only need to remember decision outcomes, not every intermediate step.
RAG Externalization (Retrieval-Augmented Generation): Move long-term information out of the Context Window into a vector database; retrieve only the 'K most relevant chunks' into Context before each inference. Context contains only information relevant to the current task, not full history. Most flexible approach, but introduces vector search latency and engineering complexity.
Task Decomposition: Split a long task requiring extensive Context into multiple independent short tasks, each with its own clean Context. An Orchestrator coordinates the inputs/outputs of sub-tasks, preventing any single task's Context from growing too long. This is one of the design advantages of Multi-Agent systems.
Practical recommendation for DeFi Agents: The most common approach is 'sliding window summarization + structured DB long-term memory' — short-term Context management via summarization; long-term decision records stored in PostgreSQL; not fully relying on Context Window to retain history.
How do Context Window sizes differ across Claude, GPT-4o, and Gemini? How should this factor into model selection?
Context Windows for mainstream models (as of mid-2026):
Claude Sonnet (Anthropic): 200K tokens ≈ ~150,000 English words or ~100,000 Chinese characters. Sufficient for most DeFi Agent tasks — can fit hours of operation history + current task + tool returns in one Context.
GPT-4o (OpenAI): 128K tokens. Smaller than Claude; long-running Agents hit the limit more easily, requiring more aggressive Context management strategies.
Gemini 1.5 Pro (Google): 1M tokens — the largest among mainstream models. Theoretically can fit an entire DeFi protocol codebase for analysis, but more tokens means higher API costs, and the 'needle in a haystack problem' (finding the most critical few lines in 1M tokens) remains an unsolved engineering challenge.
Context Window considerations for model selection:
Bigger is not always better. If your Agent task uses 20K-50K tokens per Context, upgrading from 200K to 1M makes almost no difference to Agent performance but may significantly increase API costs (typically token-based pricing, with larger context versions costing more per token).
Scenarios genuinely needing a very large Context: one-time analysis of large codebases or lengthy documents (not a continuously running Agent but a one-time analysis task); cross-document analysis requiring comparison of large volumes of documents in one Context.
For continuously running DeFi Agents: 200K tokens (Claude Sonnet) is usually sufficient; combined with sliding window summarization, it can extend indefinitely. Prioritize the model's reasoning capability and Tool Use support quality over Context Window size.
Is 'larger Context Window = smarter Agent' true? What is the relationship between Context Window size and Agent performance?
This is one of the most common misconceptions. The relationship between Context Window size and Agent performance is more complex than most people think:
What Context Window size actually affects: whether a complete set of task-relevant information can fit in one Context (long documents, long conversation history); the amount of information an Agent 'remembers' within a single conversation; the feasibility of one-time large-scale analysis tasks (analyzing a complete contract codebase).
What Context Window size does NOT affect: the model's reasoning depth (quality of reasoning on complex problems) — reasoning quality is primarily determined by the model's training, not its Context Window size. A 200K-token Claude Sonnet may reason better about complex multi-step DeFi strategy problems than a 1M-token weaker model, even though the latter can 'see' more information. The model's Tool Use accuracy (precision of function call outputs); the model's resistance to Prompt Injection (larger Contexts may actually increase attack surface — more external data can inject more malicious instructions).
The 'Lost in the Middle' problem: Research finds that when Context Windows contain large amounts of text, models tend to pay attention to information at the beginning and end of Context, neglecting information in the middle — even when critical information is there. Simply increasing Context Window size doesn't guarantee the model 'better utilizes' more information.
Practical recommendation: When selecting models, prioritize reasoning capability, Tool Use support quality, and pricing over Context Window size. Context Window being 'good enough' is sufficient (most Agents need 128K-200K); don't choose a weaker reasoning model just for a larger Context Window.
Concrete calculation: how many tokens does a DeFi Agent consume in one inference cycle?
Using an Agent executing USDC yield optimization on Base chain as an example, calculating token consumption for one complete inference cycle:
Input Context (what the Agent sees before reasoning):
Output (Agent reasoning + decision output):
Single inference cycle total: ~4,200 tokens
Converting to API cost (Claude Sonnet, mid-2026 pricing):
Converting to Context Window utilization: 4,200 / 200,000 = 2.1%. If the Agent runs 10 inferences per hour, 8 hours consumes only 16.8% — showing that with effective tool return trimming, 200K Context Window is more than sufficient for DeFi Agents, without needing frequent summarization compression.
Warning: Without tool return trimming, passing an entire Aave API response to the Agent (possibly 5,000-10,000 tokens), the same 8-hour session may consume 40-80% of the Context Window, with API costs 10-20× higher.
Larger Context Window → can process more information simultaneously, suitable for one-time large-scale analysis tasks; but models with larger Contexts typically cost more per token, and the 'Lost in the Middle' problem exists (model attention decreases for information in the middle of Context). Smaller Context Window → forces the Agent to more actively manage information (selecting only the most important information for Context), sometimes making Agent decisions more focused; lower cost. For continuously running Agents: 'sufficient' Context Window is ideal (128K-200K is usually enough); combined with sliding window summarization, this extends indefinitely — no need to upgrade to a weaker reasoning model just for a larger Context Window.