Question 1

Why does Context Window matter?

Accepted Answer

**What is a Token? How many tokens do Chinese, English, and code each 'consume'?**

A Token is the basic unit that LLMs use to process text — not exactly a 'character,' not exactly a 'word,' but the segments the model's tokenizer cuts text into. Token efficiency varies dramatically across languages, directly affecting your API costs and Context Window utilization.

**English has the highest token efficiency**: English averages ~4 characters per token; a common English word is typically 1 token (`the`, `agent`, `wallet`), longer words may be 2-3 tokens (`cryptocurrency`). **English: ~750 words ≈ 1,000 tokens.**

**Chinese has lower token efficiency**: Each Chinese character is typically 1-2 tokens (Claude's tokenizer compresses Chinese less efficiently than English). **Traditional Chinese: ~500-600 characters ≈ 1,000 tokens.** (The same information written in Chinese consumes roughly 1.3-1.5× the tokens of English.)

**Code consumes the most tokens**: Indentation, brackets, quotes, semicolons — each symbol occupies a token; long function names consume more tokens. 100 lines of Python may consume 500-800 tokens, far exceeding the same character count in English prose.

**SVG consumes a surprising amount**: SVG code (tags like `<rect x="30" y="40" width="100" fill="#333">`) is very unfriendly to tokenizers — each attribute value, quote, and coordinate is an independent token. A moderately complex SVG may consume 2,000-5,000 tokens. If Agent tool returns include SVG, Context Window fills very quickly.

**Practical implication**: In Agent tool return design, minimize returned token count — return only fields the Agent needs, not the entire API response.

Question 2

How does Context Window work?

Accepted Answer

**What happens when the Context Window fills up? What are the handling strategies?**

When the Context Window approaches its limit, models have different handling strategies — all imperfect — which is why 'managing the Context Window' is one of the core challenges of Agent engineering.

**Hard Truncation**: The simplest but worst approach — directly cut the earliest Context content, keeping only the most recent N tokens for the model. Consequence: the Agent 'forgets' early conversation and decision context, may repeat operations already performed, or forget important constraints set early on ('don't touch Aave today, it's upgrading').

**Sliding Window Summarization**: When Context usage exceeds 70%, automatically trigger summary compression — compress the earliest N rounds of conversation into a high-density summary using an LLM, then replace the original conversation with the summary. Loss: details are compressed, but key decision points are retained. Suitable for: long-running DeFi Agents that only need to remember decision outcomes, not every intermediate step.

**RAG Externalization (Retrieval-Augmented Generation)**: Move long-term information out of the Context Window into a vector database; retrieve only the 'K most relevant chunks' into Context before each inference. Context contains only information relevant to the current task, not full history. Most flexible approach, but introduces vector search latency and engineering complexity.

**Task Decomposition**: Split a long task requiring extensive Context into multiple independent short tasks, each with its own clean Context. An Orchestrator coordinates the inputs/outputs of sub-tasks, preventing any single task's Context from growing too long. This is one of the design advantages of Multi-Agent systems.

**Practical recommendation for DeFi Agents**: The most common approach is 'sliding window summarization + structured DB long-term memory' — short-term Context management via summarization; long-term decision records stored in PostgreSQL; not fully relying on Context Window to retain history.

Question 3

How is Context Window applied in practice?

Accepted Answer

**How do Context Window sizes differ across Claude, GPT-4o, and Gemini? How should this factor into model selection?**

Context Windows for mainstream models (as of mid-2026):

**Claude Sonnet (Anthropic)**: 200K tokens ≈ ~150,000 English words or ~100,000 Chinese characters. Sufficient for most DeFi Agent tasks — can fit hours of operation history + current task + tool returns in one Context.

**GPT-4o (OpenAI)**: 128K tokens. Smaller than Claude; long-running Agents hit the limit more easily, requiring more aggressive Context management strategies.

**Gemini 1.5 Pro (Google)**: 1M tokens — the largest among mainstream models. Theoretically can fit an entire DeFi protocol codebase for analysis, but more tokens means higher API costs, and the 'needle in a haystack problem' (finding the most critical few lines in 1M tokens) remains an unsolved engineering challenge.

**Context Window considerations for model selection:**

**Bigger is not always better.** If your Agent task uses 20K-50K tokens per Context, upgrading from 200K to 1M makes almost no difference to Agent performance but may significantly increase API costs (typically token-based pricing, with larger context versions costing more per token).

**Scenarios genuinely needing a very large Context**: one-time analysis of large codebases or lengthy documents (not a continuously running Agent but a one-time analysis task); cross-document analysis requiring comparison of large volumes of documents in one Context.

**For continuously running DeFi Agents**: 200K tokens (Claude Sonnet) is usually sufficient; combined with sliding window summarization, it can extend indefinitely. Prioritize the model's reasoning capability and Tool Use support quality over Context Window size.