Glossary · Agent Architecture & Reasoning

Tool Use

Agent Architecture & Reasoning Intermediate

Full Explanation

01 · What is this?

What's the difference between Tool Use and Function Calling? Are they the same concept?

These two terms are sometimes used interchangeably in the industry, but strictly speaking there are subtle differences:

Function Calling is the name of the API feature OpenAI introduced in June 2023. It defines a specific format for GPT models to call developer-defined functions: tools are described with JSON Schema, and the model outputs a JSON-formatted function call request. In the strict sense, Function Calling refers to this OpenAI mechanism, a vendor-specific implementation, although other vendors have since adopted the name.

Tool Use is the broader concept describing the general ability of AI models to call external tools/services, not tied to any specific vendor or API format. Anthropic calls it 'Tool Use,' Google calls it 'Function Calling,' but technically they do almost the same thing.

MCP (Model Context Protocol) is a standardization layer on top of Tool Use. It defines a unified protocol between AI applications and tools, so the same tool can be used by different AI models without a separate adapter for each.

In practical development: if you're using the OpenAI SDK, you're calling the Function Calling API; if you're using the Anthropic SDK, you're calling the Tool Use API; if you're building a tool that should be usable by multiple models, consider exposing it through MCP. All three rest on the same underlying mechanism (the model emits a structured request and the application executes it); they differ in name, format, and level of standardization.
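To make the "same mechanism, different formats" point concrete, here is a minimal sketch that wraps one vendor-neutral tool definition into three shapes. It assumes the field names each vendor documents for its `tools` parameter (OpenAI Chat Completions, Anthropic Messages) and for MCP tool listings; the `TOOL` dictionary and the three helper functions are hypothetical names introduced only for illustration, and no SDK is called.

```python
from typing import Any

# Hypothetical vendor-neutral definition; "parameters" is plain JSON Schema.
TOOL: dict[str, Any] = {
    "name": "get_protocol_apy",
    "description": "Query the current supply APY of a token on a protocol. Read-only.",
    "parameters": {
        "type": "object",
        "properties": {
            "protocol": {"type": "string", "description": "Protocol name, e.g. 'aave'"},
            "token": {"type": "string", "description": "Token symbol, e.g. 'USDC'"},
        },
        "required": ["protocol", "token"],
    },
}


def to_openai_tool(tool: dict[str, Any]) -> dict[str, Any]:
    """Shape assumed for the `tools` list of OpenAI's Chat Completions API."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["parameters"],
        },
    }


def to_anthropic_tool(tool: dict[str, Any]) -> dict[str, Any]:
    """Shape assumed for the `tools` list of Anthropic's Messages API."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["parameters"],
    }


def to_mcp_tool(tool: dict[str, Any]) -> dict[str, Any]:
    """Shape assumed for one entry of an MCP server's tool listing."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "inputSchema": tool["parameters"],
    }
```

In all three shapes the JSON Schema for the arguments is identical; only the wrapper keys change, which is why a thin adapter like this is usually enough to move a tool between vendors.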

02 · Why does it exist?

How do you design good tool Schemas? How much does the quality of tool descriptions affect Agent decision quality?

Tool Schema design directly determines whether the LLM can correctly select and call tools. A well-written tool description lets the LLM know precisely 'when to call this tool and what parameters to pass' without any additional instructions.

Five elements of a good tool Schema:

Tool name (name): clear, verb-first, describes what the tool does. Good examples: get_aave_usdc_apy, withdraw_from_morpho. Bad examples: tool_1, execute (too vague).

Tool description (description): explains the tool's purpose, when to use it, and when NOT to use it (negative descriptions are equally important). Good example: 'Query the current supply APY of a specified token on the Aave protocol. Only for reading rate data; does not execute any transactions. Use when: you need to compare rates across different protocols. Do NOT use when: you need historical APY data (use get_historical_apy tool instead).'

Parameter definitions (parameters): each parameter has a clear type, a description, and enum values (if applicable). Include example values (for instance in an examples field) so the LLM knows the expected parameter format.

Return value description: explain in the description what format the tool returns, letting the LLM know how to parse the return value.

Tool boundary description: clearly state 'what this tool can and cannot do,' preventing the LLM from making assumptions beyond the tool's actual capability (e.g., assuming a query tool can execute transactions).

Tool description quality can shift tool call accuracy substantially, by as much as 20-30% by some estimates: with the same LLM and the same task, clear versus vague tool descriptions produce very different rates of wrong tool selection and malformed function calls.
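The five elements can be sketched as one schema plus a small checker. This is an illustrative sketch, not a standard: the tool name `get_aave_supply_apy`, the token enum, the return shape, and the `lint_tool_schema` heuristic (which looks for the literal markers "Use when:", "Do NOT use when:", and "Returns:") are all assumptions made for the example.

```python
from typing import Any

# Hypothetical schema showing name, description, parameters,
# return value description, and boundary statement in one place.
GET_AAVE_SUPPLY_APY: dict[str, Any] = {
    "name": "get_aave_supply_apy",
    "description": (
        "Query the current supply APY of a specified token on the Aave protocol. "
        "Only for reading rate data; does not execute any transactions. "
        "Use when: you need to compare rates across different protocols. "
        "Do NOT use when: you need historical APY data (use get_historical_apy instead). "
        "Returns: a JSON object {apy: float, updated_at: timestamp}."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "token": {
                "type": "string",
                "description": "Symbol of the token to query.",
                "enum": ["USDC", "USDT", "DAI"],  # assumed token list
                "examples": ["USDC"],
            },
        },
        "required": ["token"],
    },
}

VERB_PREFIXES = ("get_", "query_", "fetch_", "execute_", "send_", "deposit_", "withdraw_")


def lint_tool_schema(tool: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the heuristic checks passed."""
    problems: list[str] = []
    if not tool.get("name", "").startswith(VERB_PREFIXES):
        problems.append("name is not verb-first")
    description = tool.get("description", "")
    for marker in ("Use when:", "Do NOT use when:", "Returns:"):
        if marker not in description:
            problems.append(f"description lacks '{marker}'")
    properties = tool.get("parameters", {}).get("properties", {})
    for param_name, spec in properties.items():
        if "type" not in spec or "description" not in spec:
            problems.append(f"parameter '{param_name}' lacks type or description")
    return problems
```

Running a checker like this in CI is one way to keep descriptions from drifting toward names such as `tool_1`; the specific markers are a convention you would choose for your own codebase.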

03 · How does it affect your decisions?

How should Agent error handling logic be designed when tool calls fail?

Tool call failures are the most common interruption events in Onchain Agent daily operations; designing robust error handling logic is a necessary condition for production deployment.

Common tool call failure types and corresponding handling strategies:

Network timeout (HTTP timeout / RPC connection failure): the most common transient failure, usually recovering on its own within seconds to minutes. Handling strategy: retry with exponential backoff (wait 1s before the first retry, 2s before the second, 4s before the third), with a maximum of 3 retries. If the third retry still fails, log the error and notify the Orchestrator ('sub-task failed, please decide next steps'); don't let the Agent keep retrying indefinitely.

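API returns errors (4xx / 5xx): distinguish the error types. 4xx (client errors) usually point to a problem with the request itself (404: resource doesn't exist; 401: authentication failed); retrying the same request won't solve them, so the LLM needs to re-analyze the problem. The exceptions are 429 (rate limited) and 408 (request timeout), which are transient and can be retried after a delay. 5xx (server errors) are usually temporary server-side issues; retrying is appropriate.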

On-chain revert (transaction revert): the transaction is broadcast but reverts during on-chain execution. Common causes: slippage exceeded, insufficient Gas, expired approval. Simply retrying doesn't work (the same parameters will likely revert again); the LLM needs to analyze the revert reason and decide whether to retry with adjusted parameters or abandon the operation.

Backend validation intercept (BLOCKED): the Agent's attempted operation is blocked by backend security validation — this is usually not an 'error' but a 'correct security intercept.' The LLM should not try to circumvent this block; log an alert and notify human review.
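The four failure types above can be sketched as a classification table plus a retry loop. This is a minimal sketch under stated assumptions: `ToolFailure`, the kind strings, the `NEXT_ACTION` labels, and `call_with_backoff` are hypothetical names, and the `sleep` parameter exists only so the delays can be replaced in tests.

```python
import time
from typing import Any, Callable


class ToolFailure(Exception):
    """Hypothetical exception a tool raises, tagged with a failure kind."""

    def __init__(self, kind: str, detail: str = "") -> None:
        super().__init__(f"{kind}: {detail}")
        self.kind = kind


# Failure kinds that are worth retrying automatically.
TRANSIENT = {"timeout", "http_5xx", "http_retryable"}

# Failures that retrying cannot fix, mapped to a suggested next step.
NEXT_ACTION = {
    "http_4xx": "ask_llm_to_fix_parameters",
    "revert": "ask_llm_to_analyze_revert_reason",
    "blocked": "alert_human_review",
}


def classify_http_status(status: int) -> str:
    """Map an HTTP status code to a failure kind."""
    if status in (408, 429):  # request timeout and rate limiting are transient
        return "http_retryable"
    if 500 <= status <= 599:
        return "http_5xx"
    if 400 <= status <= 499:
        return "http_4xx"
    return "ok"


def call_with_backoff(
    tool: Callable[[], Any],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], Any] = time.sleep,
) -> dict[str, Any]:
    """Run tool(); retry transient failures after 1s, 2s, 4s, then escalate."""
    for attempt in range(max_retries + 1):
        try:
            return {"ok": True, "result": tool()}
        except ToolFailure as failure:
            if failure.kind not in TRANSIENT:
                action = NEXT_ACTION.get(failure.kind, "escalate_to_orchestrator")
                return {"ok": False, "kind": failure.kind, "action": action}
            if attempt == max_retries:
                return {"ok": False, "kind": failure.kind, "action": "escalate_to_orchestrator"}
            sleep(base_delay * 2 ** attempt)
    return {"ok": False, "kind": "unknown", "action": "escalate_to_orchestrator"}
```

Reverts and BLOCKED results leave the loop on the first failure, so the same doomed parameters are never resubmitted automatically.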

In Agent tool function design, all errors should return structured error objects (containing error type, error description, and suggested next actions) rather than letting raw exceptions propagate into the LLM Context — raw exception information may contain sensitive system information and is often in a format unfriendly to LLMs.
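One way to keep raw exceptions out of the LLM Context is a wrapper that converts them into a structured object. The sketch below is illustrative: `ToolError`, `safe_tool_call`, the error type strings, and the assumption that `ValueError` messages are written to be safe for the LLM are all choices made for this example.

```python
import logging
from dataclasses import asdict, dataclass
from typing import Any, Callable

logger = logging.getLogger("tool_errors")


@dataclass
class ToolError:
    error_type: str        # machine-readable category
    description: str       # short, LLM-friendly explanation
    suggested_action: str  # what the caller should do next


def safe_tool_call(tool: Callable[..., Any], *args: Any, **kwargs: Any) -> dict[str, Any]:
    """Call a tool and return either its result or a structured error."""
    try:
        return {"ok": True, "result": tool(*args, **kwargs)}
    except TimeoutError:
        error = ToolError("network_timeout", "The request timed out.", "retry_later")
    except ValueError as exc:
        # Assumption: tools raise ValueError only with messages safe to show the LLM.
        error = ToolError("invalid_parameters", str(exc), "fix_parameters_and_retry")
    except Exception:
        # Full details go to the server-side log, never into the LLM Context.
        logger.exception("unexpected tool failure")
        error = ToolError(
            "internal_error", "The tool failed unexpectedly.", "escalate_to_orchestrator"
        )
    return {"ok": False, "error": asdict(error)}
```

The catch-all branch deliberately discards the exception text, so connection strings, file paths, or stack traces cannot leak into the model's input.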

04 · What should you do?

In Onchain Agents, how should read tools and write tools be designed with isolation?

Isolation between Read Tools and Write Tools is one of the core principles of Onchain Agent security design:

Read tool characteristics and design: doesn't change any external state; call failures can be retried any number of times; can be called at any time without additional confirmation. Design principles: function names start with get_, query_, fetch_; returns pure data objects; should never trigger transaction broadcasts; limit the size of returned data (preventing 15,000-token raw JSON from being stuffed into Context).

Write tool characteristics and design: changes on-chain state, consumes Gas, operations are irreversible. Design principles: function names start with execute_, send_, deposit_, withdraw_ (making the name explicitly tell the LLM this is an operation with side effects); include second-layer parameter validation inside the function (amount limits, address whitelist, operation type permissions); add a human-confirmation interrupt point for high-value operations; log all write tool calls to a security log (not just normal Debug logs).

Isolation implementation: at the Sub-agent design level, read Sub-agents are only given read tools, write Sub-agents are only given write tools, and the two Sub-agents cannot communicate directly (only through the Orchestrator). This means if Prompt Injection contaminates the read Sub-agent, the worst it can do is return incorrect data — it cannot directly trigger on-chain transactions.
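The isolation rule can be enforced in code rather than left to prompt instructions. The following is a minimal sketch, assuming the naming prefixes listed above; the `SubAgent` class, `tool_kind`, and the two stub tools are hypothetical and return placeholder data instead of touching a chain.

```python
from typing import Any, Callable

READ_PREFIXES = ("get_", "query_", "fetch_")
WRITE_PREFIXES = ("execute_", "send_", "deposit_", "withdraw_")


def tool_kind(name: str) -> str:
    """Infer read/write intent from the tool name prefix."""
    if name.startswith(READ_PREFIXES):
        return "read"
    if name.startswith(WRITE_PREFIXES):
        return "write"
    raise ValueError(f"tool name {name!r} does not declare read or write intent")


class SubAgent:
    """Holds only tools whose kind matches its role ('read' or 'write')."""

    def __init__(self, role: str, tools: dict[str, Callable[..., Any]]) -> None:
        for name in tools:
            if tool_kind(name) != role:
                raise ValueError(f"{role} sub-agent cannot hold tool {name!r}")
        self.role = role
        self._tools = dict(tools)

    def call(self, name: str, **kwargs: Any) -> Any:
        # A tool that was never registered cannot be reached, whatever the prompt says.
        if name not in self._tools:
            raise PermissionError(f"{self.role} sub-agent has no tool {name!r}")
        return self._tools[name](**kwargs)


def get_wallet_balance(address: str) -> dict[str, float]:
    return {"usdc": 50_000.0, "eth": 1.5}  # placeholder data, no RPC call


def withdraw_from_protocol(protocol: str, amount_usdc: float) -> dict[str, str]:
    return {"status": "stubbed"}  # placeholder, nothing is broadcast
```

Because the read sub-agent's registry simply does not contain write tools, a prompt-injected request for `withdraw_from_protocol` fails with `PermissionError` before any transaction logic runs.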

Real-World Example

Real example: tool design for a DeFi yield optimization Agent

Complete toolset for a DeFi yield optimization Agent managing $50,000 USDC:

Read Tools (3):

get_protocol_apy(protocol: str, token: str) -> {apy: float, tvl: float, updated_at: timestamp} — queries current APY and TVL for a specified protocol and token. Returns only minimum fields needed for decision-making (not entire API response)
get_gas_price() -> {base_fee_gwei: float, priority_fee_gwei: float, estimated_usd: float} — queries current Gas fees
get_wallet_balance(address: str) -> {usdc: float, eth: float} — queries operations wallet balance

Write Tools (2, only callable after Orchestrator approval):

withdraw_from_protocol(protocol: str, amount_usdc: float, approval_token: str) -> {tx_hash: str, status: str} — withdraws USDC from a protocol. Backend validation: amount_usdc ≤ 10,000 (per-transaction limit); protocol on whitelist; approval_token valid (confirmation from Orchestrator)
deposit_to_protocol(protocol: str, amount_usdc: float, approval_token: str) -> {tx_hash: str, status: str} — deposits USDC to a protocol. Same backend validation

Security of this tool design: even if the LLM is compromised by Prompt Injection, it cannot call write tools without a valid approval_token. The approval_token is a one-time token generated by the Orchestrator after human confirmation (or automatic validation passing). Without an approval_token, write tools return BLOCKED directly.
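The approval_token check can be sketched as follows. This is an illustration under assumptions, not the described system's actual code: the `Orchestrator` class, the extra `orchestrator` argument on the write tool, the whitelist contents, the `SUBMITTED` status, and the all-zero placeholder transaction hash are hypothetical, and nothing is broadcast.

```python
import secrets
from typing import Any

MAX_AMOUNT_USDC = 10_000               # per-transaction limit from the example
PROTOCOL_WHITELIST = {"aave", "morpho"}  # assumed whitelist contents


class Orchestrator:
    """Issues one-time approval tokens after confirmation has happened."""

    def __init__(self) -> None:
        self._pending: set[str] = set()

    def issue_approval_token(self) -> str:
        token = secrets.token_hex(16)
        self._pending.add(token)
        return token

    def consume(self, token: str) -> bool:
        """Return True exactly once per issued token."""
        if token in self._pending:
            self._pending.remove(token)
            return True
        return False


def blocked(reason: str) -> dict[str, Any]:
    return {"tx_hash": None, "status": "BLOCKED", "reason": reason}


def withdraw_from_protocol(
    orchestrator: Orchestrator, protocol: str, amount_usdc: float, approval_token: str
) -> dict[str, Any]:
    # Cheap checks run first so a valid token is not burned on a bad request.
    if protocol not in PROTOCOL_WHITELIST:
        return blocked("protocol not on whitelist")
    if not 0 < amount_usdc <= MAX_AMOUNT_USDC:
        return blocked("amount outside per-transaction limit")
    if not orchestrator.consume(approval_token):
        return blocked("missing, invalid, or already used approval_token")
    # A real implementation would sign and broadcast here; this returns a placeholder.
    return {"tx_hash": "0x" + "0" * 64, "status": "SUBMITTED"}
```

Since `consume` removes the token on first use, replaying a previously approved call returns BLOCKED, which matches the one-time property described above.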

Common Misconceptions

× Misconception 1: The more detailed the tool description, the better — more instructions means safer. Tool descriptions added to Context have a cost — each tool description may consume 500-2,000 tokens; with 20 tools, descriptions alone can consume 10,000-40,000 tokens. The correct approach is 'just enough clarity': let the LLM accurately judge when to call and what parameters to pass, without needing to include the entire operations manual. Descriptions should state 'what the tool does, when to use it, when not to use it' — anything beyond that is redundant.
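The token arithmetic is easy to reproduce for your own tool count. A small sketch, using the 500-2,000 tokens-per-description range quoted above; the function name and the 128,000-token context window in the usage lines are assumptions for illustration only.

```python
def description_token_budget(
    n_tools: int, low_per_tool: int = 500, high_per_tool: int = 2_000
) -> tuple[int, int]:
    """Return the (low, high) token cost of carrying n_tools descriptions in Context."""
    return n_tools * low_per_tool, n_tools * high_per_tool


low, high = description_token_budget(20)
print(low, high)  # 10000 40000

# Share of a hypothetical 128,000-token context window used in the worst case.
print(high / 128_000)  # 0.3125
```

Under that assumed window size, verbose descriptions for 20 tools could occupy close to a third of the Context before the task itself is even stated.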

× Misconception 2: When a tool call fails, just let the LLM decide how to retry on its own. LLM judgment on 'how to retry' is often unreliable, especially in on-chain operation scenarios — it may ignore the revert reason and repeatedly retry a doomed-to-fail operation with the same parameters. Correct approach: design structured error classification in the tool function's backend code ('transient error → auto-retryable' vs 'permanent error → escalate to Orchestrator'), with retry logic controlled at the code level rather than relying on LLM reasoning.

The Missing Link

Direct Impact

More tools → the Agent can do more and has more flexibility, but the per-inference Context cost is higher (tool descriptions occupy more tokens), and tool selection gets harder (multiple similar tools make it difficult for the LLM to judge which one to use). Fewer tools → lower Context cost and clearer tool selection, but Agent capability is limited and complex tasks require more sequential tool calls (increasing latency and Gas fees). Best practice: keep a single Agent's tool set between 5 and 15 tools; if more tools are needed, group them via Sub-agents (each Sub-agent holds only the minimum tool set its task requires) rather than mounting dozens of tools on a single Agent.
