What are the four steps of Function Calling? Does the LLM itself actually execute tools?
Function Calling has a complete four-step flow — understanding these four steps is key to understanding how AI Agents work.
Step 1, tell the LLM what tools are available: when calling the LLM's API, you simultaneously pass in a tool list describing each tool's name, function, and parameter format (e.g., 'get_eth_price: get ETH's current price, parameter: currency (string, "USD" or "EUR")'). The LLM reads this list and knows what tools are available.
Step 2, LLM outputs a tool call request: when the LLM during reasoning determines it needs a tool, instead of executing directly, it outputs a structured JSON-format request in its response, e.g., {"tool": "get_eth_price", "parameters": {"currency": "USD"}}.
Step 3, backend code actually executes: your application (not the LLM) reads this JSON request, calls the actual get_eth_price function, and retrieves the real ETH price from the CoinGecko API.
Step 4, result returned to LLM: you pass the function's execution result (e.g., {"price": 3420.50, "currency": "USD"}) back to the LLM, which continues reasoning based on this result and generates its final response.
Critical understanding: the LLM itself never executes any tool. It only 'says' what it wants to call — the actual execution is always your backend code. This design lets you add any security validation logic at the execution layer.
Is Function Calling the same as MCP? I see both terms in the documentation — what's the difference?
They're not the same thing, but they're closely related and easy to confuse.
Function Calling / Tool Use is the 'communication mechanism' between the LLM and tools — it defines how the LLM 'says' it wants to call a tool (outputs structured JSON), and how tool results are returned to the LLM. This is an LLM API-level standard supported by both GPT and Claude.
MCP (Model Context Protocol) is a 'tool standardization packaging layer' built on top of Function Calling — it defines how tools should be described, discovered, and reused across different LLMs and frameworks. A simple analogy: Function Calling is like 'HTTP protocol,' defining communication methods; MCP is like 'REST API specification,' defining how to organize resources and operations on top of HTTP. With MCP, a tool only needs to be written once and can be used by all MCP-supporting systems (Claude, GPT, LangChain, ElizaOS); without MCP, every time you want different systems to use the same tool, you need separate integration for each.
Practical advice for beginners: if you just want your Agent to call a tool, using a framework (LangChain's @tool decorator, ElizaOS's Action plugin) is sufficient — no need to implement MCP yourself. MCP is truly needed when you want to 'publish tools for other Agents to use' or 'make tools usable cross-framework.'
How do I let the LLM know 'when it should call a tool and when it should answer directly'?
The answer lies in the tool's 'description text' — the LLM decides when to call based on the description you write for each tool. Several design principles for getting the LLM to make correct decisions.
First, tool descriptions should specify 'applicable scenarios': don't just describe what the tool does — be clear about when it should be used. Good description: 'Query ETH's current USD price. Call this tool when the user asks about a cryptocurrency's current price, latest quote, or needs to make price comparisons.' Bad description: 'Get ETH price.'
Second, let the LLM know when not to call tools: if your Agent has multiple tools, clarify usage boundaries in the System Prompt, e.g., 'only call tools when real-time data is needed; if the question is about conceptual explanation (e.g., "what is DeFi"), answer directly without calling tools.'
Third, don't have too many tools: the more tools passed to the LLM, the more tool descriptions it needs to read (consuming tokens), and the harder it is to select the correct tool from a large list. Recommend passing no more than 10–15 tools per LLM call; if you have more tools, use LangGraph to assign different tool subsets at different nodes rather than passing all at once.
Fourth, observe the LLM's actual choices: during development, enable verbose mode to watch what tools the LLM selects in what situations. If it frequently selects wrong tools, the issue is almost always insufficiently clear tool descriptions, not model capability.
What are the basic security design principles for Function Calling? Especially in crypto Agent contexts
Because the LLM itself doesn't execute tools (execution is in your backend), the core of security design is in the 'execution layer' of your backend. Several basic security principles for crypto Agents.
First, strictly classify read and write tools: divide tools into two categories — read-only tools (query data, read on-chain state; the consequence of errors is wrong information) and write tools (sign transactions, move funds; the consequence of errors is asset loss). The two categories have different authorization requirements at the backend execution layer and must not be mixed.
Second, write tools must have parameter validation: before executing write tools in the backend, validate that the parameters the LLM passed are reasonable — is the amount within the configured limit? Is the address on the whitelist? Is the operation type permitted? This validation logic is written in your backend code; the LLM cannot bypass it.
Third, large-amount operations require human confirmation: for write operations above a threshold (e.g., $100), add a human confirmation step in the backend (notify you, wait for your confirmation before executing). This layer is completely independent of the LLM's reasoning — even if the LLM is Prompt Injected, the attacker cannot bypass this human confirmation layer.
Fourth, log all tool calls: complete records of every tool call (tool name, input parameters, output results, execution time) written to logs. Without logs there's no post-hoc audit capability — if the Agent does something you don't understand, logs are your only path to finding the root cause.
A Minimal Working Example of Function Calling: Making an Agent Query ETH Price
Here's the simplest LangChain (Python) Function Calling implementation demonstrating the complete flow from tool definition to Agent execution.
First, define the tool: mark a Python function with the @tool decorator — the function's docstring is the tool's description, which the LLM reads to decide when to call it. The function itself calls the CoinGecko API to get the real ETH current price and returns a formatted result string.
Next, initialize the Agent: pass the tool list and LLM (e.g., Claude Sonnet) to LangChain's Agent, set a maximum iteration count (to prevent infinite loops), and add a System Prompt defining the Agent's role.
Finally, execute a task: when the user inputs 'What's the current ETH price?', the Agent's Thought step determines 'need to call get_eth_price tool' → outputs tool call JSON → backend executes the Python function → result returns to LLM → LLM generates natural language response: 'The current ETH price is $3,420.50.'
This example demonstrates the complete Function Calling loop: the LLM is only responsible for 'deciding what to call,' the backend is responsible for 'actually executing,' and the result returns to the LLM to generate the final response. Total token consumption: tool description (~50 tokens) + user question (~10) + LLM Thought (~30) + tool result (~20) + final response (~30) = ~140 tokens, cost ~$0.0004.
Function Calling's core tradeoff is 'flexibility vs. predictability.' The LLM autonomously deciding when to call tools lets Agents handle various unexpected situations, but also means the LLM may sometimes call tools when it shouldn't (e.g., conceptual explanation questions don't need tools, but the LLM still tries to call), or answer from memory when it should use tools (using potentially outdated numbers from training data). Another tradeoff is 'tool count vs. selection accuracy': providing more tools makes the Agent more capable, but the LLM's accuracy in selecting the right tool from more options may decrease, while also consuming more tokens. Recommendation for crypto Agents: only pass tools needed for the current task — don't pass all possibly-needed tools at once. LangGraph's DAG design allows assigning different tool subsets at different nodes, which is the best practice for handling large numbers of tools.