In Lesson 1, we established that an agent is a brain (LLM) plus hands (tools) plus a control loop (orchestration). In Lesson 2, we explored the brain. Now we give the agent hands.
Without tools, an LLM can only generate text. It can reason about a question, but it cannot look up the answer in a database. It can plan a deployment, but it cannot actually run the deployment script. It can draft an email, but it cannot send it.
Tools are what bridge the gap between thinking and doing. They are the mechanism by which an agent interacts with the outside world - reading data, calling APIs, executing code, and taking actions.
This lesson covers what tools are, the different types of tools, how function calling works under the hood, best practices for designing tools, and common pitfalls that trip up even experienced engineers.
Consider this scenario. A user asks your agent: “How many open bugs are assigned to me?”
Without tools: The LLM has no access to your bug tracker. It might say “I do not have access to your bug tracker” (if you are lucky) or it might hallucinate a number (if you are not).
With tools:
The agent thinks “I need to check the bug tracker.” It calls the get_bugs(assignee="current_user", status="open") tool. The tool queries your Jira/Linear/GitHub Issues instance and returns [{id: 1234, title: "Login timeout"}, {id: 1235, title: "CSS overflow on mobile"}]. The agent counts the results and responds: “You have 2 open bugs: Login timeout (#1234) and CSS overflow on mobile (#1235).”
Tools transform the agent from a smart conversation partner into a capable assistant that can actually get things done.
| Without Tools | With Tools |
|---|---|
| Answer from training data (possibly stale) | Answer from live data |
| Describe what should be done | Actually do it |
| Reason about code | Run code and see the output |
| Guess at current state | Query and observe real state |
| Draft a message | Send the message |
Tools come in several forms. Understanding the different types helps you choose the right approach for your agent.
These are functions you define that the agent can call. You control the implementation, the inputs, the outputs, and the error handling.
Examples:
- search_database(query, filters) - Query your internal database
- create_ticket(title, description, priority) - Create a support ticket
- send_notification(user_id, message) - Send a push notification
- get_weather(city) - Fetch weather from a third-party API
- run_test_suite(test_path) - Execute tests and return results

Function tools are the most common type. They are the primary way you extend an agent's capabilities to fit your specific use case.
These are tools provided by the platform or framework you are using. They come pre-built and ready to use.
Google Cloud / Gemini built-in tools:
| Tool | What It Does |
|---|---|
| Google Search | Grounds the model’s responses in real-time web search results. Reduces hallucination by providing current information. |
| Code Execution | Runs Python code in a sandboxed environment. Useful for math, data analysis, and generating charts. |
| URL Context | Fetches and processes content from a given URL. |
When to use built-in tools:

- When you need a common capability (web search, code execution, URL fetching) and do not want to build and maintain it yourself
- When grounding responses in current public information matters more than controlling the data source

When to build custom function tools instead:

- When the agent needs access to your own systems and data (databases, internal APIs, ticketing systems)
- When you need full control over inputs, outputs, error handling, and authentication
This is a powerful pattern where one agent uses another agent as a tool. The calling agent delegates a subtask to a specialized agent, gets back the result, and continues with its own workflow.
Example:
Main Agent (Trip Planner)
|
+-- calls --> Flight Search Agent
| (specialized in finding flights)
|
+-- calls --> Hotel Agent
| (specialized in hotel bookings)
|
+-- calls --> Activity Agent
(specialized in local activities)
The main agent coordinates the overall trip planning. Each sub-agent is an expert in its domain with its own tools and instructions. The main agent does not need to know how flight search works - it just asks the Flight Search Agent and gets results back.
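The agent-as-tool pattern can be sketched in a few lines: the calling agent sees an ordinary function, and the delegation happens inside it. This is a minimal illustration, not any particular framework's API; `run_agent`, `search_flights`, and the agent name are all hypothetical.

```python
def run_agent(name, prompt):
    # Placeholder: a real framework call would run the sub-agent's own
    # model + tools loop and return its final text answer.
    return f"[{name} result for: {prompt}]"

def search_flights(origin, destination, date):
    """Tool wrapper: delegates flight search to a specialized sub-agent.

    To the main agent this looks like any other function tool - the
    delegation is an implementation detail it never sees.
    """
    prompt = f"Find flights from {origin} to {destination} on {date}."
    return run_agent("flight_search_agent", prompt)
```

Frameworks that support this pattern typically provide a wrapper that does exactly this: it turns a whole agent into something that presents a function-tool interface.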
When to use agent tools:

- When a subtask is complex enough to need its own instructions, tools, and reasoning
- When you want to reuse a specialized agent across multiple workflows
- When splitting responsibilities keeps each agent's tool set small and focused
Function calling is the core mechanism that enables tool use. Here is exactly how it works, step by step.
Step 1: You define tools and send them to the model with the user's message
Step 2: The model decides whether to call a tool (and which one)
Step 3: The model returns a structured tool call (function name + arguments)
Step 4: YOUR CODE executes the function (the model never runs it)
Step 5: You send the function result back to the model
Step 6: The model uses the result to generate a response (or call another tool)
This is critical to understand: the model does not execute tools. It proposes tool calls. Your code executes them. The model never touches your database, your APIs, or your systems directly. You are always in control.
Let us walk through a concrete example with a weather agent.
Step 1: Define the tool and send the request
tools = [
{
"name": "get_weather",
"description": "Get the current weather for a given city. Returns temperature, conditions, and humidity.",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "The city name, e.g. 'London' or 'San Francisco'"
},
"units": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature units"
}
},
"required": ["city"]
}
}
]
# Send to model with user message
response = model.generate(
messages=[{"role": "user", "content": "What is the weather in Tokyo?"}],
tools=tools
)
Step 2-3: The model returns a tool call
The model does not respond with text. Instead, it returns a structured tool call:
{
"tool_call": {
"name": "get_weather",
"arguments": {
"city": "Tokyo",
"units": "celsius"
}
}
}
The model figured out:

- That answering this question requires calling get_weather rather than replying with text
- That "Tokyo" from the user's message is the value for the city parameter
- That celsius is a sensible value for the optional units parameter
Step 4: Your code executes the function
# This is YOUR code - you control what happens here
def get_weather(city, units="celsius"):
# Call a real weather API
response = weather_api.get(city=city, units=units)
return {
"city": city,
"temperature": response.temp,
"conditions": response.conditions,
"humidity": response.humidity
}
# Execute the tool call
result = get_weather(**tool_call.arguments)
# result = {"city": "Tokyo", "temperature": 18, "conditions": "Partly cloudy", "humidity": "65%"}
Step 5-6: Send the result back and get a response
response = model.generate(
messages=[
{"role": "user", "content": "What is the weather in Tokyo?"},
{"role": "assistant", "tool_call": tool_call},
{"role": "tool", "content": json.dumps(result)}
],
tools=tools
)
# Model responds: "The weather in Tokyo is 18 degrees Celsius and partly cloudy
# with 65% humidity."
Sometimes the model needs to call multiple tools to answer a question. This can happen in two ways:
Sequential (one after another):
User: "Compare the weather in Tokyo and London"
Step 1: Model calls get_weather(city="Tokyo")
Step 2: You execute, return result
Step 3: Model calls get_weather(city="London")
Step 4: You execute, return result
Step 5: Model synthesizes both results into a comparison
Parallel (at the same time):
Some models support parallel tool calls, where the model proposes multiple calls in a single response. For the same weather comparison, the model could return both get_weather("Tokyo") and get_weather("London") at once. You execute both, send both results back, and the model synthesizes. This is faster because you save a round trip.
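The execution side of parallel tool calls can be sketched as follows. The `get_weather` stub and its data are made up for illustration; the point is that your code receives a list of proposed calls and can run them concurrently before sending all results back in one turn.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a real weather API call; the data is invented for illustration.
def get_weather(city, units="celsius"):
    fake = {"Tokyo": 18, "London": 12}
    return {"city": city, "temperature": fake.get(city), "units": units}

# Suppose the model proposed both calls in a single response:
tool_calls = [
    {"name": "get_weather", "arguments": {"city": "Tokyo"}},
    {"name": "get_weather", "arguments": {"city": "London"}},
]

# Execute the calls concurrently; pool.map preserves the original order,
# so results line up with the tool calls they answer.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda c: get_weather(**c["arguments"]), tool_calls))
```

Because the calls are independent, running them in parallel costs nothing in correctness and saves wall-clock time on top of the saved model round trip.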
In a real agent, tool calling happens inside a loop: send the user message and tools to the model, check if the response is a text answer (done) or a tool call (execute it, append the result to the conversation, and loop back). This continues until the model decides it has enough information to respond.
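That loop can be written down in a dozen lines. This is a minimal sketch, not a production implementation: `model.generate`, `response.text`, and `response.tool_call` are illustrative stand-ins for whatever SDK you use, and `max_steps` guards against infinite tool-calling.

```python
import json

def run_agent_loop(model, user_message, tools, tool_registry, max_steps=10):
    """Send the message, execute any proposed tool calls, repeat until done."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        response = model.generate(messages=messages, tools=tools)
        if response.tool_call is None:
            return response.text  # plain text answer: the model is done
        call = response.tool_call
        # YOUR code executes the tool - the model only proposed it.
        result = tool_registry[call.name](**call.arguments)
        messages.append({"role": "assistant", "tool_call": call})
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "Stopped: maximum number of tool calls reached."
```

The `tool_registry` mapping from tool names to real functions is the boundary between the model's proposals and your systems: nothing runs unless it is in that dictionary.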
How you design your tools has a direct impact on how well your agent performs. The model needs to understand your tools to use them correctly. Here are the principles that matter most.
The tool name is the first thing the model sees. It should immediately convey what the tool does.
| Bad Name | Good Name | Why |
|---|---|---|
| `do_thing` | `search_knowledge_base` | Specific about what it searches |
| `api_call` | `create_support_ticket` | Describes the action and target |
| `process` | `validate_email_address` | Clear about what it processes and how |
| `get_data` | `get_order_by_id` | Specifies what data and how to find it |
| `run` | `execute_sql_query` | Explicit about what runs |
Each tool should do one thing well. Just like functions in your code, tools should have a single, clear purpose.
Bad: One tool that does everything
{
"name": "manage_orders",
"description": "Create, read, update, or delete orders",
"parameters": {
"action": {"type": "string", "enum": ["create", "read", "update", "delete"]},
"order_id": {"type": "string"},
"order_data": {"type": "object"}
}
}
The model now has to figure out which action to use AND what parameters each action needs. This leads to errors.
Good: Separate tools for each action
[
{
"name": "get_order",
"description": "Look up an order by its ID. Returns order status, items, and shipping info.",
"parameters": {
"order_id": {"type": "string", "description": "The order ID, e.g. 'ORD-12345'"}
}
},
{
"name": "create_order",
"description": "Create a new order with the specified items.",
"parameters": {
"items": {"type": "array", "description": "List of item IDs and quantities"},
"shipping_address": {"type": "string"}
}
},
{
"name": "cancel_order",
"description": "Cancel an existing order. Only works for orders not yet shipped.",
"parameters": {
"order_id": {"type": "string"},
"reason": {"type": "string", "description": "Reason for cancellation"}
}
}
]
Each tool has a clear purpose with specific parameters. The model can easily pick the right one.
The parameter descriptions tell the model what to pass. Be specific about format, constraints, and defaults.
Bad:
{
"name": "search",
"parameters": {
"q": {"type": "string"},
"n": {"type": "integer"}
}
}
The model has to guess what q and n mean.
Good:
{
"name": "search_products",
"parameters": {
"query": {
"type": "string",
"description": "Search query for product name or description, e.g. 'wireless headphones'"
},
"max_results": {
"type": "integer",
"description": "Maximum number of results to return. Range: 1-50. Default: 10"
}
}
}
Tool results go back into the model’s context window. Returning too much data wastes tokens and can confuse the model.
Bad: Returning raw API response with everything
{
"status": 200,
"headers": {"content-type": "application/json", "x-request-id": "abc123", ...},
"data": {
"order": {
"id": "ORD-12345",
"internal_id": "a1b2c3d4-e5f6-7890",
"created_at": "2024-03-15T10:30:00Z",
"updated_at": "2024-03-15T14:22:00Z",
"customer_id": "CUST-789",
"customer_hash": "sha256:abcdef...",
"status": "shipped",
"status_code": 3,
"status_history": [...50 entries...],
"items": [...full product details with all fields...],
"shipping": {...full carrier details...},
"billing": {...full payment details...},
"metadata": {...internal tracking data...}
}
}
}
Good: Returning only what the agent needs
{
"order_id": "ORD-12345",
"status": "shipped",
"items": ["Wireless Headphones x1", "USB-C Cable x2"],
"tracking_number": "1Z999AA10123456784",
"estimated_delivery": "March 20, 2026"
}
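Trimming usually happens in a small mapping function that sits between the raw API response and the tool result. A sketch, assuming a payload shaped like the example above (the field names `name` and `qty` inside each item are assumptions for illustration):

```python
def summarize_order(raw):
    """Reduce a raw order API payload to only the fields the agent needs."""
    order = raw["data"]["order"]
    return {
        "order_id": order["id"],
        "status": order["status"],
        # Collapse full item records into short human-readable strings.
        "items": [f'{i["name"]} x{i["qty"]}' for i in order["items"]],
    }
```

Everything else - internal IDs, status history, payment details, metadata - never enters the context window, which saves tokens and removes distractions.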
When a tool call fails, the error message should help the model understand what went wrong and what to try next.
Bad error:
{"error": "Failed"}
Good error:
{
"error": "ORDER_NOT_FOUND",
"message": "No order found with ID 'ORD-99999'. Please verify the order ID and try again. Order IDs follow the format ORD-XXXXX."
}
The good error message gives the model enough information to either correct its approach (maybe it used the wrong ID format) or inform the user clearly.
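In code, this means catching failures inside the tool and returning a structured error object rather than letting an exception escape. A minimal sketch, with `ORDERS` standing in for a real datastore:

```python
# Stand-in for a real datastore lookup.
ORDERS = {"ORD-12345": {"status": "shipped"}}

def get_order(order_id):
    """Look up an order; return a structured, actionable error if it fails."""
    if order_id not in ORDERS:
        return {
            "error": "ORDER_NOT_FOUND",
            "message": (
                f"No order found with ID '{order_id}'. Please verify the order ID "
                "and try again. Order IDs follow the format ORD-XXXXX."
            ),
        }
    return {"order_id": order_id, **ORDERS[order_id]}
```

The error object goes back into the conversation like any other tool result, so the model can read it, correct its arguments, or explain the problem to the user.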
The tool description is your chance to tell the model when and how to use the tool. Include:

- What the tool does
- When the model should use it (and when it should not)
- What it returns and in what format
- Any important limitations or boundaries
Example:
{
"name": "search_knowledge_base",
"description": "Search the company knowledge base for relevant articles and documentation. Use this tool when the user asks a question about company policies, product features, or internal processes. Returns the top 5 most relevant articles with titles, summaries, and links. Does not search external websites - use web_search for that."
}
As agent systems grow, tool integration becomes a significant challenge. Here is why.
Imagine you have N different AI applications (coding assistant, research agent, support bot) and M different services (Slack, Jira, GitHub, Salesforce). Without standardization, each application needs custom code to integrate with each service - that is N x M integrations to build and maintain.
Standards like the Model Context Protocol (MCP) aim to solve this by creating a common interface between AI applications and tool providers. Each application implements the protocol once. Each tool provider implements the protocol once. New applications automatically work with all existing tools, and vice versa.
Think of it like USB. Before USB, every device had its own proprietary cable. After USB, any device could connect to any computer. Tool standards aim to be the “USB for AI tools.”
When choosing a framework or building tools, consider whether other agents can reuse your tools (build as MCP servers for portability) and whether your agent can use tools from other sources (support MCP for broad compatibility). Standards are still evolving, so evaluate based on your timeline.
Every tool definition takes up space in the model's context window. More importantly, the model has to reason about which tool to use from the full set. Too many tools leads to:

- Wrong tool selection, especially between tools with similar names or overlapping purposes
- More context consumed by definitions, leaving less room for the actual task
- Slower, more expensive requests
How many tools is too many?
There is no hard rule, but here are guidelines:
| Number of Tools | Guidance |
|---|---|
| 1-5 | Fine for most models. No special handling needed. |
| 5-15 | Works well with clear, distinct names and descriptions. |
| 15-30 | Consider grouping related tools or using a tool selection step. |
| 30+ | Likely too many. Use a two-stage approach: first select a category, then choose a specific tool. |
The two-stage approach for large tool sets:
Stage 1: Model picks a category
"User wants to manage their order -> Category: Order Management"
Stage 2: Model sees only the tools in that category
[get_order, create_order, cancel_order, update_shipping]
-> Model picks: get_order
This keeps each individual decision manageable.
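The two-stage approach can be sketched as two successive selection calls. Here `pick` stands in for a model call that chooses one option from a short list; the categories and tool names are illustrative.

```python
# Illustrative category -> tools mapping; names are assumptions.
TOOL_CATEGORIES = {
    "Order Management": ["get_order", "create_order", "cancel_order", "update_shipping"],
    "Account": ["get_profile", "update_email", "reset_password"],
}

def select_tool(user_request, pick):
    """Two-stage selection: first a category, then a tool within it."""
    # Stage 1: the model sees only the category names.
    category = pick(user_request, list(TOOL_CATEGORIES))
    # Stage 2: the model sees only that category's tools.
    return pick(user_request, TOOL_CATEGORIES[category])
```

Each `pick` call presents at most a handful of options, so the model never has to weigh all 30+ tools at once.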
If the model cannot tell when to use a tool, it will use it incorrectly or not at all.
Bad:
{
"name": "lookup",
"description": "Looks up information"
}
Information about what? When should the model call this instead of another tool? What format does the response come in?
Good:
{
"name": "lookup_employee",
"description": "Look up an employee by name or employee ID. Returns their department, role, email, and manager. Use this when the user asks about a specific person at the company."
}
A common mistake is wrapping a raw API endpoint as a tool without any abstraction. A generic api_request(method, url, headers, body) tool forces the model to know your URL structure, auth headers, and request format - it will frequently get these wrong.
Instead, build purpose-built tools like get_customer_orders(customer_email, status_filter) that hide HTTP details behind a clean interface. Your code handles authentication, URL construction, and response parsing. The model just says what it wants.
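A sketch of that separation, with `http_get` as a stub standing in for a real HTTP client (the base URL, endpoint path, and response shape are all assumptions for illustration):

```python
API_BASE = "https://api.example.com"
API_KEY = "secret-from-config"  # loaded from config in real code; never shown to the model

def http_get(url, params, headers):
    # Stub for illustration; swap in your real HTTP client here.
    return {"orders": [{"id": "ORD-1", "status": params.get("status", "any")}]}

def get_customer_orders(customer_email, status_filter="all"):
    """The tool the model calls: plain parameters, no URLs, headers, or auth."""
    payload = http_get(
        f"{API_BASE}/v1/orders",
        params={"email": customer_email, "status": status_filter},
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    return payload["orders"]
```

The model's job shrinks to supplying an email address and a status filter; every detail it could get wrong (URL structure, auth, query encoding) lives in code you control.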
When a tool call fails and the error message is useless, the agent gets stuck in a retry loop or gives up entirely.
Common failure modes:
| Error Type | Bad Response | Good Response |
|---|---|---|
| Not found | {"error": true} |
{"error": "CUSTOMER_NOT_FOUND", "message": "No customer with email 'jn@example.com'. Did you mean 'jane@example.com'?"} |
| Invalid input | 500 Internal Server Error |
{"error": "INVALID_DATE_FORMAT", "message": "Expected date in YYYY-MM-DD format, got '03/15/2024'"} |
| Rate limited | {"error": "fail"} |
{"error": "RATE_LIMITED", "message": "Too many requests. Try again in 30 seconds."} |
| Auth failure | null |
{"error": "UNAUTHORIZED", "message": "API key expired. This tool is temporarily unavailable."} |
Large tool responses eat up the context window and can confuse the model with irrelevant details. Strategies to manage this:

- Truncate long text fields (e.g., message[:200]) and note that truncation occurred
- Paginate large result sets, returning the first page along with a total count
- Filter responses down to only the fields the agent needs
- Summarize verbose content instead of returning it verbatim

External APIs can be slow or unresponsive. Always set timeouts on HTTP calls and return descriptive error messages on failure rather than letting the agent hang indefinitely.
Use this checklist when designing tools for your agent:
[ ] Name clearly describes the action (verb + noun)
[ ] Description explains what, when, and what it returns
[ ] Each tool has a single responsibility
[ ] Parameters have descriptive names (not abbreviations)
[ ] Parameters include descriptions with format examples
[ ] Required vs optional parameters are clearly marked
[ ] Output is concise - only fields the agent needs
[ ] Error messages are descriptive and actionable
[ ] Timeouts are implemented for external calls
[ ] Authentication is handled in the tool code, not by the model
[ ] Output size is bounded (truncation, pagination, or summarization)
[ ] Tool is tested independently before connecting to the agent
Think of your smartphone. The phone itself is smart - it has a powerful processor, a nice screen, and a good operating system. But without apps, it cannot do very much.
Each app gives the phone a new capability. The phone itself does not know how to forecast weather - it just knows how to open the weather app, ask for the forecast, and display the result.
An LLM is like the phone without apps. It is smart, but it cannot check real weather, send real messages, or look up real directions. It can only talk about these things based on what it learned during training.
Tools are like installing apps on the phone. Each tool gives the agent a new capability:
- get_weather is the weather app
- send_email is the email app
- search_database is like a specialized search app for your data
- run_code is like a coding app

And just like with apps:

- Installing too many makes it harder to find the right one
- An app with a clear name and purpose gets used; a confusing one gets ignored
- A buggy app that fails with cryptic errors gets abandoned
When you design tools, you are essentially building the app store for your agent. Good apps with clear names and useful features make the phone more capable. Bad apps with confusing names and buggy behavior make it frustrating.
Google Cloud provides several ways to give agents tools:
Vertex AI supports function calling with Gemini models. You define your tools as function declarations, and the model will generate structured function calls when appropriate.
Learn more: Vertex AI Function Calling
The Agent Development Kit (ADK) provides a structured way to define and manage tools for your agents. ADK supports:

- Function tools defined from your own code
- Built-in tools such as Google Search and code execution
- Agent tools, where one agent is wrapped and used as a tool by another
ADK handles the tool definition format, execution, and result passing so you can focus on the tool’s logic rather than the plumbing.
Learn more: ADK Tools Documentation
Google Search grounding is available as a built-in tool for Gemini models on Vertex AI. When enabled, the model can search the web to ground its responses in current information - no custom tool code required.
Let us design the tool set for a simple DevOps agent that helps engineers investigate production issues.
Help on-call engineers diagnose and respond to production alerts.
| Tool | Purpose | When to Use |
|---|---|---|
| `get_alert_details` | Fetch details of a specific alert | When an engineer asks about an alert |
| `query_metrics` | Get time-series metrics for a service | When investigating performance issues |
| `search_logs` | Search application logs by service and time range | When looking for errors or patterns |
| `list_recent_deployments` | Show recent deployments for a service | When checking if a deploy caused the issue |
| `get_runbook` | Retrieve the runbook for a specific alert type | When the agent needs to suggest remediation steps |
| `create_incident` | Create an incident ticket | When the issue is confirmed and needs tracking |
Notice how each tool follows our best practices: clear verb+noun names, descriptions that explain what/when/returns, single responsibility, and descriptive parameters with examples.
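For instance, the first tool's definition might look like this. The exact fields and formats are assumptions for illustration, following the schema style used earlier in this lesson:

```python
# Hypothetical definition for the get_alert_details tool.
GET_ALERT_DETAILS = {
    "name": "get_alert_details",
    "description": (
        "Fetch the details of a specific alert by its ID. Returns the affected "
        "service, alert type, severity, and a short description. Use this first "
        "when an engineer asks about an alert."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "alert_id": {
                "type": "string",
                "description": "The alert ID, e.g. 'ALT-9876'",
            }
        },
        "required": ["alert_id"],
    },
}
```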
Engineer: "Alert ALT-9876 just fired. What is going on?"
Agent thinking: "I need to look up this alert first."
-> Calls: get_alert_details(alert_id="ALT-9876")
<- Returns: {service: "auth-service", type: "high_latency", severity: "P1",
description: "p99 latency >2s for 5 minutes"}
Agent thinking: "High latency on auth-service. Let me check metrics and recent deploys."
-> Calls: query_metrics(service="auth-service", metric="latency_p99", time_range="1h")
-> Calls: list_recent_deployments(service="auth-service", limit=5)
<- Metrics show latency spike starting at 14:05
<- Deploy #456 happened at 14:02
Agent thinking: "Latency spiked 3 minutes after a deploy. Let me check error logs."
-> Calls: search_logs(service="auth-service", query="timeout OR slow", level="error", time_range="1h")
<- Logs show: "Database connection pool exhausted" errors starting at 14:05
Agent thinking: "The deploy likely changed something that exhausts the DB connection pool.
Let me get the runbook for guidance."
-> Calls: get_runbook(alert_type="high_latency")
<- Runbook: Step 1: Check recent deploys. Step 2: Compare config changes...
Agent responds:
"Here is what I found for ALT-9876:
**Issue:** auth-service p99 latency spiked above 2 seconds starting at 14:05.
**Likely cause:** Deploy #456 (at 14:02) appears to have triggered database connection pool
exhaustion. Error logs show 'Database connection pool exhausted' errors starting 3 minutes
after the deploy.
**Recommended actions:**
1. Review the changes in deploy #456, particularly any database configuration changes
2. Consider rolling back deploy #456 if the latency is impacting users
3. Check the connection pool size setting - it may need to be increased
Would you like me to create an incident ticket for this?"
This example shows how well-designed tools with clear names, good descriptions, and concise outputs enable the agent to efficiently investigate an issue through multiple steps.
Tools bridge the gap between thinking and doing. Without tools, an agent is just a chatbot. With tools, it can interact with real systems and data.
The model never executes tools directly. It proposes tool calls. Your code executes them. You are always in control of what actually happens.
Three types of tools: Function tools (custom), built-in tools (platform-provided), and agent tools (using another agent as a tool). Start with function tools for your specific use case.
Tool design is UX design - for the model. Clear names, descriptive parameters, concise outputs, and helpful error messages are the difference between a tool the model uses well and one it struggles with.
Watch your tool count. More tools means more context used and harder decisions for the model. Keep your tool set focused on what the agent actually needs.
Standardization reduces integration burden. Protocols like MCP aim to solve the N x M problem by creating a common interface between AI applications and tools.
Now that we understand the brain (LLM) and the hands (tools), the next lesson brings them together with the orchestration layer - the control loop that manages how an agent thinks, acts, observes, and repeats until the job is done.