Skip to content

Lesson 3: tools - giving agents hands

In Lesson 1, we established that an agent is a brain (LLM) plus hands (tools) plus a control loop (orchestration). In Lesson 2, we explored the brain. Now we give the agent hands.

Without tools, an LLM can only generate text. It can reason about a question, but it cannot look up the answer in a database. It can plan a deployment, but it cannot actually run the deployment script. It can draft an email, but it cannot send it.

Tools are what bridge the gap between thinking and doing. They are the mechanism by which an agent interacts with the outside world - reading data, calling APIs, executing code, and taking actions.

This lesson covers what tools are, the different types of tools, how function calling works under the hood, best practices for designing tools, and common pitfalls that trip up even experienced engineers.


Consider this scenario. A user asks your agent: “How many open bugs are assigned to me?”

Without tools: The LLM has no access to your bug tracker. It might say “I do not have access to your bug tracker” (if you are lucky) or it might hallucinate a number (if you are not).

With tools: The agent thinks “I need to check the bug tracker.” It calls the get_bugs(assignee="current_user", status="open") tool. The tool queries your Jira/Linear/GitHub Issues instance and returns [{id: 1234, title: "Login timeout"}, {id: 1235, title: "CSS overflow on mobile"}]. The agent counts the results and responds: “You have 2 open bugs: Login timeout (#1234) and CSS overflow on mobile (#1235).”

Tools transform the agent from a smart conversation partner into a capable assistant that can actually get things done.

Without ToolsWith Tools
Answer from training data (possibly stale)Answer from live data
Describe what should be doneActually do it
Reason about codeRun code and see the output
Guess at current stateQuery and observe real state
Draft a messageSend the message

Tools come in several forms. Understanding the different types helps you choose the right approach for your agent.

These are functions you define that the agent can call. You control the implementation, the inputs, the outputs, and the error handling.

Examples:

  • search_database(query, filters) - Query your internal database
  • create_ticket(title, description, priority) - Create a support ticket
  • send_notification(user_id, message) - Send a push notification
  • get_weather(city) - Fetch weather from a third-party API
  • run_test_suite(test_path) - Execute tests and return results

Function tools are the most common type. They are the primary way you extend an agent’s capabilities to fit your specific use case.

These are tools provided by the platform or framework you are using. They come pre-built and ready to use.

Google Cloud / Gemini built-in tools:

ToolWhat It Does
Google SearchGrounds the model’s responses in real-time web search results. Reduces hallucination by providing current information.
Code ExecutionRuns Python code in a sandboxed environment. Useful for math, data analysis, and generating charts.
URL ContextFetches and processes content from a given URL.

When to use built-in tools:

  • When the functionality you need already exists as a built-in
  • When you want the platform to manage execution, sandboxing, and scaling
  • When you want to avoid maintaining your own implementation of common capabilities

When to build custom function tools instead:

  • When you need to access your own systems and data
  • When you need custom logic or business rules
  • When built-in tools do not cover your use case

This is a powerful pattern where one agent uses another agent as a tool. The calling agent delegates a subtask to a specialized agent, gets back the result, and continues with its own workflow.

Example:

Main Agent (Trip Planner)
|
+-- calls --> Flight Search Agent
| (specialized in finding flights)
|
+-- calls --> Hotel Agent
| (specialized in hotel bookings)
|
+-- calls --> Activity Agent
(specialized in local activities)

The main agent coordinates the overall trip planning. Each sub-agent is an expert in its domain with its own tools and instructions. The main agent does not need to know how flight search works - it just asks the Flight Search Agent and gets results back.

When to use agent tools:

  • When a subtask is complex enough to warrant its own agent with specialized tools and instructions
  • When you want to separate concerns and make each agent simpler
  • When different subtasks benefit from different models or configurations

Function calling is the core mechanism that enables tool use. Here is exactly how it works, step by step.

Step 1: You define tools and send them to the model with the user's message
Step 2: The model decides whether to call a tool (and which one)
Step 3: The model returns a structured tool call (function name + arguments)
Step 4: YOUR CODE executes the function (the model never runs it)
Step 5: You send the function result back to the model
Step 6: The model uses the result to generate a response (or call another tool)

This is critical to understand: the model does not execute tools. It proposes tool calls. Your code executes them. The model never touches your database, your APIs, or your systems directly. You are always in control.

Let us walk through a concrete example with a weather agent.

Step 1: Define the tool and send the request

tools = [
{
"name": "get_weather",
"description": "Get the current weather for a given city. Returns temperature, conditions, and humidity.",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "The city name, e.g. 'London' or 'San Francisco'"
},
"units": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature units"
}
},
"required": ["city"]
}
}
]
# Send to model with user message
response = model.generate(
messages=[{"role": "user", "content": "What is the weather in Tokyo?"}],
tools=tools
)

Step 2-3: The model returns a tool call

The model does not respond with text. Instead, it returns a structured tool call:

{
"tool_call": {
"name": "get_weather",
"arguments": {
"city": "Tokyo",
"units": "celsius"
}
}
}

The model figured out:

  • It needs the weather tool (not some other tool)
  • The city is “Tokyo” (extracted from the user’s message)
  • Celsius is probably appropriate for Tokyo (inferred from context)

Step 4: Your code executes the function

# This is YOUR code - you control what happens here
def get_weather(city, units="celsius"):
# Call a real weather API
response = weather_api.get(city=city, units=units)
return {
"city": city,
"temperature": response.temp,
"conditions": response.conditions,
"humidity": response.humidity
}
# Execute the tool call
result = get_weather(**tool_call.arguments)
# result = {"city": "Tokyo", "temperature": 18, "conditions": "Partly cloudy", "humidity": "65%"}

Step 5-6: Send the result back and get a response

response = model.generate(
messages=[
{"role": "user", "content": "What is the weather in Tokyo?"},
{"role": "assistant", "tool_call": tool_call},
{"role": "tool", "content": json.dumps(result)}
],
tools=tools
)
# Model responds: "The weather in Tokyo is 18 degrees Celsius and partly cloudy
# with 65% humidity."

Sometimes the model needs to call multiple tools to answer a question. This can happen in two ways:

Sequential (one after another):

User: "Compare the weather in Tokyo and London"
Step 1: Model calls get_weather(city="Tokyo")
Step 2: You execute, return result
Step 3: Model calls get_weather(city="London")
Step 4: You execute, return result
Step 5: Model synthesizes both results into a comparison

Parallel (at the same time): Some models support parallel tool calls, where the model proposes multiple calls in a single response. For the same weather comparison, the model could return both get_weather("Tokyo") and get_weather("London") at once. You execute both, send both results back, and the model synthesizes. This is faster because you save a round trip.

In a real agent, tool calling happens inside a loop: send the user message and tools to the model, check if the response is a text answer (done) or a tool call (execute it, append the result to the conversation, and loop back). This continues until the model decides it has enough information to respond.


How you design your tools has a direct impact on how well your agent performs. The model needs to understand your tools to use them correctly. Here are the principles that matter most.

The tool name is the first thing the model sees. It should immediately convey what the tool does.

Bad NameGood NameWhy
do_thingsearch_knowledge_baseSpecific about what it searches
api_callcreate_support_ticketDescribes the action and target
processvalidate_email_addressClear about what it processes and how
get_dataget_order_by_idSpecifies what data and how to find it
runexecute_sql_queryExplicit about what runs

Each tool should do one thing well. Just like functions in your code, tools should have a single, clear purpose.

Bad: One tool that does everything

{
"name": "manage_orders",
"description": "Create, read, update, or delete orders",
"parameters": {
"action": {"type": "string", "enum": ["create", "read", "update", "delete"]},
"order_id": {"type": "string"},
"order_data": {"type": "object"}
}
}

The model now has to figure out which action to use AND what parameters each action needs. This leads to errors.

Good: Separate tools for each action

[
{
"name": "get_order",
"description": "Look up an order by its ID. Returns order status, items, and shipping info.",
"parameters": {
"order_id": {"type": "string", "description": "The order ID, e.g. 'ORD-12345'"}
}
},
{
"name": "create_order",
"description": "Create a new order with the specified items.",
"parameters": {
"items": {"type": "array", "description": "List of item IDs and quantities"},
"shipping_address": {"type": "string"}
}
},
{
"name": "cancel_order",
"description": "Cancel an existing order. Only works for orders not yet shipped.",
"parameters": {
"order_id": {"type": "string"},
"reason": {"type": "string", "description": "Reason for cancellation"}
}
}
]

Each tool has a clear purpose with specific parameters. The model can easily pick the right one.

The parameter descriptions tell the model what to pass. Be specific about format, constraints, and defaults.

Bad:

{
"name": "search",
"parameters": {
"q": {"type": "string"},
"n": {"type": "integer"}
}
}

The model has to guess what q and n mean.

Good:

{
"name": "search_products",
"parameters": {
"query": {
"type": "string",
"description": "Search query for product name or description, e.g. 'wireless headphones'"
},
"max_results": {
"type": "integer",
"description": "Maximum number of results to return. Range: 1-50. Default: 10"
}
}
}

Tool results go back into the model’s context window. Returning too much data wastes tokens and can confuse the model.

Bad: Returning raw API response with everything

{
"status": 200,
"headers": {"content-type": "application/json", "x-request-id": "abc123", ...},
"data": {
"order": {
"id": "ORD-12345",
"internal_id": "a1b2c3d4-e5f6-7890",
"created_at": "2024-03-15T10:30:00Z",
"updated_at": "2024-03-15T14:22:00Z",
"customer_id": "CUST-789",
"customer_hash": "sha256:abcdef...",
"status": "shipped",
"status_code": 3,
"status_history": [...50 entries...],
"items": [...full product details with all fields...],
"shipping": {...full carrier details...},
"billing": {...full payment details...},
"metadata": {...internal tracking data...}
}
}
}

Good: Returning only what the agent needs

{
"order_id": "ORD-12345",
"status": "shipped",
"items": ["Wireless Headphones x1", "USB-C Cable x2"],
"tracking_number": "1Z999AA10123456784",
"estimated_delivery": "March 20, 2026"
}

When a tool call fails, the error message should help the model understand what went wrong and what to try next.

Bad error:

{"error": "Failed"}

Good error:

{
"error": "ORDER_NOT_FOUND",
"message": "No order found with ID 'ORD-99999'. Please verify the order ID and try again. Order IDs follow the format ORD-XXXXX."
}

The good error message gives the model enough information to either correct its approach (maybe it used the wrong ID format) or inform the user clearly.

6. document tool behavior in the description

Section titled “6. document tool behavior in the description”

The tool description is your chance to tell the model when and how to use the tool. Include:

  • What the tool does (primary action)
  • When to use it (conditions and triggers)
  • What it returns (output format)
  • Limitations (what it cannot do)

Example:

{
"name": "search_knowledge_base",
"description": "Search the company knowledge base for relevant articles and documentation. Use this tool when the user asks a question about company policies, product features, or internal processes. Returns the top 5 most relevant articles with titles, summaries, and links. Does not search external websites - use web_search for that."
}

As agent systems grow, tool integration becomes a significant challenge. Here is why.

Imagine you have N different AI applications (coding assistant, research agent, support bot) and M different services (Slack, Jira, GitHub, Salesforce). Without standardization, each application needs custom code to integrate with each service - that is N x M integrations to build and maintain.

Standards like the Model Context Protocol (MCP) aim to solve this by creating a common interface between AI applications and tool providers. Each application implements the protocol once. Each tool provider implements the protocol once. New applications automatically work with all existing tools, and vice versa.

Think of it like USB. Before USB, every device had its own proprietary cable. After USB, any device could connect to any computer. Tool standards aim to be the “USB for AI tools.”

When choosing a framework or building tools, consider whether other agents can reuse your tools (build as MCP servers for portability) and whether your agent can use tools from other sources (support MCP for broad compatibility). Standards are still evolving, so evaluate based on your timeline.


Every tool definition takes up space in the model’s context window. More importantly, the model has to reason about which tool to use from the full set. Too many tools leads to:

  • Decision paralysis: The model struggles to pick the right tool
  • Wrong tool selection: With many similar tools, the model picks the wrong one
  • Wasted context: Tool definitions crowd out useful conversation history

How many tools is too many?

There is no hard rule, but here are guidelines:

Number of ToolsGuidance
1-5Fine for most models. No special handling needed.
5-15Works well with clear, distinct names and descriptions.
15-30Consider grouping related tools or using a tool selection step.
30+Likely too many. Use a two-stage approach: first select a category, then choose a specific tool.

The two-stage approach for large tool sets:

Stage 1: Model picks a category
"User wants to manage their order -> Category: Order Management"
Stage 2: Model sees only the tools in that category
[get_order, create_order, cancel_order, update_shipping]
-> Model picks: get_order

This keeps each individual decision manageable.

If the model cannot tell when to use a tool, it will use it incorrectly or not at all.

Bad:

{
"name": "lookup",
"description": "Looks up information"
}

Information about what? When should the model call this instead of another tool? What format does the response come in?

Good:

{
"name": "lookup_employee",
"description": "Look up an employee by name or employee ID. Returns their department, role, email, and manager. Use this when the user asks about a specific person at the company."
}

3. thin API wrappers (leaking implementation details)

Section titled “3. thin API wrappers (leaking implementation details)”

A common mistake is wrapping a raw API endpoint as a tool without any abstraction. A generic api_request(method, url, headers, body) tool forces the model to know your URL structure, auth headers, and request format - it will frequently get these wrong.

Instead, build purpose-built tools like get_customer_orders(customer_email, status_filter) that hide HTTP details behind a clean interface. Your code handles authentication, URL construction, and response parsing. The model just says what it wants.

When a tool call fails and the error message is useless, the agent gets stuck in a retry loop or gives up entirely.

Common failure modes:

Error TypeBad ResponseGood Response
Not found{"error": true}{"error": "CUSTOMER_NOT_FOUND", "message": "No customer with email 'jn@example.com'. Did you mean 'jane@example.com'?"}
Invalid input500 Internal Server Error{"error": "INVALID_DATE_FORMAT", "message": "Expected date in YYYY-MM-DD format, got '03/15/2024'"}
Rate limited{"error": "fail"}{"error": "RATE_LIMITED", "message": "Too many requests. Try again in 30 seconds."}
Auth failurenull{"error": "UNAUTHORIZED", "message": "API key expired. This tool is temporarily unavailable."}

Large tool responses eat up the context window and can confuse the model with irrelevant details. Strategies to manage this:

  • Filter at the source: Only query what you need from the database/API
  • Select relevant fields: Do not return every field - pick what the agent needs
  • Paginate: Return a subset with the option to get more
  • Summarize: For large text responses, summarize before returning
  • Truncate: Cap long string fields (e.g., message[:200])

External APIs can be slow or unresponsive. Always set timeouts on HTTP calls and return descriptive error messages on failure rather than letting the agent hang indefinitely.


Use this checklist when designing tools for your agent:

[ ] Name clearly describes the action (verb + noun)
[ ] Description explains what, when, and what it returns
[ ] Each tool has a single responsibility
[ ] Parameters have descriptive names (not abbreviations)
[ ] Parameters include descriptions with format examples
[ ] Required vs optional parameters are clearly marked
[ ] Output is concise - only fields the agent needs
[ ] Error messages are descriptive and actionable
[ ] Timeouts are implemented for external calls
[ ] Authentication is handled in the tool code, not by the model
[ ] Output size is bounded (truncation, pagination, or summarization)
[ ] Tool is tested independently before connecting to the agent

Think of your smartphone. The phone itself is smart - it has a powerful processor, a nice screen, and a good operating system. But without apps, it cannot do very much.

  • Want to check the weather? You need the weather app.
  • Want to send a message? You need the messaging app.
  • Want to navigate somewhere? You need the maps app.
  • Want to take a photo? You need the camera app.

Each app gives the phone a new capability. The phone itself does not know how to forecast weather - it just knows how to open the weather app, ask for the forecast, and display the result.

An LLM is like the phone without apps. It is smart, but it cannot check real weather, send real messages, or look up real directions. It can only talk about these things based on what it learned during training.

Tools are like installing apps on the phone. Each tool gives the agent a new capability:

  • get_weather is the weather app
  • send_email is the email app
  • search_database is like a specialized search app for your data
  • run_code is like a coding app

And just like with apps:

  • The phone (model) decides which app to open based on what you ask
  • The app (tool) does the actual work
  • The phone displays the result to you

When you design tools, you are essentially building the app store for your agent. Good apps with clear names and useful features make the phone more capable. Bad apps with confusing names and buggy behavior make it frustrating.


Google Cloud provides several ways to give agents tools:

Vertex AI supports function calling with Gemini models. You define your tools as function declarations, and the model will generate structured function calls when appropriate.

Learn more: Vertex AI Function Calling

The Agent Development Kit (ADK) provides a structured way to define and manage tools for your agents. ADK supports:

  • Function tools: Wrap any Python function as an agent tool
  • Built-in tools: Google Search, Code Execution, and more
  • Agent tools: Use another ADK agent as a tool
  • Third-party tools: Integration with LangChain tools, CrewAI tools, and MCP servers

ADK handles the tool definition format, execution, and result passing so you can focus on the tool’s logic rather than the plumbing.

Learn more: ADK Tools Documentation

Google Search grounding is available as a built-in tool for Gemini models on Vertex AI. When enabled, the model can search the web to ground its responses in current information - no custom tool code required.


Putting it all together: a practical example

Section titled “Putting it all together: a practical example”

Let us design the tool set for a simple DevOps agent that helps engineers investigate production issues.

Help on-call engineers diagnose and respond to production alerts.

ToolPurposeWhen to Use
get_alert_detailsFetch details of a specific alertWhen an engineer asks about an alert
query_metricsGet time-series metrics for a serviceWhen investigating performance issues
search_logsSearch application logs by service and time rangeWhen looking for errors or patterns
list_recent_deploymentsShow recent deployments for a serviceWhen checking if a deploy caused the issue
get_runbookRetrieve the runbook for a specific alert typeWhen the agent needs to suggest remediation steps
create_incidentCreate an incident ticketWhen the issue is confirmed and needs tracking

Notice how each tool follows our best practices: clear verb+noun names, descriptions that explain what/when/returns, single responsibility, and descriptive parameters with examples.

Engineer: "Alert ALT-9876 just fired. What is going on?"
Agent thinking: "I need to look up this alert first."
-> Calls: get_alert_details(alert_id="ALT-9876")
<- Returns: {service: "auth-service", type: "high_latency", severity: "P1",
description: "p99 latency >2s for 5 minutes"}
Agent thinking: "High latency on auth-service. Let me check metrics and recent deploys."
-> Calls: query_metrics(service="auth-service", metric="latency_p99", time_range="1h")
-> Calls: list_recent_deployments(service="auth-service", limit=5)
<- Metrics show latency spike starting at 14:05
<- Deploy #456 happened at 14:02
Agent thinking: "Latency spiked 3 minutes after a deploy. Let me check error logs."
-> Calls: search_logs(service="auth-service", query="timeout OR slow", level="error", time_range="1h")
<- Logs show: "Database connection pool exhausted" errors starting at 14:05
Agent thinking: "The deploy likely changed something that exhausts the DB connection pool.
Let me get the runbook for guidance."
-> Calls: get_runbook(alert_type="high_latency")
<- Runbook: Step 1: Check recent deploys. Step 2: Compare config changes...
Agent responds:
"Here is what I found for ALT-9876:
**Issue:** auth-service p99 latency spiked above 2 seconds starting at 14:05.
**Likely cause:** Deploy #456 (at 14:02) appears to have triggered database connection pool
exhaustion. Error logs show 'Database connection pool exhausted' errors starting 3 minutes
after the deploy.
**Recommended actions:**
1. Review the changes in deploy #456, particularly any database configuration changes
2. Consider rolling back deploy #456 if the latency is impacting users
3. Check the connection pool size setting - it may need to be increased
Would you like me to create an incident ticket for this?"

This example shows how well-designed tools with clear names, good descriptions, and concise outputs enable the agent to efficiently investigate an issue through multiple steps.


  1. Tools bridge the gap between thinking and doing. Without tools, an agent is just a chatbot. With tools, it can interact with real systems and data.

  2. The model never executes tools directly. It proposes tool calls. Your code executes them. You are always in control of what actually happens.

  3. Three types of tools: Function tools (custom), built-in tools (platform-provided), and agent tools (using another agent as a tool). Start with function tools for your specific use case.

  4. Tool design is UX design - for the model. Clear names, descriptive parameters, concise outputs, and helpful error messages are the difference between a tool the model uses well and one it struggles with.

  5. Watch your tool count. More tools means more context used and harder decisions for the model. Keep your tool set focused on what the agent actually needs.

  6. Standardization reduces integration burden. Protocols like MCP aim to solve the N x M problem by creating a common interface between AI applications and tools.


Now that we understand the brain (LLM) and the hands (tools), the next lesson brings them together with the orchestration layer - the control loop that manages how an agent thinks, acts, observes, and repeats until the job is done.

Next: Lesson 4 - Orchestration: The Agent Loop —>