Don't Just Talk, Do: Giving Your LLM Tools to Interact with the Real World
Transform your LLM from a chatbot into an agent by teaching it to use tools and interact with real-world systems
Large Language Models are incredible conversationalists. They can write poetry, summarize text, and answer questions on a vast range of topics. But out of the box, they have a fundamental limitation: they live in a bubble. An LLM doesn’t know what the current weather is in Los Angeles, it can’t check the status of a user’s order in your database, and it can’t book a meeting on your calendar. It can only process and generate text based on the data it was trained on.
So how do we bridge this gap? How do we connect the LLM’s powerful reasoning engine to the real world of live data and actions?
The answer is a powerful technique supported by most modern models called Tool Use or Function Calling. Instead of asking the model to answer a question directly, we give it a list of “tools” it can use. The model’s job then becomes figuring out which tool to use, and with which arguments, to get the information needed to answer the user’s question.
This transforms the LLM from a simple chatbot into the reasoning core of an application that can interact with external systems. Let’s build a simple example to see how it works.
The Problem: A Weather-Blind Bot
Let’s say we want to build a simple chatbot that can answer questions about the weather. A user asks, “What’s the weather like in Boston?”
If we send this prompt directly to an LLM, it will likely give a generic answer or state that it doesn’t have access to real-time information. It can’t look up the live weather.
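To make the gap concrete, here is roughly what that looks like with a plain chat completion call and no tools attached (a minimal sketch using the OpenAI Python SDK; the exact wording of the reply will vary):
from openai import OpenAI

client = OpenAI()

# No tools and no live data: the model can only draw on its training data.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather like in Boston?"}],
)

print(response.choices[0].message.content)
# Typically something like: "I don't have access to real-time weather data, but..."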
The Solution: Giving the LLM a Tool
We’re going to give our LLM a single tool: getCurrentWeather. This example is adapted from OpenAI’s function calling documentation, which provides an excellent introduction to the concept.
First, let’s define this tool as a simple function in our application code. This function will be responsible for getting the actual data. For this example, we’ll just use a mock that returns a hardcoded value.
import json

def get_current_weather(location: str):
    """Mocks an API call that returns the current weather for a specified location."""
    print(f"--- Calling external weather API for {location} ---")
    if "boston" in location.lower():
        return json.dumps({"location": "Boston", "temperature": "72", "condition": "Sunny"})
    elif "tokyo" in location.lower():
        return json.dumps({"location": "Tokyo", "temperature": "24", "condition": "Cloudy"})
    else:
        return json.dumps({"location": location, "temperature": "unknown", "condition": "unknown"})
Now, the crucial part. We need to describe this function to the LLM in a format it understands. This is done with a JSON schema that defines the tool’s name, description, and the parameters it accepts. Here’s the OpenAI format (other providers use similar structures):
tools = [
    {
        "type": "function",
        "function": {
            "name": "getCurrentWeather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. Los Angeles, CA",
                    },
                },
                "required": ["location"],
                "additionalProperties": False
            },
        }
    }
]
This schema is our contract with the LLM. We’re telling it: “You have a tool named getCurrentWeather available. It’s for getting the weather, and it requires one string argument: location.”
The Two-Step Conversation Flow
Now, instead of just sending the user’s prompt to the model, we send the prompt and the list of available tools. The flow becomes a two-step process.
Step 1: The LLM Decides Which Tool to Use
Our application code now does the following:
- Gets the user’s query: “What’s the weather like in Boston?”
- Sends a request to the LLM that includes the user’s query and the JSON schema for our tools.
- The LLM analyzes the request. It sees that the user is asking about the weather and that it has a getCurrentWeather tool that can help.
- Crucially, the LLM does not answer the question. Instead, its response is a structured piece of JSON telling us which tool to call.
Here’s what that looks like in code using OpenAI’s API (similar patterns apply to Anthropic Claude, Google Gemini, and other providers):
from openai import OpenAI

client = OpenAI()

# Send the user's prompt along with the available tools.
# Any tool-capable chat model works here; we use gpt-4o.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather like in Boston?"}],
    tools=tools,
    tool_choice="auto"  # Let the model decide whether to use tools
)

# The LLM's response is NOT a weather report.
# It's an instruction to call a function.
message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    # print(tool_call)
    # -> ChatCompletionMessageToolCall(id='call_123', type='function',
    #      function=Function(name='getCurrentWeather', arguments='{"location": "Boston, MA"}'))
The model has correctly identified the tool to use (getCurrentWeather) and extracted the necessary argument ("Boston, MA").
Step 2: We Execute the Tool and Get the Final Answer
Our application code now takes over. It parses the LLM’s response, sees the instruction to call getCurrentWeather
, and executes our actual Python function.
import json

# Extract the tool call details
tool_call = message.tool_calls[0]
function_name = tool_call.function.name
arguments = json.loads(tool_call.function.arguments)

# Execute our actual function
if function_name == "getCurrentWeather":
    tool_response = get_current_weather(location=arguments['location'])

# Create the conversation history with the tool result
messages = [
    {"role": "user", "content": "What's the weather like in Boston?"},
    {"role": "assistant", "content": None, "tool_calls": [tool_call.model_dump()]},
    {"role": "tool", "tool_call_id": tool_call.id, "content": tool_response}
]

# Send the tool result back to get the final response
final_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)

print(final_response.choices[0].message.content)
# -> "The weather in Boston is currently 72 degrees and Sunny."
In this second step, we feed the result of our tool call back to the model. Now that the LLM has the real-world data (“72 and Sunny”), it can fulfill the user’s original request and generate a friendly, natural-language answer.
The Payoff: A Reasoning Engine for Your Application
This pattern is powerful because it creates a clean separation of concerns:
- The LLM is responsible for reasoning. Its job is to understand user intent and map it to the available tools.
- Your Application is responsible for execution. Your code maintains control over how API calls are made, how database queries are run, and how real-world actions are performed.
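In practice, this separation usually takes the form of a small dispatch layer: a registry that maps tool names to real functions, plus a loop that keeps executing requested tool calls until the model produces a plain-text answer. Here is one minimal sketch of that pattern, reusing the client, tools list, and weather function defined above (the registry and helper names are illustrative, not part of any SDK):
import json

# Map the tool names the LLM sees to the real functions our application controls.
AVAILABLE_TOOLS = {
    "getCurrentWeather": get_current_weather,
}

def run_with_tools(client, model, messages, tools):
    """Keep calling the model, executing any requested tools, until it answers in plain text."""
    while True:
        response = client.chat.completions.create(model=model, messages=messages, tools=tools)
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # no more tool requests: this is the final answer
        messages.append(message)  # keep the assistant's tool request in the history
        for tool_call in message.tool_calls:
            func = AVAILABLE_TOOLS[tool_call.function.name]
            result = func(**json.loads(tool_call.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            })

# Usage:
# answer = run_with_tools(client, "gpt-4o",
#     [{"role": "user", "content": "What's the weather like in Boston?"}], tools)
With a loop like this in place, adding a new capability is just a matter of writing the function, describing it in the tools schema, and registering it.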
By giving your LLM tools, you’re not just building a chatbot. You’re using the LLM as a reasoning engine to drive a larger application that can interact with the world, turning simple text prompts into concrete actions and data-driven responses. It’s the first step toward building truly intelligent and useful AI agents.
When to Consider the Model Context Protocol (MCP)
The approach we’ve shown above works great for getting started, but as your AI applications grow in complexity, you might find yourself running into some limitations. What if you want to reuse your weather tool across multiple applications? What if different teams in your organization are building different tools that should work together? What if you want to leverage tools that others have already built?
This is where the Model Context Protocol (MCP) becomes valuable. Introduced by Anthropic in November 2024, MCP is an open standard that provides a unified way for AI applications to securely connect to data sources and tools. Instead of defining tools directly in your application code, MCP lets you create standalone “MCP servers” that expose capabilities over a standardized JSON-RPC protocol.
Think of MCP like USB-C for AI applications: just as USB-C provides a standardized way to connect devices to various peripherals, MCP provides a standardized way to connect AI models to different data sources and tools.
The MCP Architecture
MCP defines three core primitives that servers can expose. Tools are functions that can be called by the LLM, like our weather function. Resources represent structured data that can be included in prompts, such as files or database records. Prompts are reusable prompt templates with parameters that help standardize common interactions.
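To make these three primitives concrete, here is a minimal sketch of a server exposing all of them with the official MCP Python SDK’s FastMCP helper (decorator and transport details are paraphrased from the SDK’s documentation and may differ between versions):
# weather_mcp.py - a minimal MCP server sketch using the official Python SDK (`mcp` package)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather")

@mcp.tool()
def get_current_weather(location: str) -> str:
    """Get the current weather in a given location."""
    return '{"location": "Boston", "temperature": "72", "condition": "Sunny"}'  # mock data

@mcp.resource("weather://supported-cities")
def supported_cities() -> str:
    """Structured data a client can pull into a prompt."""
    return "Boston, Tokyo"

@mcp.prompt()
def weather_report_prompt(location: str) -> str:
    """A reusable prompt template."""
    return f"Write a short, friendly weather report for {location}."

if __name__ == "__main__":
    mcp.run()  # serves tools, resources, and prompts over MCP's JSON-RPC transport (stdio by default)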
With MCP, our weather tool becomes a separate, standalone service. A production server would use one of the official MCP SDKs and speak the protocol’s JSON-RPC transport. To keep the example here self-contained, the FastAPI server below captures the same shape over plain HTTP, advertising its tools and executing them on request:
# weather_mcp_server.py - a standalone MCP-style tool server built with FastAPI
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import json
import uvicorn

# Create FastAPI app
app = FastAPI(title="Weather MCP Server")

# Request/Response models
class ToolRequest(BaseModel):
    name: str
    arguments: dict

class ToolResponse(BaseModel):
    type: str = "text"
    text: str

class Tool(BaseModel):
    name: str
    description: str
    inputSchema: dict

# Available tools registry
TOOLS = {
    "getCurrentWeather": Tool(
        name="getCurrentWeather",
        description="Get the current weather in a given location",
        inputSchema={
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. Los Angeles, CA"
                }
            },
            "required": ["location"]
        }
    )
}

@app.get("/tools", response_model=list[Tool])
async def list_tools():
    """List all available tools."""
    return list(TOOLS.values())

@app.post("/tools/call", response_model=ToolResponse)
async def call_tool(request: ToolRequest):
    """Execute a tool with given arguments."""
    if request.name not in TOOLS:
        raise HTTPException(status_code=404, detail=f"Tool '{request.name}' not found")
    if request.name == "getCurrentWeather":
        weather_data = await get_current_weather(request.arguments["location"])
        return ToolResponse(text=weather_data)
    raise HTTPException(status_code=501, detail=f"Tool '{request.name}' not implemented")

async def get_current_weather(location: str) -> str:
    """Mock weather API call."""
    if "boston" in location.lower():
        return json.dumps({"location": "Boston", "temperature": "72", "condition": "Sunny"})
    elif "tokyo" in location.lower():
        return json.dumps({"location": "Tokyo", "temperature": "24", "condition": "Cloudy"})
    else:
        return json.dumps({"location": location, "temperature": "unknown", "condition": "unknown"})

@app.get("/health")
async def health_check():
    """Health check endpoint."""
    return {"status": "healthy", "server": "weather-mcp-server"}

if __name__ == "__main__":
    # Run with: python weather_mcp_server.py
    # Or: uvicorn weather_mcp_server:app --reload
    uvicorn.run(app, host="0.0.0.0", port=8000)
Your main application then connects to this MCP server and uses its tools through a standardized client interface, rather than defining them locally. This separation allows the same weather service to be used by multiple applications, teams, and even different LLM providers.
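Against the simplified FastAPI server above, the client side of that hand-off might look like the following sketch (it assumes the server is running locally on port 8000 and uses the requests library):
import requests

BASE_URL = "http://localhost:8000"  # where weather_mcp_server.py is running

# Discover what the server offers instead of hard-coding tool schemas locally
remote_tools = requests.get(f"{BASE_URL}/tools").json()
print([tool["name"] for tool in remote_tools])  # -> ['getCurrentWeather']

# Convert the server's tool descriptions into the OpenAI tools format
tools = [
    {"type": "function", "function": {
        "name": tool["name"],
        "description": tool["description"],
        "parameters": tool["inputSchema"],
    }}
    for tool in remote_tools
]

# When the LLM asks for a tool call, forward it to the server instead of running it locally
result = requests.post(
    f"{BASE_URL}/tools/call",
    json={"name": "getCurrentWeather", "arguments": {"location": "Boston, MA"}},
).json()
print(result["text"])  # -> '{"location": "Boston", "temperature": "72", "condition": "Sunny"}'
With a real MCP client SDK the transport would be JSON-RPC rather than these HTTP calls, but the division of labor is the same: the application discovers what is available and forwards tool calls, while the server owns the implementation.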
When MCP Makes Sense
Consider using MCP servers instead of direct tool integration when you’re building for reuse. If multiple applications need the same tools—whether that’s weather data, calendar access, or database queries—MCP lets you write the tool once and use it everywhere. This becomes especially valuable when different teams are building tools that need to work together, or when you want to share tools across different LLM providers or frameworks.
MCP also shines when you need better separation of concerns. Complex tool logic often deserves its own codebase and deployment cycle. In larger organizations, the database team might own database tools while the calendar team maintains scheduling capabilities. With MCP, each team can version and deploy their tools independently from the main application, reducing coordination overhead and deployment risks.
Scale is another important consideration. As your tool ecosystem grows, MCP provides better organization and management capabilities. You can implement different security or access controls for different tools, and leverage existing MCP servers from the community rather than building everything from scratch.
Finally, if you value standardization and interoperability, MCP offers significant advantages. In enterprise environments where different systems need to work together, having a standard protocol for tool interaction becomes crucial. It also helps future-proof your tool investments, as tools written to the MCP standard will work with new LLM applications and providers as they emerge.
The Growing MCP Ecosystem
Since its introduction, MCP has gained significant traction across the industry. Anthropic maintains pre-built MCP servers for popular enterprise systems including Google Drive for file access and management, Slack for messaging and channel operations, GitHub for repository and issue tracking, PostgreSQL for database operations, Puppeteer for web automation, and Brave Search for web search capabilities.
The adoption has been remarkably swift. Major AI companies and tool providers are actively integrating MCP into their products. Microsoft has shown particular interest, with teams across VS Code, Semantic Kernel, and other developer tools exploring MCP integration. The protocol’s open nature and Anthropic’s active development of reference implementations suggest it could become the de facto standard for LLM tool integration.
Security Considerations
MCP’s power comes with security responsibilities. Since MCP servers can access sensitive data and perform actions on behalf of users, proper security practices are essential. You’ll need to implement robust authentication for MCP server access, carefully validate all tool inputs to prevent injection attacks, and follow least-privilege principles when granting capabilities to MCP servers. Comprehensive logging and monitoring of MCP tool usage is also crucial for security auditing and compliance.
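As a concrete illustration, the FastAPI server from earlier could require an API key and validate tool arguments before executing anything. The header name, key handling, and validation rules below are illustrative choices for this sketch, not requirements of the MCP specification:
from fastapi import Depends, FastAPI, Header, HTTPException
from pydantic import BaseModel, Field, ValidationError

app = FastAPI(title="Weather MCP Server (hardened)")

API_KEYS = {"demo-key-123"}  # in production, load from a secrets manager, not source code

async def require_api_key(x_api_key: str = Header(...)):
    """Reject any request that doesn't present a known API key."""
    if x_api_key not in API_KEYS:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")

class WeatherArgs(BaseModel):
    # Constrain inputs instead of passing raw strings to downstream systems
    location: str = Field(..., min_length=1, max_length=100)

@app.post("/tools/call", dependencies=[Depends(require_api_key)])
async def call_tool(request: dict):
    if request.get("name") != "getCurrentWeather":
        raise HTTPException(status_code=404, detail="Unknown tool")
    try:
        args = WeatherArgs(**request.get("arguments", {}))
    except ValidationError:
        raise HTTPException(status_code=400, detail="Invalid arguments for getCurrentWeather")
    print(f"AUDIT: getCurrentWeather called with {args.location!r}")  # log tool usage for auditing
    return {"type": "text", "text": f"(weather for {args.location})"}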
Recent security research has highlighted potential vulnerabilities in MCP implementations, making these security considerations crucial for production deployments. The distributed nature of MCP means you’re essentially creating a network of services that can perform actions on behalf of users, so the security implications should be carefully considered from the start.
The Trade-offs
MCP adds architectural complexity in exchange for flexibility and reusability. For a simple application with a few custom tools, the direct approach we showed earlier is often the right choice. But as your AI ecosystem grows, MCP’s benefits of modularity, reusability, and standardization become increasingly valuable.
Think of it like the difference between writing all your code in one file versus organizing it into modules and libraries. Both approaches work, but the modular approach scales better as your system grows.
The key is to start simple with direct tool integration, then migrate to MCP when you hit the limitations that MCP solves.