From Local LLM to Tool-Using Agent
Learn how to transform a local large language model into a powerful agent by integrating external tools like web search, APIs, and code executors, enabling autonomous task completion beyond simple text generation.
Moving from running a large language model (LLM) on your own hardware to building a tool-using agent is a significant step up in what the model can do. Local LLMs offer privacy, control, and offline operation, but on their own they cannot interact with external systems such as databases, APIs, file systems, or web services. Adding tool-use capabilities turns a static model into an agent that can retrieve information, perform calculations, and execute actions. This article provides a practical, step-by-step guide to bridging that gap.
Why Move from a Local LLM to an Agent?
Local LLMs, such as those powered by Llama, Mistral, or Phi, are excellent for text generation, summarization, and question answering. However, they are limited by their training data cutoff and their inability to access real-time information or external tools. A tool-using agent extends the LLM with a set of functions it can call—like a search engine, a calculator, or a database query tool—and decides when to use them based on the user's request.
This approach is central to current AI development: the broader trend, visible in products from vendors such as OpenAI and Microsoft, is toward models that plan and execute multi-step tasks using external resources. For local deployments, this means you can build an agent that runs entirely on your machine, respects your data privacy, and still performs complex, data-driven tasks.
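To make the idea concrete, here is a minimal sketch of what "a set of functions the model can call" looks like in plain Python, with no LLM involved. The registry layout, the `describe_tools` and `dispatch` helpers, and the two toy tools are illustrative assumptions, not part of any framework. In a real agent, the tool name and input would come from the model's output rather than being passed in by hand.

```python
from typing import Callable

# Hypothetical registry: tool name -> (description, function).
# The description is what the model would read when deciding which tool fits.
TOOLS: dict[str, tuple[str, Callable[[str], str]]] = {
    "WordCount": (
        "Counts the words in the input text.",
        lambda text: str(len(text.split())),
    ),
    "Upper": (
        "Converts the input text to upper case.",
        lambda text: text.upper(),
    ),
}


def describe_tools() -> str:
    """Render the tool list the way it might be pasted into a prompt."""
    return "\n".join(f"{name}: {desc}" for name, (desc, _) in TOOLS.items())


def dispatch(tool_name: str, tool_input: str) -> str:
    """Run the named tool, returning an error string for unknown names."""
    if tool_name not in TOOLS:
        return f"Error: unknown tool '{tool_name}'"
    _, func = TOOLS[tool_name]
    return func(tool_input)


if __name__ == "__main__":
    print(describe_tools())
    print(dispatch("WordCount", "local models can call tools"))
```

Returning an error string instead of raising is a deliberate choice here: the text that comes back from a tool is fed to the model again, so a readable error gives it a chance to pick a different tool.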
Requirements
Before you begin, ensure your system meets these requirements:
- **Hardware**: A computer with at least 8GB of RAM (16GB recommended). A GPU with 6GB+ VRAM (e.g., NVIDIA RTX 3060 or higher) significantly speeds up local LLM inference.
- **Operating System**: Linux (Ubuntu 22.04+), macOS (Ventura+), or Windows (with WSL2 for best compatibility).
- **Software**:
  - Python 3.10 or later
  - pip (Python package manager)
  - Git
  - A local LLM runtime (e.g., Ollama, llama.cpp, or LM Studio)
- Basic familiarity with the command line
Step-by-Step Installation
We will use **Ollama** for the local LLM and a Python framework called **LangChain** to build the agent. LangChain provides a simple interface for defining tools and connecting them to the LLM.
1. Install Ollama and Download a Local LLM
Ollama is a user-friendly tool for running local LLMs. First, install it:

    # On Linux or macOS (using the official script)
    curl -fsSL https://ollama.com/install.sh | sh
    # On Windows, download the installer from ollama.com and run it

After installation, start the Ollama service. If the installer already started Ollama as a background service, this command reports that the port is already in use and you can skip the step. Otherwise, run it in a separate terminal, because it stays in the foreground:

    ollama serve

Now download a capable model. For tool use, we recommend **Mistral 7B** or **Llama 3.1 8B**:

    # Download Mistral 7B (about 4GB)
    ollama pull mistral
    # Alternatively, download Llama 3.1 8B
    ollama pull llama3.1

Verify the model works:

    ollama run mistral "Hello, what is your name?"

You should see a response from the model.
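If you prefer to check the setup from Python, the sketch below asks the local Ollama server which models it has. It assumes the server listens on Ollama's default address, `http://localhost:11434`, and uses the `/api/tags` endpoint of Ollama's REST API. The helper names (`installed_models`, `has_model`, `fetch_tags`) are made up for this example. Only the standard library is used, so it runs before the virtual environment in the next step exists.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address; adjust if you changed it


def installed_models(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON returned by Ollama's /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]


def has_model(tags_payload: dict, wanted: str) -> bool:
    """True if `wanted` (e.g. "mistral") is installed, ignoring the ":tag" suffix."""
    return any(name.split(":")[0] == wanted for name in installed_models(tags_payload))


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Query the running Ollama server for its locally available models."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        tags = fetch_tags()
    except OSError as exc:  # URLError and connection failures are OSError subclasses
        print(f"Could not reach Ollama at {OLLAMA_URL}: {exc}")
    else:
        print("Installed:", installed_models(tags))
        print("mistral available:", has_model(tags, "mistral"))
```

A connection error here usually means the service is not running, while an empty list means no model has been pulled yet.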
2. Set Up the Python Environment
Create a new directory for your agent project and set up a virtual environment:

    mkdir local-agent
    cd local-agent
    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the required Python packages. The agent script in this guide relies on LangChain's legacy `initialize_agent` API, which is deprecated and, as of LangChain 1.0, is no longer part of the main `langchain` package, so pin `langchain` below 1.0:

    pip install "langchain<1.0" langchain-community langchain-ollama requests

- `langchain`: the core framework for building agents
- `langchain-community`: community-maintained integrations
- `langchain-ollama`: direct integration with Ollama
- `requests`: for making HTTP calls (used by some tools)
3. Define the Tools
Tools are functions the agent can call. We will create three simple tools: a calculator, a web search simulator (using a static dataset), and a file reader.
Create a file called `tools.py`:

    # tools.py
    import requests  # not used yet; needed once search_web calls a real API


    def calculator(expression: str) -> str:
        """
        Evaluate a mathematical expression.
        Example input: "2 + 3 * 4"
        """
        # WARNING: eval() executes arbitrary Python code. It is tolerable for a
        # local demo, but never expose it to untrusted input.
        try:
            result = eval(expression)
            return str(result)
        except Exception as e:
            return f"Error: {e}"


    def search_web(query: str) -> str:
        """
        Simulate a web search. In reality, you would use a search API.
        Here we return a static result for demonstration.
        """
        # Replace this with a real API call (e.g., DuckDuckGo, SerpAPI)
        # Keys are lowercase so they match the normalized query below.
        fake_results = {
            "population of tokyo": "Tokyo has a population of approximately 14 million (2024 estimate).",
            "weather in london": "Current weather in London: 15°C, partly cloudy.",
            "python version": "Python 3.13 was released in October 2024."
        }
        # Models often add quotes or whitespace around the input; strip them.
        normalized = query.strip().strip("'\"").lower()
        return fake_results.get(normalized, f"No results found for '{query}'.")


    def read_file(file_path: str) -> str:
        """
        Read the contents of a text file.
        """
        try:
            with open(file_path.strip(), "r", encoding="utf-8") as f:
                return f.read()
        except Exception as e:
            return f"Error reading file: {e}"

4. Build the Agent
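Before wiring the tools into an agent, one caveat about the `calculator` tool: it passes the model's output to `eval`, which will run any Python expression, not just arithmetic. The sketch below is one possible replacement that parses the expression with the standard `ast` module and only evaluates numbers and a whitelist of operators. The name `safe_calculator` and the exponent limit of 100 are arbitrary choices for this example.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Whitelisted operators; anything not listed here is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node: ast.AST) -> Number:
    """Recursively evaluate a parsed expression, allowing arithmetic only."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left, right = _eval_node(node.left), _eval_node(node.right)
        # Arbitrary cap so that inputs like 9**9**9 cannot hang the process.
        if isinstance(node.op, ast.Pow) and abs(right) > 100:
            raise ValueError("exponent too large")
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError("only numbers and + - * / // % ** are allowed")


def safe_calculator(expression: str) -> str:
    """Evaluate an arithmetic expression without calling eval()."""
    try:
        tree = ast.parse(expression.strip(), mode="eval")
        return str(_eval_node(tree.body))
    except (SyntaxError, ValueError, ZeroDivisionError,
            OverflowError, RecursionError) as exc:
        return f"Error: {exc}"


if __name__ == "__main__":
    print(safe_calculator("2 + 3 * 4"))
    print(safe_calculator("__import__('os').getcwd()"))
```

It can be passed as `func=safe_calculator` in place of `calculator`. The trade-off is that function calls such as `len(...)` are rejected by design, so anything beyond arithmetic needs its own dedicated tool.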
Now, create the main agent script `agent.py`:

    # agent.py
    # Note: initialize_agent and AgentType belong to LangChain's legacy agent API,
    # which is deprecated; this script assumes a langchain release below 1.0.
    from langchain_ollama import ChatOllama
    from langchain.agents import AgentType, Tool, initialize_agent
    from tools import calculator, search_web, read_file

    # Initialize the local LLM
    llm = ChatOllama(
        model="mistral",   # or "llama3.1"
        temperature=0.7,   # lower values (e.g. 0) make tool-call formatting more consistent
        num_predict=2048
    )

    # Define the list of tools
    tools = [
        Tool(
            name="Calculator",
            func=calculator,
            description="Useful for performing mathematical calculations. Input should be a mathematical expression."
        ),
        Tool(
            name="WebSearch",
            func=search_web,
            description="Searches the web for current information. Input should be a search query string."
        ),
        Tool(
            name="FileReader",
            func=read_file,
            description="Reads the contents of a file. Input should be a valid file path."
        )
    ]

    # Initialize the agent
    agent = initialize_agent(
        tools,
        llm,
        agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
        verbose=True,
        handle_parsing_errors=True
    )

    # Run a simple query
    if __name__ == "__main__":
        query = "What is the population of Tokyo? Also, calculate 15 * 23."
        # invoke() replaces the deprecated run(); the answer is under "output"
        response = agent.invoke({"input": query})["output"]
        print(response)

5. Run the Agent
Execute the agent:

    python agent.py

You should see the agent's reasoning process (in verbose mode) and a final answer combining the web search result and the calculation.
Usage Examples
Example 1: Multi-Tool Query
Run the agent with a query that requires both file reading and calculation:

    # In agent.py, change the query
    query = "Read the file 'notes.txt' and then calculate how many characters are in it."
    response = agent.invoke({"input": query})["output"]

The agent should call `FileReader` first. Counting the characters then depends on the model passing an expression such as `len('...')` to `Calculator`, which smaller models do unreliably, so check the reported count against the file yourself.
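Because `FileReader` opens whatever path the model produces, it is worth limiting where it can look. The following is a sketch of one way to do that with `pathlib`. The `agent_files` directory, the `MAX_CHARS` limit, and the function name `read_file_sandboxed` are assumptions made for this example.

```python
from pathlib import Path

# Hypothetical sandbox directory; the agent may only read files below it.
ALLOWED_DIR = Path("agent_files")
MAX_CHARS = 4000  # arbitrary cap to keep tool output within the model's context


def read_file_sandboxed(file_path: str, base_dir: Path = ALLOWED_DIR) -> str:
    """Read a UTF-8 text file, refusing paths that resolve outside base_dir."""
    base = base_dir.resolve()
    # Models often wrap the path in quotes or add stray whitespace.
    cleaned = file_path.strip().strip("'\"")
    # resolve() collapses ".." segments and follows symlinks before the check.
    target = (base / cleaned).resolve()
    if not target.is_relative_to(base):
        return f"Error: '{cleaned}' is outside the allowed directory."
    try:
        text = target.read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as exc:
        return f"Error reading file: {exc}"
    if len(text) > MAX_CHARS:
        return text[:MAX_CHARS] + "\n[truncated]"
    return text


if __name__ == "__main__":
    print(read_file_sandboxed("../secrets.txt"))
```

Truncating long files is a separate concern from sandboxing, but a single large file can otherwise crowd everything else out of a small local model's context window.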
Example 2: Custom Tool — Fetch Current Time
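Add a new tool to `tools.py`:

    import datetime

    def current_time(format_string: str = "") -> str:
        """Return the current date and time. Optionally specify a strftime format."""
        # The agent always passes some input string. Treat it as a format only
        # if it contains a strftime directive; otherwise use the default.
        fmt = format_string.strip() if "%" in format_string else "%Y-%m-%d %H:%M:%S"
        return datetime.datetime.now().strftime(fmt)

Then register it in `agent.py`, before the `initialize_agent` call so that the agent sees the new tool:

    from tools import current_time

    tools.append(
        Tool(
            name="CurrentTime",
            func=current_time,
            description="Returns the current date and time. Input is an optional strftime format string."
        )
    )

Now you can ask: "What time is it right now?" and the agent will call the tool.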
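The `Tool` wrapper used in this guide hands the model's `Action Input` to the function as one string, so a tool that needs two values has to parse them itself. Here is a sketch of that pattern. The `days_between` tool and its comma-separated input convention are hypothetical and chosen only for illustration.

```python
from datetime import date


def days_between(tool_input: str) -> str:
    """Return the days between two ISO dates given as 'YYYY-MM-DD, YYYY-MM-DD'."""
    # Split the single input string and drop quotes or spaces the model may add.
    parts = [p.strip().strip("'\"") for p in tool_input.split(",")]
    if len(parts) != 2:
        return "Error: expected two dates separated by a comma, e.g. 2024-01-01, 2024-03-01"
    try:
        start, end = (date.fromisoformat(p) for p in parts)
    except ValueError as exc:
        return f"Error: {exc}"
    return str(abs((end - start).days))


if __name__ == "__main__":
    print(days_between("2024-01-01, 2024-03-01"))
```

When registering such a tool, spell out the expected input format in its `description`, since that text is all the model has to go on.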
Example 3: Using with a Different Local LLM
To switch to Llama 3.1, simply change the model name in `agent.py`:

    llm = ChatOllama(
        model="llama3.1",
        temperature=0.7
    )

Remember to pull the model first with `ollama pull llama3.1`.
How the Agent Works Under the Hood
The agent uses the **ReAct** (Reasoning + Acting) pattern. When you send a query, the LLM:
1. **Thinks** about what tool to use and why.
2. **Acts** by naming a tool and its input, which the framework then executes.
3. **Observes** the output from the tool.
4. **Repeats** until it has enough information to answer.
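The four steps above can be sketched as a plain Python loop. This is a simplified illustration of the pattern, not LangChain's actual implementation. The regular expressions, the `react_loop` function, and the `scripted_llm` stand-in (which replaces a real model so the example runs without Ollama) are all assumptions made for this sketch.

```python
import re
from typing import Callable

# Patterns for the text format a ReAct-style prompt asks the model to produce.
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)")
FINAL_RE = re.compile(r"Final Answer:\s*(.+)", re.DOTALL)


def react_loop(llm: Callable[[str], str],
               tools: dict[str, Callable[[str], str]],
               question: str, max_steps: int = 5) -> str:
    """Alternate between model replies and tool calls until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)                      # Think (and propose an action)
        transcript += reply + "\n"
        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(reply)
        if not action:
            transcript += "Observation: could not parse an action.\n"
            continue
        name, tool_input = action.group(1), action.group(2).strip()
        tool = tools.get(name)
        observation = tool(tool_input) if tool else f"unknown tool '{name}'"  # Act
        transcript += f"Observation: {observation}\n"                         # Observe
    return "Stopped: step limit reached."


def multiply(expr: str) -> str:
    """Toy tool: multiply two integers written as 'a*b'."""
    left, right = expr.split("*")
    return str(int(left) * int(right))


def scripted_llm(transcript: str) -> str:
    """Stand-in for a real model: acts once, then answers after an observation."""
    if "Observation:" not in transcript:
        return "Thought: I should multiply.\nAction: Calculator\nAction Input: 15*23"
    return "Thought: I have the result.\nFinal Answer: 15 * 23 equals 345."


if __name__ == "__main__":
    print(react_loop(scripted_llm, {"Calculator": multiply}, "What is 15 * 23?"))
```

The step limit matters in practice: a small model that never emits a final answer would otherwise loop indefinitely.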
This cycle is visible in the verbose output. For example, you might see:

    > Entering new AgentExecutor chain...
    Thought: I need to find the population of Tokyo and calculate 15*23. I can use WebSearch for the first and Calculator for the second.
    Action: WebSearch
    Action Input: population of Tokyo
    Observation: Tokyo has a population of approximately 14 million (2024 estimate).
    Thought: Now I need to calculate 15*23.
    Action: Calculator
    Action Input: 15*23
    Observation: 345
    Thought: I have both pieces of information.
    Final Answer: The population of Tokyo is approximately 14 million, and 15 * 23 equals 345.

Scaling Up: From Local to Production
While this example uses a local LLM and simple tools, the same pattern scales to production environments. Companies like Microsoft and Anthropic have demonstrated agents that integrate with databases, cloud APIs, and enterprise systems. The key differences are:
- **Model size**: Larger models (70B+ parameters) often perform better at tool selection.
- **Tool complexity**: Real-world tools may require authentication, rate limiting, and error handling.
- **Memory**: Production agents often persist conversation history and retrieved context, for example in a vector database.
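As an illustration of the error handling mentioned under tool complexity, the sketch below wraps a tool function so that transient failures are retried with exponential backoff and the agent always receives a string. The `with_retries` helper, its default of three attempts, and the delay values are assumptions made for this example rather than a standard recipe.

```python
import time
from typing import Callable


def with_retries(func: Callable[[str], str], attempts: int = 3,
                 base_delay: float = 0.5) -> Callable[[str], str]:
    """Wrap a tool so failures are retried with exponential backoff."""
    def wrapped(tool_input: str) -> str:
        for attempt in range(1, attempts + 1):
            try:
                return func(tool_input)
            except Exception as exc:  # broad on purpose: the agent needs a string back
                if attempt == attempts:
                    return f"Error after {attempts} attempts: {exc}"
                # Wait 0.5s, 1s, 2s, ... (for the default base_delay) before retrying.
                time.sleep(base_delay * 2 ** (attempt - 1))
        return "Error: no attempts were made"  # only reached if attempts < 1
    return wrapped


if __name__ == "__main__":
    def unreliable(query: str) -> str:
        raise ConnectionError("service unavailable")

    print(with_retries(unreliable, attempts=2, base_delay=0.1)("test"))
```

A wrapped function can be passed wherever a plain tool function is expected, for instance `func=with_retries(search_web)`.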
For local deployments, you can extend this setup with tools for:
- SQL database queries
- API calls to services like weather or stock prices
- Image generation using local models (e.g., Stable Diffusion)
- File management (create, edit, delete)
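As a sketch of the first item, here is what a SQL query tool might look like using the standard `sqlite3` module. The `make_sql_tool` factory, the `cities` table, and the row limit are assumptions made for this example. The `SELECT`-only check is a guard rail rather than a security boundary, so opening the database read-only (for example with SQLite's `mode=ro` URI option) is the sturdier choice for real data.

```python
import sqlite3
from typing import Callable


def make_sql_tool(conn: sqlite3.Connection, max_rows: int = 20) -> Callable[[str], str]:
    """Build a tool function that runs single SELECT statements on `conn`."""
    def run_query(sql: str) -> str:
        statement = sql.strip().rstrip(";")
        # Reject anything that is not one SELECT statement.
        if not statement.lower().startswith("select") or ";" in statement:
            return "Error: only a single SELECT statement is allowed."
        try:
            cursor = conn.execute(statement)
            rows = cursor.fetchmany(max_rows)
        except sqlite3.Error as exc:
            return f"Error: {exc}"
        # Return a small comma-separated table: header line, then one line per row.
        header = ", ".join(col[0] for col in cursor.description)
        lines = [", ".join(str(value) for value in row) for row in rows]
        return "\n".join([header, *lines])
    return run_query


if __name__ == "__main__":
    # In-memory demo database with made-up contents.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE cities (name TEXT, population INTEGER)")
    demo.execute("INSERT INTO cities VALUES ('Tokyo', 14000000)")
    query_tool = make_sql_tool(demo)
    print(query_tool("SELECT name, population FROM cities"))
    print(query_tool("DROP TABLE cities"))
```

Capping the number of rows keeps a broad query from flooding the model's context, which is easy to do with a 7B model.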
Conclusion
Transforming a local LLM into a tool-using agent unlocks a new level of capability. With a short Python script and a local model, you can build an assistant that not only generates text but also calculates, searches, and interacts with your files. This approach keeps your data on your machine, works offline as long as the tools themselves are local, and gives you full control over the tools your agent can use.
The example provided here is a starting point. As you add more tools and refine the agent's prompts, you'll see it tackle increasingly complex tasks. The future of local AI is not just about running models—it's about building agents that act on your behalf, securely and autonomously.