Run Local AI Agents Like a Pro – Part 2: Running LLMs Locally with Ollama



In Part 1: Setting Up the Environment, we got our hands dirty setting up WSL2, Ubuntu 24.04, Python, and Docker — laying the groundwork for our local AI dev playground.
Now it’s time for something more exciting: actually running large language models (LLMs) on your machine.
This post is all about Ollama, a slick tool that makes running powerful models like LLaMA 3, Mistral, and Phi-3 locally feel almost effortless.
🚀 So... Why Ollama?
Ollama is like the Docker of local LLMs. It takes care of downloading, installing, running, and serving models — so you can focus on what matters: using them.
Here’s what makes it special:
✅ What You Get:
- No more API keys or rate limits
- Models run completely offline
- Clean command-line interface + HTTP API
- Works with both CPU and GPU
- Great for prototyping and plugging into agents
🛠️ How to Install Ollama (Ubuntu on WSL2)
Open up your WSL terminal and paste this in:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
Once it’s installed, check that it’s working:
```bash
ollama --version
```
Then start the Ollama server:
```bash
ollama serve &
```
It’ll spin up a local inference server on http://localhost:11434. That’s your new best friend.
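If you want a quick sanity check that the server is really listening, you can curl it; the root endpoint replies with a short status message:
```bash
# Quick sanity check: the root endpoint should reply "Ollama is running"
curl http://localhost:11434

# List the models you've pulled so far (empty until you pull one)
curl http://localhost:11434/api/tags
```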
🧪 Run Your First Model with Tool Support
Let's start with a model that supports function calling, a key feature for building interactive agents. Here are some great options:
```bash
# Llama 3.3 with tool support
ollama run llama3.3

# Qwen 3 (8B) with tool support
ollama run qwen3:8b

# DeepSeek R1 with tool support
ollama run deepseek-r1
```
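If you'd rather download a model without dropping straight into a chat session (handy when pre-fetching models for scripts), pull it first:
```bash
# Download the weights without starting an interactive session
ollama pull qwen3:8b

# Remove a model you no longer need to reclaim disk space
ollama rm qwen3:8b
```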
Want to see which models you’ve already pulled?
```bash
ollama list
```
You can also browse the Ollama model library’s tools filter to find models that support function calling.
Example model call:
```
> ollama run qwen3:8b
>>> Hi Welcome to the world, explain yourself
Thinking...
Okay, the user greeted me with "Hi Welcome to the world, explain yourself." They want me to explain
myself. Let me start by acknowledging their greeting and then introduce myself in a friendly way. I
should mention that I'm a large language model developed by Alibaba Cloud, designed to assist with
various tasks. It's important to highlight my capabilities, like answering questions, creating
content, and providing information. I should keep the tone warm and approachable, maybe add an emoji
to make it more personable. Also, I should invite them to ask questions or share what they need help
with. Let me make sure the response is concise but covers all the key points. Alright, that should
work.
...done thinking.

Hi there! 🌟 Welcome to the world of AI! I'm Qwen, a large language model developed by Alibaba Cloud.
I'm here to help you with all sorts of tasks—answering questions, creating stories, solving problems,
or just chatting! Whether you need information, creativity, or support, I'm ready to assist. What's on
your mind? 😊
```
🌐 Use It with an API (Yes, Locally)
Once the Ollama server is up, you can hit it like any other API:
```bash
curl http://localhost:11434/api/generate \
  -d '{"model": "qwen3", "prompt": "Explain LangChain in simple terms."}'
```
This makes it incredibly easy to plug into your own apps, tools, or workflows — or into agent frameworks like Goose (which we’ll get to in the next post).
You can even define custom models using a simple Modelfile. Learn more about that in the Ollama library.
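As a minimal sketch of what that can look like (assuming qwen3:8b as the base; the study-buddy name and the system prompt are just placeholders):
```bash
# Write a minimal Modelfile: base model, a sampling parameter, and a system prompt
cat > Modelfile <<'EOF'
FROM qwen3:8b
PARAMETER temperature 0.7
SYSTEM "You are a concise assistant that answers questions about local AI tooling."
EOF

# Build the custom model and chat with it
ollama create study-buddy -f Modelfile
ollama run study-buddy
```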
🧠 Where This Fits In
With Ollama running, you’ve got your own personal LLM runtime:
- Everything is private and local
- No internet needed to chat with models
- Ready to power AI agents and developer tools
It's a major step toward making your machine feel like it has its own intelligence layer.
Coming Up Next…
In Part 3, we’ll hook up Ollama to Goose — a nimble AI agent framework that lets your models:
- Run real shell commands
- Summarize documents
- Chain together tasks and tools
This is where things start to feel really powerful.
See you there!