Ollama API: Run Large Language Models Locally with Simple APIs

Running Large Language Models (LLMs) locally is becoming increasingly important for developers who care about privacy, cost, latency, and offline access. Ollama makes this practical by providing a clean CLI and a simple HTTP API to run models like Llama, Mistral, Gemma, and more on your own machine.

In this post, we’ll explore what the Ollama API is, how it works, and how to use it in real applications.

Ollama is a local AI runtime that lets you run open-source large language models on your own machine. It provides a simple CLI and HTTP API to download, manage, and interact with models privately, offline, and without relying on cloud-based AI services. To learn the essential Ollama commands, read our article.

Installing Ollama

First, download and install Ollama from the official site.

Verify installation:

ollama --version

Next, run a model. In this demo we will use the qwen2.5:latest model:

ollama run qwen2.5:latest

Ollama API Basics

Ollama exposes a REST API that other applications can call. By default, it runs a local HTTP server at

http://localhost:11434

We can interact with it through standard REST API calls. If we send a GET request to http://localhost:11434, we get a short plain-text response confirming that Ollama is running.
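As a quick check, the same request can be made from a script. Below is a minimal sketch using Python with the requests library (an assumption on my part; any HTTP client works the same way):

import requests

# A GET request to the root endpoint confirms the local server is up
response = requests.get("http://localhost:11434")
print(response.status_code)  # 200 when the server is running
print(response.text)         # short plain-text confirmation message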

Ollama provides the following APIs:

  • Text generation API
  • Chat completion API
  • Embedding generation API
  • Version API

1. Generate Text with Ollama API

To generate text with the Ollama API, send a request to the following endpoint:

POST http://localhost:11434/api/generate

Example request body:

{
  "model": "qwen2.5:latest",
  "prompt": "Define REST API in 50 words"
}

The generate API returns a stream of JSON objects. The response is streamed by default, which makes it well suited to chat UIs. To disable streaming, set "stream": false in the request body.
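To make this concrete, here is a minimal Python sketch (using the requests library, an assumption; the model and prompt are taken from the example above) showing both the non-streaming and the streaming form:

import json
import requests

# Non-streaming call: "stream": False returns the whole completion as one JSON object
payload = {
    "model": "qwen2.5:latest",
    "prompt": "Define REST API in 50 words",
    "stream": False,
}
response = requests.post("http://localhost:11434/api/generate", json=payload)
print(response.json()["response"])  # the generated text

# Streaming call (the default): the body arrives as newline-delimited JSON chunks
with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen2.5:latest", "prompt": "Define REST API in 50 words"},
    stream=True,
) as r:
    for line in r.iter_lines():
        if line:
            chunk = json.loads(line)
            print(chunk.get("response", ""), end="", flush=True)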

2. Generate a Chat Completion

For conversational use cases, we can use the following Ollama API endpoint:

POST /api/chat

Example request body:

{
  "model": "qwen2.5:latest",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is Docker?"}
  ],
  "stream": false
}
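As with the generate endpoint, calling this from Python is straightforward. A minimal sketch with the requests library (assumed), reusing the request body above:

import requests

# Chat completion with streaming disabled; the assistant reply is under message.content
payload = {
    "model": "qwen2.5:latest",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is Docker?"},
    ],
    "stream": False,
}
response = requests.post("http://localhost:11434/api/chat", json=payload)
print(response.json()["message"]["content"])

The messages list carries the conversation history, so appending the assistant's reply and the next user turn before the following call keeps the context of the conversation.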

3. Generate Embedding

Ollama also supports embeddings for semantic search and RAG systems. To generate an embedding, use the following API endpoint:

POST /api/embeddings

Example request body:

{
  "model": "all-minilm",
  "prompt": "Nolowiz is awesome"
}
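A minimal Python sketch (requests library assumed) that sends this body and reads the vector back:

import requests

# The /api/embeddings endpoint returns a single vector under the "embedding" key
payload = {"model": "all-minilm", "prompt": "Nolowiz is awesome"}
response = requests.post("http://localhost:11434/api/embeddings", json=payload)
embedding = response.json()["embedding"]
print(len(embedding))  # number of dimensions in the embedding vector

These vectors can then be stored in a vector database and compared with cosine similarity for semantic search or RAG retrieval.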

4. Version

This API endpoint returns the installed version of Ollama:

GET /api/version
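For completeness, a quick check from Python (requests library assumed):

import requests

# Returns a JSON object with a single "version" field
response = requests.get("http://localhost:11434/api/version")
print(response.json()["version"])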

Conclusion

The Ollama API makes running LLMs locally simple, developer‑friendly, and practical. If you want control over your data, predictable costs, and low‑latency inference, Ollama is one of the best tools available today.