Getting Started with Ollama

Get an LLM up and running on your local machine in 10 minutes.

Quickstart

To get started, download Ollama and run Llama 3:

ollama run llama3

This downloads the Llama 3 model and starts a new chat session. If the model is already downloaded, the terminal loads a new chat immediately.

Ollama supports a variety of popular LLMs; the full list can be found here. Some of the models available for download:

Model	Parameters	Size	Command
Llama 3	8B	4.7GB	ollama run llama3
Llama 3	70B	40GB	ollama run llama3:70b
Phi 3 Mini	3.8B	2.3GB	ollama run phi3
Phi 3 Medium	14B	7.9GB	ollama run phi3:medium
Gemma	2B	1.4GB	ollama run gemma:2b
Gemma	7B	4.8GB	ollama run gemma:7b
Mistral	7B	4.1GB	ollama run mistral
Moondream 2	1.4B	829MB	ollama run moondream
Neural Chat	7B	4.1GB	ollama run neural-chat
Starling	7B	4.1GB	ollama run starling-lm
Code Llama	7B	3.8GB	ollama run codellama
Llama 2 Uncensored	7B	3.8GB	ollama run llama2-uncensored
LLaVA	7B	4.5GB	ollama run llava
Solar	10.7B	6.1GB	ollama run solar

Ollama can also import GGUF models through a Modelfile:

Create a file named Modelfile with a FROM instruction that points to the local file path of the model you want to import.

FROM ./vicuna-33b.Q4_0.gguf

Create the model in Ollama:

ollama create example -f Modelfile

Run the model:

ollama run example
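
If you import models often, the Modelfile steps above can be scripted. The following is a minimal Python sketch, not an official tool: the helper names (modelfile_text, create_command, import_gguf) are hypothetical, and it assumes the ollama CLI is on your PATH and that the GGUF file exists at the path you pass in.

```python
import subprocess
from pathlib import Path


def modelfile_text(gguf_path):
    """Return Modelfile contents with a single FROM instruction."""
    return f"FROM {gguf_path}\n"


def create_command(name, modelfile="Modelfile"):
    """Build the same `ollama create` command shown above as an argument list."""
    return ["ollama", "create", name, "-f", modelfile]


def import_gguf(name, gguf_path, modelfile="Modelfile"):
    """Write the Modelfile, then register the model with Ollama.

    Assumes the `ollama` CLI is installed; raises CalledProcessError on failure.
    """
    Path(modelfile).write_text(modelfile_text(gguf_path), encoding="utf-8")
    subprocess.run(create_command(name, modelfile), check=True)


# Usage (requires Ollama and the GGUF file to be present locally):
# import_gguf("example", "./vicuna-33b.Q4_0.gguf")
```

After import_gguf succeeds, ollama run example starts a chat with the imported model, exactly as in the manual steps.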

Ollama has a REST API for running and managing models.

Generate a response

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt":"Why is the sky blue?"
}'
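
The same request can be sent from a program. Here is a minimal Python sketch using only the standard library. The function names are hypothetical, and two details go beyond the curl example: it sets "stream" to false so the server replies with one JSON object, and it reads the generated text from a "response" field. Both reflect my understanding of the Ollama API, so verify them against the API documentation.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # same address as the curl example


def build_generate_request(model, prompt, stream=False):
    """Build a POST request for /api/generate without sending it."""
    payload = {"model": model, "prompt": prompt, "stream": stream}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt):
    """Send the prompt and return the generated text.

    Needs a running Ollama server; assumes the reply is a single JSON
    object with a "response" field (the non-streaming form).
    """
    request = build_generate_request(model, prompt, stream=False)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]


# Usage (requires Ollama running locally with llama3 downloaded):
# print(generate("llama3", "Why is the sky blue?"))
```

Splitting request construction from sending keeps the payload easy to inspect or log before anything goes over the network.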

Chat with an LLM

curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
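
By default the chat endpoint streams its answer as a series of JSON objects, one per line. The sketch below shows one way to reassemble those chunks into a single reply in Python. The helper names are hypothetical, and the chunk shape it assumes (a "message" object with a "content" string, plus a "done" flag on the last chunk) is based on my understanding of the API, so check it against the API documentation.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # same address as the curl example


def collect_chat_reply(lines):
    """Join the assistant text from streamed /api/chat chunks.

    `lines` is any iterable of JSON lines (str or bytes).
    """
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between chunks
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # assumed marker for the final chunk
    return "".join(parts)


def chat(model, messages):
    """Send a chat request and return the full reply (needs a running server)."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # The HTTP response object yields the body one line at a time.
        return collect_chat_reply(response)


# Usage (requires Ollama running locally with llama3 downloaded):
# print(chat("llama3", [{"role": "user", "content": "why is the sky blue?"}]))
```

To continue a conversation, append the reply to messages with the role "assistant", add the next user message, and call chat again with the full list.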

Want more endpoints? Check out the API documentation.