Ollama

The ollama provider is compatible with Ollama, which enables access to Llama, Mixtral, Mistral, and more.

You can use Ollama's /api/generate endpoint by specifying a provider ID for any model in the Ollama library, for example:

  • ollama:completion:llama3:text
  • ollama:completion:llama2:text
  • ollama:completion:llama2-uncensored
  • ollama:completion:codellama
  • ollama:completion:orca-mini
  • ...

Or, use the /api/chat endpoint for chat-formatted prompts:

  • ollama:chat:llama3
  • ollama:chat:llama3:8b
  • ollama:chat:llama3:70b
  • ollama:chat:llama2
  • ollama:chat:llama2:7b
  • ollama:chat:llama2:13b
  • ollama:chat:llama2:70b
  • ollama:chat:mixtral:8x7b
  • ollama:chat:mixtral:8x22b
  • ...

We also support the /api/embeddings endpoint via ollama:embeddings:<model name> for model-graded assertions such as similarity.
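The provider IDs above share the shape ollama:<type>:<model name>, where the type selects the endpoint. The following Python sketch illustrates that mapping only; split_provider_id and ENDPOINTS are hypothetical names, and promptfoo's own parsing may differ (for instance, short forms such as ollama:llama2 are not handled here).

```python
# Hypothetical helper for illustration; not part of promptfoo.
ENDPOINTS = {
    "completion": "/api/generate",
    "chat": "/api/chat",
    "embeddings": "/api/embeddings",
}


def split_provider_id(provider_id: str) -> tuple[str, str]:
    """Return (endpoint, model) for an ID shaped like ollama:<type>:<model>."""
    # Split at most twice so model tags such as "llama3:8b" stay intact.
    parts = provider_id.split(":", 2)
    if len(parts) != 3 or parts[0] != "ollama" or parts[1] not in ENDPOINTS or not parts[2]:
        raise ValueError(f"unrecognized provider ID: {provider_id}")
    return ENDPOINTS[parts[1]], parts[2]
```

With this sketch, split_provider_id("ollama:chat:llama3:8b") yields the /api/chat endpoint and the model name llama3:8b.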

Supported environment variables:

  • OLLAMA_BASE_URL - protocol, host name, and port (defaults to http://localhost:11434)
  • OLLAMA_API_KEY - (optional) API key that is passed as the Bearer token in the Authorization header when calling the API
  • REQUEST_TIMEOUT_MS - request timeout in milliseconds
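
As a rough illustration of how these variables could shape a request, the Python sketch below builds (but does not send) a call to Ollama's /api/generate endpoint. build_generate_request is a hypothetical helper written for this page, not promptfoo's implementation, and the request body is limited to the model, prompt, and stream fields.

```python
import json
import os
import urllib.request


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for /api/generate from the environment (illustrative)."""
    # Fall back to the documented default when OLLAMA_BASE_URL is unset.
    base_url = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
    headers = {"Content-Type": "application/json"}
    # The API key is optional; when present it is sent as a Bearer token.
    api_key = os.environ.get("OLLAMA_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To send the request, pass it to urllib.request.urlopen with a timeout in seconds, for example the value of REQUEST_TIMEOUT_MS divided by 1000 when that variable is set.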

To pass configuration options to Ollama, use the config key like so:

providers:
  - id: ollama:llama2
    config:
      num_predict: 1024

localhost and IPv4 vs IPv6

If you are developing locally with localhost (promptfoo's default) and Ollama API calls fail with ECONNREFUSED, the cause may be an IPv4 vs IPv6 mismatch on localhost. Ollama's default host is 127.0.0.1, an IPv4 address, while the operating system's hosts file may bind localhost to an IPv6 address. There are a few possible solutions:

  1. Change the Ollama server to use IPv6 addressing by running export OLLAMA_HOST=":11434" before starting it. Note that IPv6 support requires Ollama version 0.0.20 or newer.
  2. Change promptfoo to use an IPv4 address directly by running export OLLAMA_BASE_URL="http://127.0.0.1:11434".
  3. Update your OS's hosts file to bind localhost to IPv4.
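
Before picking one of the fixes above, it can help to check how localhost resolves on your machine. The Python sketch below is one possible diagnostic using only the standard library; address_families and can_connect are hypothetical helper names, and the port 11434 comes from the default base URL.

```python
import socket


def address_families(host: str, port: int = 11434) -> set[str]:
    """Return the IP families ("IPv4", "IPv6") that host resolves to."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr).
    return {names[info[0]] for info in infos if info[0] in names}


def can_connect(host: str, port: int = 11434, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ECONNREFUSED, timeouts, and resolution failures.
        return False
```

If address_families("localhost") includes IPv6 and can_connect("127.0.0.1") succeeds while can_connect("localhost") fails, the mismatch described in this section is a likely cause.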

Evaluating models serially

By default, promptfoo evaluates all providers concurrently for each prompt. However, you can run evaluations serially using the -j 1 option:

promptfoo eval -j 1

This sets concurrency to 1, which means:

  1. Evaluations happen one provider at a time, then one prompt at a time.
  2. Only one model is loaded into memory, conserving system resources.
  3. You can easily swap models between evaluations without conflicts.

This approach is particularly useful for:

  • Local setups with limited RAM
  • Testing multiple resource-intensive models
  • Debugging provider-specific issues
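
The effect of a concurrency of 1 can be pictured with a small Python sketch. This is a conceptual model only, not promptfoo's scheduler: run_all and its call argument are hypothetical, and the provider-major ordering is an assumption based on the description above.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all(providers, prompts, call, concurrency=1):
    """Run call(provider, prompt) for every pair, at most `concurrency` at a time."""
    # Provider-major order: every prompt for one provider before moving on,
    # so with concurrency=1 only one model needs to be loaded at a time.
    pairs = [(provider, prompt) for provider in providers for prompt in prompts]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() returns results in submission order regardless of concurrency.
        return list(pool.map(lambda pair: call(*pair), pairs))
```

With concurrency=1 the calls run strictly one after another in the order listed; raising it lets calls for different providers overlap, which is when several models may need to be in memory at once.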