Together AI

Together AI provides access to open-source models through an API compatible with OpenAI's interface.

OpenAI Compatibility

Together AI's API is compatible with OpenAI's API, which means all parameters available in the OpenAI provider work with Together AI.
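Because parameters pass through unchanged, a configuration written for the OpenAI provider can usually be pointed at Together AI by swapping the provider ID. A minimal sketch (the model name and parameter values here are illustrative, not recommendations):

```yaml
providers:
  - id: togetherai:meta-llama/Llama-3.3-70B-Instruct-Turbo
    config:
      temperature: 0.7
      top_p: 0.9
      presence_penalty: 0.2
      max_tokens: 1024
```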

Basic Configuration

Configure a Together AI model in your promptfoo configuration:

promptfooconfig.yaml

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: togetherai:meta-llama/Llama-3.3-70B-Instruct-Turbo
    config:
      temperature: 0.7
```

The provider requires an API key stored in the TOGETHER_API_KEY environment variable.

Key Features

Max Tokens Configuration

```yaml
config:
  max_tokens: 4096
```

Function Calling

```yaml
config:
  tools:
    - type: function
      function:
        name: get_weather
        description: Get the current weather
        parameters:
          type: object
          properties:
            location:
              type: string
              description: City and state
```
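When evaluating a provider configured with tools, the model's tool-call output can be checked with promptfoo's built-in `is-valid-openai-tools-call` assertion, which verifies the call against the tool schema. A sketch (the prompt variable is illustrative):

```yaml
tests:
  - vars:
      query: What's the weather in Boston, MA?
    assert:
      - type: is-valid-openai-tools-call
```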

JSON Mode

```yaml
config:
  response_format: { type: 'json_object' }
```
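JSON mode constrains the model to emit valid JSON; it still helps to ask for JSON explicitly in the prompt. In promptfoo you can verify the output with an `is-json` assertion. An illustrative fragment (model choice is an example, not a requirement):

```yaml
providers:
  - id: togetherai:deepseek-ai/DeepSeek-V3
    config:
      response_format: { type: 'json_object' }
tests:
  - assert:
      - type: is-json
```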

Together AI offers over 200 models. Here are some of the most popular models by category:

Llama 4 Models

  • Llama 4 Maverick: meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (524,288 context length, FP8)
  • Llama 4 Scout: meta-llama/Llama-4-Scout-17B-16E-Instruct (327,680 context length, FP16)

DeepSeek Models

  • DeepSeek R1: deepseek-ai/DeepSeek-R1 (128,000 context length, FP8)
  • DeepSeek R1 Distill Llama 70B: deepseek-ai/DeepSeek-R1-Distill-Llama-70B (131,072 context length, FP16)
  • DeepSeek R1 Distill Qwen 14B: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B (131,072 context length, FP16)
  • DeepSeek V3: deepseek-ai/DeepSeek-V3 (16,384 context length, FP8)

Llama 3 Models

  • Llama 3.3 70B Instruct Turbo: meta-llama/Llama-3.3-70B-Instruct-Turbo (131,072 context length, FP8)
  • Llama 3.1 70B Instruct Turbo: meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo (131,072 context length, FP8)
  • Llama 3.1 405B Instruct Turbo: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo (130,815 context length, FP8)
  • Llama 3.1 8B Instruct Turbo: meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo (131,072 context length, FP8)
  • Llama 3.2 3B Instruct Turbo: meta-llama/Llama-3.2-3B-Instruct-Turbo (131,072 context length, FP16)

Mixtral Models

  • Mixtral-8x7B Instruct: mistralai/Mixtral-8x7B-Instruct-v0.1 (32,768 context length, FP16)
  • Mixtral-8x22B Instruct: mistralai/Mixtral-8x22B-Instruct-v0.1 (65,536 context length, FP16)
  • Mistral Small 3 Instruct (24B): mistralai/Mistral-Small-24B-Instruct-2501 (32,768 context length, FP16)

Qwen Models

  • Qwen 2.5 72B Instruct Turbo: Qwen/Qwen2.5-72B-Instruct-Turbo (32,768 context length, FP8)
  • Qwen 2.5 7B Instruct Turbo: Qwen/Qwen2.5-7B-Instruct-Turbo (32,768 context length, FP8)
  • Qwen 2.5 Coder 32B Instruct: Qwen/Qwen2.5-Coder-32B-Instruct (32,768 context length, FP16)
  • QwQ-32B: Qwen/QwQ-32B (32,768 context length, FP16)

Vision Models

  • Llama 3.2 Vision: meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo (131,072 context length, FP16)
  • Qwen 2.5 Vision Language 72B: Qwen/Qwen2.5-VL-72B-Instruct (32,768 context length, FP8)
  • Qwen 2 VL 72B: Qwen/Qwen2-VL-72B-Instruct (32,768 context length, FP16)

Free Endpoints

Together AI offers free endpoints with reduced rate limits:

  • meta-llama/Llama-3.3-70B-Instruct-Turbo-Free
  • meta-llama/Llama-Vision-Free
  • deepseek-ai/DeepSeek-R1-Distill-Llama-70B-Free

For a complete list of all 200+ available models and their specifications, refer to the Together AI Models page.

Example Configuration

promptfooconfig.yaml

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: togetherai:meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
    config:
      temperature: 0.7
      max_tokens: 4096

  - id: togetherai:deepseek-ai/DeepSeek-R1
    config:
      temperature: 0.0
      response_format: { type: 'json_object' }
      tools:
        - type: function
          function:
            name: get_weather
            description: Get weather information
            parameters:
              type: object
              properties:
                location: { type: 'string' }
                unit: { type: 'string', enum: ['celsius', 'fahrenheit'] }
```

For more information, refer to the Together AI documentation.