Together AI

Together AI provides access to open-source models through an API compatible with OpenAI's interface.

OpenAI Compatibility

Together AI's API is compatible with OpenAI's API, which means all parameters available in the OpenAI provider work with Together AI.
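Because parameters pass through unchanged, a configuration written for the OpenAI provider can usually be pointed at Together AI by swapping the provider ID. A minimal sketch (the model name and parameter values here are illustrative, not recommendations):

```yaml
providers:
  - id: togetherai:meta-llama/Llama-3.3-70B-Instruct-Turbo
    config:
      temperature: 0.7
      top_p: 0.9
      presence_penalty: 0.2
      max_tokens: 1024
```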

Basic Configuration

Configure a Together AI model in your promptfoo configuration:

promptfooconfig.yaml

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: togetherai:meta-llama/Llama-3.3-70B-Instruct-Turbo
    config:
      temperature: 0.7
```

The provider requires an API key stored in the TOGETHER_API_KEY environment variable.

Key Features

Max Tokens Configuration

```yaml
config:
  max_tokens: 4096
```

Function Calling

```yaml
config:
  tools:
    - type: function
      function:
        name: get_weather
        description: Get the current weather
        parameters:
          type: object
          properties:
            location:
              type: string
              description: City and state
```
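When evaluating a provider configured with tools, the model's tool-call output can be checked with promptfoo's built-in `is-valid-openai-tools-call` assertion, which verifies the call against the tool schema. A sketch (the prompt variable is illustrative):

```yaml
tests:
  - vars:
      query: What's the weather in Boston, MA?
    assert:
      - type: is-valid-openai-tools-call
```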

JSON Mode

```yaml
config:
  response_format: { type: 'json_object' }
```
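JSON mode constrains the model to emit valid JSON; it still helps to ask for JSON explicitly in the prompt. In promptfoo you can verify the output with an `is-json` assertion. An illustrative fragment (model choice is an example, not a requirement):

```yaml
providers:
  - id: togetherai:deepseek-ai/DeepSeek-V3
    config:
      response_format: { type: 'json_object' }
tests:
  - assert:
      - type: is-json
```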

Together AI offers over 200 models. Here are some of the most popular models by category:

Llama 4 Models

  • Llama 4 Maverick: meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (524,288 context length, FP8)
  • Llama 4 Scout: meta-llama/Llama-4-Scout-17B-16E-Instruct (327,680 context length, FP16)

DeepSeek Models

  • DeepSeek R1: deepseek-ai/DeepSeek-R1 (128,000 context length, FP8)
  • DeepSeek R1 Distill Llama 70B: deepseek-ai/DeepSeek-R1-Distill-Llama-70B (131,072 context length, FP16)
  • DeepSeek R1 Distill Qwen 14B: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B (131,072 context length, FP16)
  • DeepSeek V3: deepseek-ai/DeepSeek-V3 (16,384 context length, FP8)

Llama 3 Models

  • Llama 3.3 70B Instruct Turbo: meta-llama/Llama-3.3-70B-Instruct-Turbo (131,072 context length, FP8)
  • Llama 3.1 70B Instruct Turbo: meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo (131,072 context length, FP8)
  • Llama 3.1 405B Instruct Turbo: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo (130,815 context length, FP8)
  • Llama 3.1 8B Instruct Turbo: meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo (131,072 context length, FP8)
  • Llama 3.2 3B Instruct Turbo: meta-llama/Llama-3.2-3B-Instruct-Turbo (131,072 context length, FP16)

Mixtral Models

  • Mixtral-8x7B Instruct: mistralai/Mixtral-8x7B-Instruct-v0.1 (32,768 context length, FP16)
  • Mixtral-8x22B Instruct: mistralai/Mixtral-8x22B-Instruct-v0.1 (65,536 context length, FP16)
  • Mistral Small 3 Instruct (24B): mistralai/Mistral-Small-24B-Instruct-2501 (32,768 context length, FP16)

Qwen Models

  • Qwen 2.5 72B Instruct Turbo: Qwen/Qwen2.5-72B-Instruct-Turbo (32,768 context length, FP8)
  • Qwen 2.5 7B Instruct Turbo: Qwen/Qwen2.5-7B-Instruct-Turbo (32,768 context length, FP8)
  • Qwen 2.5 Coder 32B Instruct: Qwen/Qwen2.5-Coder-32B-Instruct (32,768 context length, FP16)
  • QwQ-32B: Qwen/QwQ-32B (32,768 context length, FP16)

Vision Models

  • Llama 3.2 Vision: meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo (131,072 context length, FP16)
  • Qwen 2.5 Vision Language 72B: Qwen/Qwen2.5-VL-72B-Instruct (32,768 context length, FP8)
  • Qwen 2 VL 72B: Qwen/Qwen2-VL-72B-Instruct (32,768 context length, FP16)

Free Endpoints

Together AI offers free endpoints with reduced rate limits:

  • meta-llama/Llama-3.3-70B-Instruct-Turbo-Free
  • meta-llama/Llama-Vision-Free
  • deepseek-ai/DeepSeek-R1-Distill-Llama-70B-Free

For a complete list of all 200+ available models and their specifications, refer to the Together AI Models page.

Example Configuration

promptfooconfig.yaml

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: togetherai:meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
    config:
      temperature: 0.7
      max_tokens: 4096

  - id: togetherai:deepseek-ai/DeepSeek-R1
    config:
      temperature: 0.0
      response_format: { type: 'json_object' }
      tools:
        - type: function
          function:
            name: get_weather
            description: Get weather information
            parameters:
              type: object
              properties:
                location: { type: 'string' }
                unit: { type: 'string', enum: ['celsius', 'fahrenheit'] }
```

For more information, refer to the Together AI documentation.