# Together AI

Together AI provides access to open-source models through an API compatible with OpenAI's interface.
## OpenAI Compatibility

Together AI's API is compatible with OpenAI's API, which means all parameters available in the OpenAI provider work with Together AI.
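For example, standard OpenAI chat parameters such as `top_p`, `presence_penalty`, and `stop` can be passed through unchanged (an illustrative sketch; the parameter values are arbitrary):

```yaml
providers:
  - id: togetherai:meta-llama/Llama-3.3-70B-Instruct-Turbo
    config:
      # OpenAI-style sampling parameters pass through as-is
      top_p: 0.9
      presence_penalty: 0.5
      stop: ['###']
```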
## Basic Configuration

Configure a Together AI model in your promptfoo configuration:

```yaml title="promptfooconfig.yaml"
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: togetherai:meta-llama/Llama-3.3-70B-Instruct-Turbo
    config:
      temperature: 0.7
```

The provider requires an API key stored in the `TOGETHER_API_KEY` environment variable.
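The key can be exported in your shell before running promptfoo (placeholder value shown):

```shell
# Make the Together AI API key available to promptfoo
export TOGETHER_API_KEY=your-api-key-here
```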
## Key Features

### Max Tokens Configuration

```yaml
config:
  max_tokens: 4096
```
### Function Calling

```yaml
config:
  tools:
    - type: function
      function:
        name: get_weather
        description: Get the current weather
        parameters:
          type: object
          properties:
            location:
              type: string
              description: City and state
```
### JSON Mode

```yaml
config:
  response_format: { type: 'json_object' }
```
## Popular Models

Together AI offers over 200 models. Here are some of the most popular models by category:
### Llama 4 Models

- Llama 4 Maverick: `meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8` (524,288 context length, FP8)
- Llama 4 Scout: `meta-llama/Llama-4-Scout-17B-16E-Instruct` (327,680 context length, FP16)
### DeepSeek Models

- DeepSeek R1: `deepseek-ai/DeepSeek-R1` (128,000 context length, FP8)
- DeepSeek R1 Distill Llama 70B: `deepseek-ai/DeepSeek-R1-Distill-Llama-70B` (131,072 context length, FP16)
- DeepSeek R1 Distill Qwen 14B: `deepseek-ai/DeepSeek-R1-Distill-Qwen-14B` (131,072 context length, FP16)
- DeepSeek V3: `deepseek-ai/DeepSeek-V3` (16,384 context length, FP8)
### Llama 3 Models

- Llama 3.3 70B Instruct Turbo: `meta-llama/Llama-3.3-70B-Instruct-Turbo` (131,072 context length, FP8)
- Llama 3.1 70B Instruct Turbo: `meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo` (131,072 context length, FP8)
- Llama 3.1 405B Instruct Turbo: `meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo` (130,815 context length, FP8)
- Llama 3.1 8B Instruct Turbo: `meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo` (131,072 context length, FP8)
- Llama 3.2 3B Instruct Turbo: `meta-llama/Llama-3.2-3B-Instruct-Turbo` (131,072 context length, FP16)
### Mixtral Models

- Mixtral-8x7B Instruct: `mistralai/Mixtral-8x7B-Instruct-v0.1` (32,768 context length, FP16)
- Mixtral-8x22B Instruct: `mistralai/Mixtral-8x22B-Instruct-v0.1` (65,536 context length, FP16)
- Mistral Small 3 Instruct (24B): `mistralai/Mistral-Small-24B-Instruct-2501` (32,768 context length, FP16)
### Qwen Models

- Qwen 2.5 72B Instruct Turbo: `Qwen/Qwen2.5-72B-Instruct-Turbo` (32,768 context length, FP8)
- Qwen 2.5 7B Instruct Turbo: `Qwen/Qwen2.5-7B-Instruct-Turbo` (32,768 context length, FP8)
- Qwen 2.5 Coder 32B Instruct: `Qwen/Qwen2.5-Coder-32B-Instruct` (32,768 context length, FP16)
- QwQ-32B: `Qwen/QwQ-32B` (32,768 context length, FP16)
### Vision Models

- Llama 3.2 Vision: `meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo` (131,072 context length, FP16)
- Qwen 2.5 Vision Language 72B: `Qwen/Qwen2.5-VL-72B-Instruct` (32,768 context length, FP8)
- Qwen 2 VL 72B: `Qwen/Qwen2-VL-72B-Instruct` (32,768 context length, FP16)
## Free Endpoints

Together AI offers free tiers with reduced rate limits:

- `meta-llama/Llama-3.3-70B-Instruct-Turbo-Free`
- `meta-llama/Llama-Vision-Free`
- `deepseek-ai/DeepSeek-R1-Distill-Llama-70B-Free`
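A free endpoint is referenced like any other model ID. A minimal sketch using one of the free models listed above:

```yaml
providers:
  # Free tier: same model family, reduced rate limits
  - id: togetherai:meta-llama/Llama-3.3-70B-Instruct-Turbo-Free
    config:
      temperature: 0.7
```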
For a complete list of all 200+ available models and their specifications, refer to the Together AI Models page.
## Example Configuration

```yaml title="promptfooconfig.yaml"
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: togetherai:meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
    config:
      temperature: 0.7
      max_tokens: 4096
  - id: togetherai:deepseek-ai/DeepSeek-R1
    config:
      temperature: 0.0
      response_format: { type: 'json_object' }
      tools:
        - type: function
          function:
            name: get_weather
            description: Get weather information
            parameters:
              type: object
              properties:
                location: { type: 'string' }
                unit: { type: 'string', enum: ['celsius', 'fahrenheit'] }
```
For more information, refer to the Together AI documentation.