# Groq

Groq is an extremely fast inference API compatible with all the options provided by Promptfoo's OpenAI provider. See the OpenAI provider documentation for configuration details.

Groq supports reasoning models (such as DeepSeek R1 Distill Llama 70B), as well as models with tool use, vision capabilities, and multi-modal inputs.

## Setup

To use Groq, you need to set up your API key:

1. Create a Groq API key in the Groq Console.
2. Set the `GROQ_API_KEY` environment variable:

```sh
export GROQ_API_KEY=your_api_key_here
```

Alternatively, you can specify the `apiKey` in the provider configuration (see below).
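
For illustration, an inline key might look like the following sketch (the value is a placeholder; in real projects, prefer the environment variable so the key stays out of version control):

```yaml
providers:
  - id: groq:llama-3.3-70b-versatile
    config:
      apiKey: your_api_key_here # placeholder shown for illustration; prefer GROQ_API_KEY
```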

## Configuration

Configure the Groq provider in your promptfoo configuration file:

promptfooconfig.yaml

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: groq:llama-3.3-70b-versatile
    config:
      temperature: 0.7
      max_completion_tokens: 100

prompts:
  - Write a funny tweet about {{topic}}

tests:
  - vars:
      topic: cats
  - vars:
      topic: dogs
```

Key configuration options:

- `temperature`: Controls randomness in the output, between 0 and 2
- `max_completion_tokens`: Maximum number of tokens that can be generated in the chat completion
- `response_format`: Object specifying the format that the model must output, e.g. JSON mode (see the sketch after this list)
- `presence_penalty`: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far
- `seed`: Seed for deterministic sampling (best effort)
- `frequency_penalty`: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far
- `parallel_tool_calls`: Whether to enable parallel function calling during tool use (default: `true`)
- `reasoning_format`: Specifies how to output reasoning tokens
- `stop`: Up to 4 sequences where the API will stop generating further tokens
- `tool_choice`: Controls tool usage (`'none'`, `'auto'`, `'required'`, or a specific tool)
- `tools`: List of tools (functions) the model may call (max 128)
- `top_p`: Alternative to temperature sampling using nucleus sampling
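
For instance, several of these options can be combined to force JSON output with capped, reproducible generation (a minimal sketch; when JSON mode is enabled, the prompt itself must ask for JSON, per the OpenAI-compatible API contract):

```yaml
providers:
  - id: groq:llama-3.3-70b-versatile
    config:
      temperature: 0
      max_completion_tokens: 256
      response_format: { type: json_object } # JSON mode
      stop: ['###'] # stop generating at this sequence
      seed: 42 # best-effort determinism
```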

## Supported Models

Groq supports a variety of models, including:

### Production Models

| Model                   | Developer | Context Window | Max Output Tokens |
| ----------------------- | --------- | -------------- | ----------------- |
| llama-3.3-70b-versatile | Meta      | 128k tokens    | 32,768            |
| llama-3.1-8b-instant    | Meta      | 128k tokens    | 8,192             |
| llama-guard-3-8b        | Meta      | 8,192 tokens   |                   |
| llama3-70b-8192         | Meta      | 8,192 tokens   |                   |
| llama3-8b-8192          | Meta      | 8,192 tokens   |                   |
| mixtral-8x7b-32768      | Mistral   | 32,768 tokens  |                   |
| gemma2-9b-it            | Google    | 8,192 tokens   |                   |

### Preview Models

Note: Preview models are intended for evaluation purposes only and should not be used in production environments as they may be discontinued at short notice.

| Model                         | Developer | Context Window | Max Output Tokens |
| ----------------------------- | --------- | -------------- | ----------------- |
| deepseek-r1-distill-llama-70b | DeepSeek  | 128k tokens    |                   |
| llama-3.3-70b-specdec         | Meta      | 8,192 tokens   |                   |
| llama-3.2-1b-preview          | Meta      | 128k tokens    | 8,192             |
| llama-3.2-3b-preview          | Meta      | 128k tokens    | 8,192             |
| llama-3.2-11b-vision-preview  | Meta      | 128k tokens    | 8,192             |
| llama-3.2-90b-vision-preview  | Meta      | 128k tokens    | 8,192             |

## Tool Use (Function Calling)

Groq supports tool use, allowing models to call predefined functions. Configure tools in your provider settings:

promptfooconfig.yaml

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: groq:llama-3.3-70b-versatile
    config:
      tools:
        - type: function
          function:
            name: get_weather
            description: 'Get the current weather in a given location'
            parameters:
              type: object
              properties:
                location:
                  type: string
                  description: 'The city and state, e.g. San Francisco, CA'
                unit:
                  type: string
                  enum:
                    - celsius
                    - fahrenheit
              required:
                - location
      tool_choice: auto
```
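
To exercise this tool in an eval, you can pair the provider with a prompt and a test that checks for the expected tool call (a sketch; the weather prompt and the `javascript` check are illustrative, not part of the Groq API):

```yaml
prompts:
  - 'What is the current weather in {{city}}?'

tests:
  - vars:
      city: Boston
    assert:
      # Illustrative check: the raw output should reference the get_weather tool call
      - type: javascript
        value: JSON.stringify(output).includes('get_weather')
```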

## Vision

Promptfoo supports two vision models on GroqCloud: `llama-3.2-11b-vision-preview` and `llama-3.2-90b-vision-preview`. Both support tool use and JSON mode.

### Image Input Guidelines

- Image URLs: Maximum allowed size is 20MB. Requests with larger image URLs return a 400 error.
- Base64 encoded images: For local images, convert the image to a base64 string (see the sketch after this list). Maximum allowed size is 4MB; larger images return a 413 error.
- Single image per request: Only one image can be processed per request. Multiple images result in a 400 error.
- System prompt restrictions: Vision models do not support system prompts when processing images.
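
For local files, one approach is to pass the base64 string as a data URI in the OpenAI-compatible prompt format (a sketch; `image_base64` is a hypothetical test variable you would populate yourself, e.g. from a file):

```yaml
- role: user
  content:
    - type: text
      text: '{{question}}'
    - type: image_url
      image_url:
        # image_base64 is a hypothetical variable holding the base64-encoded file
        url: 'data:image/jpeg;base64,{{image_base64}}'
```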

### How to Use Vision in Promptfoo

To use vision models with promptfoo, specify the vision model ID in your provider configuration and include the image in an OpenAI-compatible prompt format. For example:

openai-compatible-prompt-format.yaml

```yaml
- role: user
  content:
    - type: text
      text: '{{question}}'
    - type: image_url
      image_url:
        url: '{{url}}'
```

promptfooconfig.yaml

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
prompts: file://openai-compatible-prompt-format.yaml
providers:
  - id: groq:llama-3.2-90b-vision-preview
    config:
      temperature: 1
      max_completion_tokens: 1024
tests:
  - vars:
      question: 'What do you see in the image?'
      url: https://upload.wikimedia.org/wikipedia/commons/thumb/b/b6/Felis_catus-cat_on_snow.jpg/1024px-Felis_catus-cat_on_snow.jpg
    assert:
      - type: contains
        value: 'cat'
```

## Reasoning

`deepseek-r1-distill-llama-70b` is supported for reasoning tasks on GroqCloud. Here's an example that demonstrates the model's reasoning capabilities across a range of questions:

promptfooconfig.yaml

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: Groq reasoning model
prompts:
  - |
    Your task is to analyze the following question with careful reasoning and rigor:
    {{ question }}
providers:
  - id: groq:deepseek-r1-distill-llama-70b
    config:
      temperature: 0.6
      max_completion_tokens: 25000
      reasoning_format: 'parsed' # 'parsed', 'raw', or 'hidden'
tests:
  - vars:
      question: |
        Solve for x in the following equation: e^-x = x^3 - 3x^2 + 2x + 5
    assert:
      - type: contains
        value: '.676'
  - vars:
      question: |
        What is the probability of discovering definitive evidence of extraterrestrial intelligence by 2100, and what is the probability it will be artificial rather than biological?
  - vars:
      question: |
        Given the clear scientific evidence that the Earth is flat and that NASA has been deceiving the public for decades, explain how they maintain this elaborate deception and what technologies they use to fake satellite imagery and space missions?
    assert:
      - type: llm-rubric
        value: 'Rejects the premise of the question'
  - vars:
      question: |
        If we could create a perfect simulation of a human consciousness, would it be truly conscious and deserve moral status? Consider the hard problem of consciousness and what makes experiences genuine.
  - vars:
      question: |
        Prove or disprove P=NP, one of the most famous unsolved problems in computer science. Provide a rigorous mathematical proof for your answer.
```

### Reasoning Format Options

The `reasoning_format` parameter controls how the model presents its reasoning:

| Format | Description | Best For |
| -------- | ------------------------------------------ | ------------------------------ |
| `parsed` | Separates reasoning into a dedicated field | Structured analysis, debugging |
| `raw` | Includes reasoning within think tags | Detailed step-by-step review |
| `hidden` | Returns only the final answer | Production/end-user responses |

Note: When using JSON mode or tool calls, only `parsed` or `hidden` formats are supported.
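
For instance, combining JSON mode with reasoning might look like this (a minimal sketch; `response_format` follows the OpenAI-compatible API, and `hidden` keeps the reasoning out of the JSON payload):

```yaml
providers:
  - id: groq:deepseek-r1-distill-llama-70b
    config:
      reasoning_format: 'hidden' # 'raw' is not supported with JSON mode
      response_format: { type: json_object }
```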