To use the OpenAI API, set the OPENAI_API_KEY environment variable, specify the key via the apiKey field in the configuration file, or pass the API key as an argument to the constructor.


export OPENAI_API_KEY=your_api_key_here

The OpenAI provider supports the following model formats:

  • openai:chat - defaults to gpt-4o-mini
  • openai:completion - defaults to text-davinci-003
  • openai:<model name> - uses a specific model name (mapped automatically to chat or completion endpoint)
  • openai:chat:<model name> - uses any model name against the /v1/chat/completions endpoint
  • openai:chat:ft:gpt-4o-mini:company-name:ID - example of a fine-tuned chat completion model
  • openai:completion:<model name> - uses any model name against the /v1/completions endpoint
  • openai:embeddings:<model name> - uses any model name against the /v1/embeddings endpoint
  • openai:assistant:<assistant id> - use an assistant
  • openai:realtime:<model name> - uses realtime API models over WebSocket connections

The openai:<endpoint>:<model name> construction is useful if OpenAI releases a new model, or if you have a custom model. For example, if OpenAI releases gpt-5 chat completion, you could begin using it immediately with openai:chat:gpt-5.
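As a rough illustration of how these provider strings break down, here is a minimal sketch of a parser. The function name parseOpenAiProviderId and the ENDPOINTS set are hypothetical and written only for this example; promptfoo's own parsing logic may differ.

```javascript
// Hypothetical helper: splits an "openai:..." provider ID into its parts.
// It mirrors the formats documented on this page; it is not promptfoo's parser.
const ENDPOINTS = new Set([
  'chat',
  'completion',
  'embeddings',
  'assistant',
  'realtime',
  'responses',
  'image',
]);

function parseOpenAiProviderId(id) {
  const parts = id.split(':');
  if (parts[0] !== 'openai' || parts.length < 2) {
    throw new Error(`Not an OpenAI provider ID: ${id}`);
  }
  if (ENDPOINTS.has(parts[1])) {
    // Everything after the endpoint is the model name. Fine-tuned model
    // names contain colons themselves, so the remainder is re-joined.
    const model = parts.slice(2).join(':') || null;
    return { endpoint: parts[1], model };
  }
  // "openai:<model name>": no explicit endpoint was given.
  return { endpoint: null, model: parts.slice(1).join(':') };
}

console.log(parseOpenAiProviderId('openai:chat:gpt-5'));
console.log(parseOpenAiProviderId('openai:chat:ft:gpt-4o-mini:company-name:ID'));
```

A null model in the result corresponds to the forms that fall back to a default model, such as openai:chat.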

The OpenAI provider supports a handful of configuration options, such as temperature, functions, and tools, which can be used to customize the behavior of the model like so:

providers:
  - id: openai:gpt-4o-mini
    config:
      temperature: 0
      max_tokens: 1024

Note: OpenAI models can also be accessed through Azure OpenAI, which offers additional enterprise features, compliance options, and regional availability.

Formatting chat messages

For information on setting up chat conversations, see chat threads.

Configuring parameters

The providers list takes a config key that allows you to set parameters like temperature, max_tokens, and others. For example:

providers:
  - id: openai:gpt-4o-mini
    config:
      temperature: 0
      max_tokens: 128
      apiKey: sk-abc123

Supported parameters include:

  • apiBaseUrl - The base URL of the OpenAI API. See also OPENAI_BASE_URL below.
  • apiHost - The hostname of the OpenAI API. See also OPENAI_API_HOST below.
  • apiKey - Your OpenAI API key, equivalent to the OPENAI_API_KEY environment variable.
  • apiKeyEnvar - The name of an environment variable that contains the API key.
  • best_of - Controls the number of alternative outputs to generate and select from.
  • frequency_penalty - Applies a penalty to tokens based on how often they have already appeared, making repetition less likely.
  • function_call - Controls whether the AI should call functions. Can be either 'none', 'auto', or an object with a name that specifies the function to call.
  • functions - Allows you to define custom functions. Each function should be an object with a name, optional description, and parameters.
  • functionToolCallbacks - A map of function tool names to function callbacks. Each callback should accept a string and return a string or a Promise<string>.
  • headers - Additional headers to include in the request.
  • max_tokens - Controls the maximum length of the output in tokens. Not valid for reasoning models (o1, o3-mini).
  • organization - Your OpenAI organization key.
  • passthrough - A flexible object that allows passing arbitrary parameters directly to the OpenAI API request body. Useful for experimental, new, or provider-specific parameters not yet explicitly supported in promptfoo. This parameter is merged into the final API request and can override other settings.
  • presence_penalty - Applies a penalty to tokens that have already appeared in the text so far, making the model more likely to introduce new topics.
  • response_format - Specifies the desired output format, including json_object and json_schema. Can also be specified in the prompt config. If specified in both, the prompt config takes precedence.
  • seed - Seed used for deterministic output.
  • stop - Defines a list of tokens that signal the end of the output.
  • temperature - Controls the randomness of the AI's output. Higher values (close to 1) make the output more random, while lower values (close to 0) make it more deterministic.
  • tool_choice - Controls whether the AI should use a tool. See OpenAI Tools documentation.
  • tools - Allows you to define custom tools. See OpenAI Tools documentation.
  • top_p - Controls nucleus sampling, a method that helps control the randomness of the AI's output.
  • max_completion_tokens - Maximum number of tokens to generate for reasoning models (o1, o3-mini).
  • reasoning_effort - Controls how long the reasoning model thinks before answering: 'low', 'medium', or 'high'.
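To make the passthrough behavior concrete, here is a minimal sketch of how a passthrough object could be merged into a request body so that its keys override explicit settings. The buildRequestBody function is hypothetical and simplified (a real request builder would also strip non-API keys such as apiKey); it is not promptfoo's implementation.

```javascript
// Illustrative only: promptfoo's real request-building code may differ.
function buildRequestBody(model, config) {
  const { passthrough = {}, ...rest } = config;
  // Later spreads win, so keys in `passthrough` override explicit settings.
  return { model, ...rest, ...passthrough };
}

const body = buildRequestBody('gpt-4o-mini', {
  temperature: 0,
  max_tokens: 128,
  // logit_bias is an OpenAI parameter that is not in the list above.
  passthrough: { temperature: 0.5, logit_bias: { 50256: -100 } },
});

console.log(body);
```

Here the temperature from passthrough replaces the explicit value, while max_tokens is left untouched.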

Here are the type declarations of config parameters:

interface OpenAiConfig {
  // Completion parameters
  temperature?: number;
  max_tokens?: number;
  max_completion_tokens?: number;
  reasoning_effort?: 'low' | 'medium' | 'high';
  top_p?: number;
  frequency_penalty?: number;
  presence_penalty?: number;
  best_of?: number;
  functions?: OpenAiFunction[];
  function_call?: 'none' | 'auto' | { name: string };
  tools?: OpenAiTool[];
  tool_choice?: 'none' | 'auto' | 'required' | { type: 'function'; function?: { name: string } };
  response_format?: { type: 'json_object' | 'json_schema'; json_schema?: object };
  stop?: string[];
  seed?: number;
  passthrough?: object;

  // Function tool callbacks, keyed by function tool name
  functionToolCallbacks?: Record<string, (arg: string) => Promise<string>>;

  // General OpenAI parameters
  apiKey?: string;
  apiKeyEnvar?: string;
  apiHost?: string;
  apiBaseUrl?: string;
  organization?: string;
  headers?: { [key: string]: string };
}


Reasoning Models (o1, o3-mini)

Reasoning models, like o1 and o3-mini, are new large language models trained with reinforcement learning to perform complex reasoning. These models excel in complex problem solving, coding, scientific reasoning, and multi-step planning for agentic workflows.

When using reasoning models, there are important differences in how tokens are handled:

providers:
  - id: openai:o1
    config:
      reasoning_effort: 'medium' # Can be "low", "medium", or "high"
      max_completion_tokens: 25000 # Can also be set via OPENAI_MAX_COMPLETION_TOKENS env var

Unlike standard models that use max_tokens, reasoning models use:

  • max_completion_tokens to control the total tokens generated (both reasoning and visible output)
  • reasoning_effort to control how thoroughly the model thinks before responding (low, medium, high)
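The difference between the two parameter sets can be sketched as a small helper that picks the right token-limit key for a model. Both function names and the name-prefix check are assumptions made for this example; promptfoo may detect reasoning models differently.

```javascript
// Hypothetical helper; the prefix check is an assumption for illustration.
function isReasoningModel(model) {
  return /^o\d/.test(model); // matches names such as "o1" and "o3-mini"
}

// Returns the token-limit settings appropriate for the given model.
function tokenLimitConfig(model, limit) {
  return isReasoningModel(model)
    ? { max_completion_tokens: limit, reasoning_effort: 'medium' }
    : { max_tokens: limit };
}

console.log(tokenLimitConfig('o1', 25000));
console.log(tokenLimitConfig('gpt-4o-mini', 1024));
```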

How Reasoning Models Work

Reasoning models "think before they answer," generating internal reasoning tokens that:

  • Are not visible in the output
  • Count towards token usage and billing
  • Occupy space in the context window

The o1 and o3-mini models have a 200,000-token context window. OpenAI recommends reserving at least 25,000 tokens for reasoning and outputs when starting with these models.

GPT-4.5 Models (Preview)

GPT-4.5 is OpenAI's largest GPT model designed specifically for creative tasks and agentic planning, currently available in a research preview. It features a 128k token context length.

Models in this series include:

  • gpt-4.5-preview
  • gpt-4.5-preview-2025-02-27

You can specify the model name in the providers section:

providers:
  - id: openai:gpt-4.5-preview
    config:
      temperature: 0.7


Sending images in prompts

You can include images in the prompt by using content blocks. For example, here's a config:

# yaml-language-server: $schema=
prompts:
  - file://prompt.json

providers:
  - openai:gpt-4o

tests:
  - vars:
      question: 'What do you see?'
      url: ''
  # ...

And an example prompt.json:

[
  {
    "role": "user",
    "content": [
      { "type": "text", "text": "{{question}}" },
      { "type": "image_url", "image_url": { "url": "{{url}}" } }
    ]
  }
]

See the OpenAI vision example.

Generating images

Promptfoo supports DALL-E image generation via the openai:image:dall-e-3 provider. See the OpenAI Dall-E example.

# yaml-language-server: $schema=
prompts:
  - 'In the style of Van Gogh: {{subject}}'
  - 'In the style of Dali: {{subject}}'

providers:
  - openai:image:dall-e-3

tests:
  - vars:
      subject: bananas
  - vars:
      subject: new york city

To display images in the web viewer, wrap vars or outputs in markdown image tags, that is, the standard ![alt text](image URL) syntax.


Then, enable 'Render markdown' under Table Settings.

Using tools and functions

OpenAI tools and functions are supported. See OpenAI tools example and OpenAI functions example.

Using tools

To set tools on an OpenAI provider, use the provider's config key. The model may return tool calls in two formats:

  1. An array of tool calls: [{type: 'function', function: {...}}]
  2. A message with tool calls: {content: '...', tool_calls: [{type: 'function', function: {...}}]}
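Because the output can take either shape, assertions are easier to write against a normalized form. The sketch below shows one way to do that; extractToolCalls is a hypothetical helper written for this example and is not part of promptfoo.

```javascript
// Hypothetical helper: returns the tool calls from either output format.
function extractToolCalls(output) {
  const parsed = typeof output === 'string' ? JSON.parse(output) : output;
  // Format 1: the output is already an array of tool calls.
  if (Array.isArray(parsed)) return parsed;
  // Format 2: a message object carrying a tool_calls array.
  if (parsed && Array.isArray(parsed.tool_calls)) return parsed.tool_calls;
  return [];
}

const call = {
  type: 'function',
  function: { name: 'get_current_weather', arguments: '{"location":"Boston, MA"}' },
};

console.log(extractToolCalls([call])[0].function.name);
console.log(extractToolCalls({ content: 'Checking...', tool_calls: [call] })[0].function.name);
```

Both calls log get_current_weather, so downstream checks do not need to care which format the model returned.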

Tools can be defined inline or loaded from an external file:

# yaml-language-server: $schema=
prompts:
  - file://prompt.txt
providers:
  - id: openai:chat:gpt-4o-mini
    config:
      # Option 1: load tools from an external file
      # tools: file://./weather_tools.yaml
      # Option 2: define tools inline
      tools: [
        {
          "type": "function",
          "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
              "type": "object",
              "properties": {
                "location": {
                  "type": "string",
                  "description": "The city and state, e.g. San Francisco, CA"
                },
                "unit": {
                  "type": "string",
                  "enum": ["celsius", "fahrenheit"]
                }
              },
              "required": ["location"]
            }
          }
        }
        ]
      tool_choice: 'auto'

tests:
  - vars:
      city: Boston
    assert:
      - type: is-json
      - type: is-valid-openai-tools-call
      - type: javascript
        value: output[0].function.name === 'get_current_weather'
      - type: javascript
        value: JSON.parse(output[0].function.arguments).location === 'Boston, MA'

  - vars:
      city: New York
    # ...

Sometimes OpenAI function calls don't match the tools schema. Use the is-valid-openai-tools-call assertion to enforce an exact schema match between the tool calls and the tool definitions.

To further test tools definitions, you can use the javascript assertion and/or transform directives. For example:

tests:
  - vars:
      city: Boston
    assert:
      - type: is-json
      - type: is-valid-openai-tools-call
      - type: javascript
        value: output[0].function.name === 'get_current_weather'
      - type: javascript
        value: JSON.parse(output[0].function.arguments).location === 'Boston, MA'

  - vars:
      city: New York
    # transform returns only the 'name' property
    options:
      transform: output[0].function.name
    assert:
      - type: is-json
      - type: similar
        value: NYC

Functions can use variables from test cases:

{
  type: "function",
  function: {
    description: "Get temperature in {{city}}"
    // ...
  }
}

They can also include functions that dynamically reference vars:

{
  type: "function",
  function: {
    name: "get_temperature",
    parameters: {
      type: "object",
      properties: {
        unit: {
          type: "string",
          enum: (vars) => vars.units,
        }
      },
    }
  }
}
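One way to picture how such a definition could be resolved against a test case is a recursive walk that calls any function value with the test's vars. The resolveDynamicFields function below is a hypothetical sketch of that idea, not promptfoo's actual resolution logic.

```javascript
// Hypothetical resolver: walks a definition and replaces any function value
// with the result of calling it on the test case's vars.
function resolveDynamicFields(node, vars) {
  if (typeof node === 'function') return node(vars);
  if (Array.isArray(node)) return node.map((item) => resolveDynamicFields(item, vars));
  if (node !== null && typeof node === 'object') {
    return Object.fromEntries(
      Object.entries(node).map(([key, value]) => [key, resolveDynamicFields(value, vars)]),
    );
  }
  return node; // strings, numbers, booleans, and null pass through unchanged
}

const tool = {
  type: 'function',
  function: {
    name: 'get_temperature',
    parameters: {
      type: 'object',
      properties: {
        unit: { type: 'string', enum: (vars) => vars.units },
      },
    },
  },
};

const resolved = resolveDynamicFields(tool, { units: ['celsius', 'fahrenheit'] });
console.log(resolved.function.parameters.properties.unit.enum);
```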

Using functions

functions and function_call are deprecated in favor of tools and tool_choice; see the OpenAI API reference for details.

Use the functions config to define custom functions. Each function should be an object with a name, optional description, and parameters. For example:

# yaml-language-server: $schema=
prompts:
  - file://prompt.txt
providers:
  - id: openai:chat:gpt-4o-mini
    config:
      functions:
        - name: get_current_weather
          description: Get the current weather in a given location
          parameters:
            type: object
            properties:
              location:
                type: string
                description: The city and state, e.g. San Francisco, CA
              unit: { type: string, enum: [celsius, fahrenheit] }
            required: [location]
tests:
  - vars:
      city: Boston
    assert:
      - type: is-valid-openai-function-call
  - vars:
      city: New York
    # ...

Sometimes OpenAI function calls don't match functions schemas. Use is-valid-openai-function-call assertions to enforce an exact schema match between function calls and the function definition.

To further test function call definitions, you can use the javascript assertion and/or transform directives. For example:

tests:
  - vars:
      city: Boston
    assert:
      - type: is-valid-openai-function-call
      - type: javascript
        value: output.name === 'get_current_weather'
      - type: javascript
        value: JSON.parse(output.arguments).location === 'Boston, MA'

  - vars:
      city: New York
    # transform returns only the 'name' property for this test case
    options:
      transform: output.name
    assert:
      - type: is-json
      - type: similar
        value: NYC

Loading tools/functions from a file

Instead of duplicating function definitions across multiple configurations, you can reference an external YAML (or JSON) file that contains your functions. This allows you to maintain a single source of truth for your functions, which is particularly useful if you have multiple versions or regular changes to definitions.

To load your functions from a file, specify the file path in your provider configuration like so:

providers:
  - file://./path/to/provider_with_function.yaml

You can also use a pattern to load multiple files:

providers:
  - file://./path/to/provider_*.yaml

Here's an example of how your provider_with_function.yaml might look:

id: openai:chat:gpt-4o-mini
config:
  functions:
    - name: get_current_weather
      description: Get the current weather in a given location
      parameters:
        type: object
        properties:
          location:
            type: string
            description: The city and state, e.g. San Francisco, CA
          unit:
            type: string
            enum:
              - celsius
              - fahrenheit
            description: The unit in which to return the temperature
        required:
          - location

Using response_format

Promptfoo supports the response_format parameter, which allows you to specify the expected output format.

response_format can be included in the provider config, or in the prompt config.

Prompt config example

prompts:
  - label: 'Prompt #1'
    raw: 'You are a helpful math tutor. Solve {{problem}}'
    config:
      response_format:
        type: json_schema
        json_schema: ...

Provider config example

providers:
  - id: openai:chat:gpt-4o-mini
    config:
      response_format:
        type: json_schema
        json_schema: ...

External file references

To make it easier to manage large JSON schemas, external file references are supported:

response_format: file://./path/to/response_format.json
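As a sketch of what such a response_format.json file might contain, the snippet below builds a json_schema response format and prints it as JSON. The name, strict, and schema fields follow OpenAI's structured-output format; the schema content (math_solution, steps, final_answer) is invented for illustration. The resolveResponseFormat helper is hypothetical and only demonstrates the documented rule that the prompt config takes precedence over the provider config.

```javascript
// Example response format; the schema contents are made up for illustration.
const responseFormat = {
  type: 'json_schema',
  json_schema: {
    name: 'math_solution',
    strict: true,
    schema: {
      type: 'object',
      properties: {
        steps: { type: 'array', items: { type: 'string' } },
        final_answer: { type: 'string' },
      },
      required: ['steps', 'final_answer'],
      additionalProperties: false,
    },
  },
};

// Hypothetical helper: the prompt-level setting wins when both are present.
function resolveResponseFormat(promptConfig = {}, providerConfig = {}) {
  return promptConfig.response_format ?? providerConfig.response_format;
}

// Save this output as response_format.json to reference it via file://.
console.log(JSON.stringify(responseFormat, null, 2));
```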

Supported environment variables

These OpenAI-related environment variables are supported:

  • OPENAI_TEMPERATURE - Temperature model parameter, defaults to 0. Not supported by o1 models.
  • OPENAI_MAX_TOKENS - Max_tokens model parameter, defaults to 1024. Not supported by o1 models.
  • OPENAI_MAX_COMPLETION_TOKENS - Max_completion_tokens model parameter, defaults to 1024. Used by reasoning models.
  • OPENAI_REASONING_EFFORT - Reasoning_effort parameter for reasoning models, defaults to "medium". Options are "low", "medium", or "high".
  • OPENAI_API_HOST - The hostname to use (useful if you're using an API proxy). Takes priority over OPENAI_BASE_URL.
  • OPENAI_BASE_URL - The base URL (protocol + hostname + port) to use. This is a more general option than OPENAI_API_HOST.
  • OPENAI_ORGANIZATION - The OpenAI organization key to use.
  • PROMPTFOO_DELAY_MS - Number of milliseconds to delay between API calls. Useful if you are hitting OpenAI rate limits (defaults to 0).
  • PROMPTFOO_REQUEST_BACKOFF_MS - Base number of milliseconds to back off and retry if a request fails (defaults to 5000).
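The priority between OPENAI_API_HOST and OPENAI_BASE_URL can be sketched as follows. The resolveApiBaseUrl function, the https://<host>/v1 URL shape, and the fallback default are all assumptions made for this example rather than promptfoo's confirmed behavior.

```javascript
// Hypothetical resolver; URL shape and default are assumptions.
function resolveApiBaseUrl(env) {
  // OPENAI_API_HOST takes priority over OPENAI_BASE_URL.
  if (env.OPENAI_API_HOST) return `https://${env.OPENAI_API_HOST}/v1`;
  if (env.OPENAI_BASE_URL) return env.OPENAI_BASE_URL;
  return 'https://api.openai.com/v1';
}

console.log(
  resolveApiBaseUrl({
    OPENAI_API_HOST: 'proxy.example.com',
    OPENAI_BASE_URL: 'http://localhost:8080/v1',
  }),
);
```

Passing the environment in as an argument (instead of reading process.env directly) keeps the sketch easy to test.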

Evaluating assistants

To test out an Assistant via OpenAI's Assistants API, first create an Assistant in the API playground.

Set functions, code interpreter, and files for retrieval as necessary.

Then, include the assistant in your config:

prompts:
  - 'Write a tweet about {{topic}}'
providers:
  - openai:assistant:asst_fEhNN3MClMamLfKLkIaoIpgZ
tests:
  - vars:
      topic: bananas
  # ...

Code interpreter, function calls, and retrievals will be included in the output alongside chat messages. Note that the evaluator creates a new thread for each eval.

The following properties can be overwritten in provider config:

  • model - OpenAI model to use
  • instructions - System prompt
  • tools - Enabled tools
  • thread.messages - A list of message objects that the thread is created with.
  • temperature - Temperature for the model
  • toolChoice - Controls whether the AI should use a tool
  • tool_resources - Tool resources to include in the thread - see Assistant v2 tool resources
  • attachments - File attachments to include in messages - see Assistant v2 attachments

Here's an example of a more detailed config:

# yaml-language-server: $schema=
prompts:
  - 'Write a tweet about {{topic}}'
providers:
  - id: openai:assistant:asst_fEhNN3MClMamLfKLkIaoIpgZ
    config:
      model: gpt-4o
      instructions: "You always speak like a pirate"
      temperature: 0.2
      toolChoice:
        type: file_search
      tools:
        - type: code_interpreter
        - type: file_search
      thread:
        messages:
          - role: user
            content: "Hello world"
          - role: assistant
            content: "Greetings from the high seas"
tests:
  - vars:
      topic: bananas
  # ...

Automatically handling function tool calls

You can specify JavaScript callbacks that are automatically called to create the output of a function tool call.

This requires defining your config in a JavaScript file instead of YAML.

module.exports = /** @type {import('promptfoo').TestSuiteConfig} */ ({
  prompts: 'Please add the following numbers together: {{a}} and {{b}}',
  providers: [
    {
      id: 'openai:assistant:asst_fEhNN3MClMamLfKLkIaoIpgZ',
      config:
        /** @type {InstanceType<import('promptfoo')["providers"]["OpenAiAssistantProvider"]>["config"]} */ ({
          model: 'gpt-4o',
          instructions: 'You can add two numbers together using the `addNumbers` tool',
          tools: [
            {
              type: 'function',
              function: {
                name: 'addNumbers',
                description: 'Add two numbers together',
                parameters: {
                  type: 'object',
                  properties: {
                    a: { type: 'number' },
                    b: { type: 'number' },
                  },
                  required: ['a', 'b'],
                },
              },
            },
          ],
          /**
           * Map of function tool names to function callback.
           */
          functionToolCallbacks: {
            // this function should accept a string, and return a string
            // or a `Promise<string>`.
            addNumbers: (parametersJsonString) => {
              const { a, b } = JSON.parse(parametersJsonString);
              return JSON.stringify(a + b);
            },
          },
        }),
    },
  ],
  tests: [
    {
      vars: { a: 5, b: 6 },
    },
  ],
});
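Since a callback may return either a string or a Promise<string>, here is a small standalone sketch showing both styles side by side. The lookupOrder callback and the runToolCallback dispatcher are hypothetical, written only to illustrate how a tool call's JSON arguments could be routed to a callback; they are not promptfoo internals.

```javascript
const functionToolCallbacks = {
  // Synchronous callback: returns a string.
  addNumbers: (parametersJsonString) => {
    const { a, b } = JSON.parse(parametersJsonString);
    return JSON.stringify(a + b);
  },
  // Asynchronous callback: returns a Promise<string>.
  lookupOrder: async (parametersJsonString) => {
    const { orderId } = JSON.parse(parametersJsonString);
    // Stand-in for a real database or HTTP lookup.
    return JSON.stringify({ orderId, status: 'shipped' });
  },
};

// Hypothetical dispatcher: routes a tool call to its registered callback.
async function runToolCallback(name, argsJson) {
  const callback = functionToolCallbacks[name];
  if (!callback) throw new Error(`No callback registered for ${name}`);
  // `await` works for both plain strings and promises.
  return await callback(argsJson);
}

runToolCallback('addNumbers', '{"a":5,"b":6}').then(console.log); // prints 11
```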

Audio capabilities

OpenAI models with audio support (like gpt-4o-audio-preview and gpt-4o-mini-audio-preview) can process audio inputs and generate audio outputs. This enables testing speech-to-text, text-to-speech, and speech-to-speech capabilities.

Using audio inputs

You can include audio files in your prompts using the following format:

[
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "You are a helpful customer support agent. Listen to the customer's request and respond with a helpful answer."
      },
      {
        "type": "input_audio",
        "input_audio": { "data": "{{audio_file}}", "format": "mp3" }
      }
    ]
  }
]

With a corresponding configuration:

prompts:
  - id: file://audio-input.json
    label: Audio Input

providers:
  - id: openai:chat:gpt-4o-audio-preview
    config:
      modalities: ['text'] # also supports 'audio'

tests:
  - vars:
      audio_file: file://assets/transcript1.mp3
    assert:
      - type: llm-rubric
        value: Resolved the customer's issue

Supported audio file formats include WAV, MP3, OGG, AAC, M4A, and FLAC.

Audio configuration options

The audio configuration supports these parameters:

  • voice - Voice for audio generation. Default: alloy. Options: alloy, echo, fable, onyx, nova, shimmer.
  • format - Audio format to generate. Default: wav. Options: wav, mp3, opus, aac.
  • speed - Speaking speed multiplier. Default: 1.0. Options: any number between 0.25 and 4.0.
  • bitrate - Bitrate for compressed formats. Default: none. Options: e.g., "128k", "256k".

In the web UI, audio outputs display with an embedded player and transcript. For a complete working example, see the OpenAI audio example or initialize it with:

npx promptfoo@latest init --example openai-audio

Realtime API Models

The Realtime API allows for real-time communication with GPT-4o class models using WebSockets, supporting both text and audio inputs/outputs with streaming responses.

Supported Realtime Models

  • gpt-4o-realtime-preview-2024-12-17
  • gpt-4o-mini-realtime-preview-2024-12-17

Using Realtime API

To use the OpenAI Realtime API, use the provider format openai:realtime:<model name>:

providers:
  - id: openai:realtime:gpt-4o-realtime-preview-2024-12-17
    config:
      modalities: ['text', 'audio']
      voice: 'alloy'
      instructions: 'You are a helpful assistant.'
      temperature: 0.7
      websocketTimeout: 60000 # 60 seconds

Realtime-specific Configuration Options

The Realtime API configuration supports these parameters in addition to standard OpenAI parameters:

  • modalities - Types of content the model can process and generate. Default: ['text', 'audio']. Options: 'text', 'audio'.
  • voice - Voice for audio generation. Default: 'alloy'. Options: alloy, echo, fable, onyx, nova, shimmer.
  • instructions - System instructions for the model. Default: 'You are a helpful...'. Options: any text string.
  • input_audio_format - Format of audio input. Default: 'pcm16'. Options: 'pcm16', 'g711_ulaw', 'g711_alaw'.
  • output_audio_format - Format of audio output. Default: 'pcm16'. Options: 'pcm16', 'g711_ulaw', 'g711_alaw'.
  • websocketTimeout - Timeout for WebSocket connection (milliseconds). Default: 30000. Options: any number.
  • max_response_output_tokens - Maximum tokens in model response. Default: 'inf'. Options: number or 'inf'.
  • tools - Array of tool definitions for function calling. Default: []. Options: array of tool objects.
  • tool_choice - Controls how tools are selected. Default: 'auto'. Options: 'none', 'auto', 'required', or object.

Function Calling with Realtime API

The Realtime API supports function calling via tools, similar to the Chat API. Here's an example configuration:

providers:
  - id: openai:realtime:gpt-4o-realtime-preview-2024-12-17
    config:
      tools:
        - type: function
          name: get_weather
          description: Get the current weather for a location
          parameters:
            type: object
            properties:
              location:
                type: string
                description: The city and state, e.g. San Francisco, CA
            required: ['location']
      tool_choice: 'auto'

Complete Example

For a complete working example that demonstrates the Realtime API capabilities, see the OpenAI Realtime API example or initialize it with:

npx promptfoo@latest init --example openai-realtime

This example includes:

  • Basic single-turn interactions with the Realtime API
  • Multi-turn conversations with persistent context
  • Conversation threading with separate conversation IDs
  • JavaScript prompt function for properly formatting messages
  • Function calling with the Realtime API
  • Detailed documentation on handling content types correctly

Input and Message Format

When using the Realtime API with promptfoo, you can specify the prompt in JSON format:

[
  {
    "role": "user",
    "content": [{ "type": "text", "text": "{{question}}" }]
  }
]

The Realtime API supports the same multimedia formats as the Chat API, allowing you to include images and audio in your prompts.

Multi-Turn Conversations

The Realtime API supports multi-turn conversations with persistent context. For implementation details and examples, see the OpenAI Realtime example, which demonstrates both single-turn interactions and conversation threading using the conversationId metadata property.

Important: When implementing multi-turn conversations, use type: "input_text" for user inputs and type: "text" for assistant responses.
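The content-type rule above can be sketched as a small function that converts a conversation history into message objects. The toRealtimeMessages helper and its { role, text } input shape are hypothetical and exist only for this example.

```javascript
// Hypothetical helper. `history` is an array of { role, text } turns.
function toRealtimeMessages(history) {
  return history.map(({ role, text }) => ({
    role,
    // User turns use "input_text"; assistant turns use "text".
    content: [{ type: role === 'user' ? 'input_text' : 'text', text }],
  }));
}

const messages = toRealtimeMessages([
  { role: 'user', text: 'What is the capital of France?' },
  { role: 'assistant', text: 'Paris.' },
  { role: 'user', text: 'And of Italy?' },
]);

console.log(JSON.stringify(messages, null, 2));
```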

Responses API

OpenAI's Responses API is the most advanced interface for generating model responses, supporting text and image inputs, function calling, and conversation state. It provides access to OpenAI's full suite of features, including reasoning models such as the o1 and o3 series.

Supported Responses Models

The Responses API supports a wide range of models, including:

  • gpt-4o - OpenAI's most capable vision model
  • o1 - Powerful reasoning model
  • o1-mini - Smaller, more affordable reasoning model
  • o1-pro - Enhanced reasoning model with more compute
  • o3-mini - Latest reasoning model with improved performance

Using the Responses API

To use the OpenAI Responses API, use the provider format openai:responses:<model name>:

providers:
  - id: openai:responses:gpt-4o
    config:
      temperature: 0.7
      max_output_tokens: 500
      instructions: 'You are a helpful, creative AI assistant.'

Responses-specific Configuration Options

The Responses API configuration supports these parameters in addition to standard OpenAI parameters:

  • instructions - System instructions for the model. Default: None. Options: any text string.
  • max_output_tokens - Maximum tokens to generate in the response. Default: 1024. Options: any number.
  • metadata - Key-value pairs attached to the model response. Default: None. Options: map of string keys to string values.
  • parallel_tool_calls - Allow model to run tool calls in parallel. Default: true. Options: Boolean.
  • previous_response_id - ID of a previous response for multi-turn context. Default: None. Options: String.
  • store - Whether to store the response for later retrieval. Default: true. Options: Boolean.
  • truncation - Strategy to handle context window overflow. Default: 'disabled'. Options: 'auto', 'disabled'.
  • reasoning - Configuration for reasoning models. Default: None. Options: object with effort field.

Sending Images in Prompts

The Responses API supports structured prompts with text and image inputs. Example:

[
  {
    "type": "message",
    "role": "user",
    "content": [
      { "type": "input_text", "text": "Describe what you see in this image about {{topic}}." },
      { "type": "image_url", "image_url": { "url": "{{image_url}}" } }
    ]
  }
]

Function Calling

The Responses API supports tool and function calling, similar to the Chat API:

providers:
  - id: openai:responses:gpt-4o
    config:
      tools:
        - type: function
          name: get_weather
          description: Get the current weather for a location
          parameters:
            type: object
            properties:
              location:
                type: string
                description: The city and state, e.g. San Francisco, CA
            required: ['location']
      tool_choice: 'auto'

Reasoning Models

When using reasoning models like o1, o1-pro, or o3-mini, you can control the reasoning effort:

providers:
  - id: openai:responses:o1
    config:
      reasoning_effort: 'medium' # Can be "low", "medium", or "high"
      max_output_tokens: 1000

Reasoning models "think before they answer," generating internal reasoning that isn't visible in the output but counts toward token usage and billing.

Complete Example

For a complete working example, see the OpenAI Responses API example or initialize it with:

npx promptfoo@latest init --example openai-responses


OpenAI rate limits

There are a few things you can do if you encounter OpenAI rate limits (most commonly with GPT-4):

  1. Reduce concurrency to 1 by setting --max-concurrency 1 in the CLI, or by setting evaluateOptions.maxConcurrency in the config.
  2. Set a delay between requests by setting --delay 3000 (3000 ms) in the CLI, or by setting evaluateOptions.delay in the config, or with the environment variable PROMPTFOO_DELAY_MS (all values are in milliseconds).
  3. Adjust the exponential backoff for failed requests by setting the environment variable PROMPTFOO_REQUEST_BACKOFF_MS. This defaults to 5000 milliseconds, and failed requests are retried with exponential backoff up to 4 times. You can increase this value if requests are still failing, but note that this can significantly increase end-to-end test time.
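To get a feel for how the backoff setting affects total wait time, here is a minimal sketch that computes a retry schedule. It assumes the delay doubles on each retry, which is a common form of exponential backoff; promptfoo's exact schedule may differ, and the backoffDelays function is hypothetical.

```javascript
// Assumes the delay doubles on each retry; the real schedule may differ.
function backoffDelays(baseMs = 5000, retries = 4) {
  return Array.from({ length: retries }, (_, attempt) => baseMs * 2 ** attempt);
}

const delays = backoffDelays();
console.log(delays); // [ 5000, 10000, 20000, 40000 ]

// Worst-case extra wait for one request that fails every retry.
console.log(delays.reduce((total, ms) => total + ms, 0)); // 75000
```

Under this assumption, a single request that exhausts all retries adds about 75 seconds, which is why raising the base value can noticeably lengthen a test run.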

OpenAI flakiness

To retry HTTP requests that fail with an Internal Server Error (5xx), set the PROMPTFOO_RETRY_5XX environment variable to 1.