
Command line

The promptfoo command line utility supports the following subcommands:

  • init [directory] - Initialize a new project with dummy files.
  • eval - Evaluate prompts and models. This is the command you'll be using the most!
  • view - Start a browser UI for visualization of results.
  • share - Create a URL that can be shared online.
  • cache - Manage cache.
    • cache clear
  • list - List various resources like evaluations, prompts, and datasets.
    • list evals
    • list prompts
    • list datasets
  • show <id> - Show details of a specific resource (evaluation, prompt, dataset).
  • delete <id> - Delete a resource by its ID (currently, only evaluations).
  • feedback <message> - Send feedback to the promptfoo developers.

promptfoo eval

By default, the eval command reads the promptfooconfig.yaml configuration file in your current directory. To override specific parameters, supply optional arguments:

Option | Description
-a, --assertions <path> | Path to assertions file
-c, --config <paths...> | Path to configuration file(s). Automatically loads promptfooconfig.yaml
--delay <number> | Delay between each test (in milliseconds)
--description <description> | Description of the eval run
--env-file, --env-path <path> | Path to .env file
--filter-failing <path> | Path to JSON output file with failing tests
--filter-errors-only <path> | Path to JSON output file with error tests
-n, --filter-first-n <number> | Only run the first N tests
--filter-sample <number> | Only run a random sample of N tests
--filter-metadata <key=value> | Only run tests whose metadata matches the key=value pair
--filter-pattern <pattern> | Only run tests whose description matches the regex pattern
--filter-providers <providers> | Only run tests with these providers
--filter-targets <targets> | Only run tests with these targets
--grader <provider> | Model that will grade outputs
-j, --max-concurrency <number> | Maximum number of concurrent API calls
--model-outputs <path> | Path to JSON containing list of LLM output strings
--no-cache | Do not read or write results to disk cache
--no-progress-bar | Do not show progress bar
--no-table | Do not output table in CLI
--no-write | Do not write results to promptfoo directory
-o, --output <paths...> | Path(s) to output file (csv, txt, json, jsonl, yaml, yml, html)
-p, --prompts <paths...> | Paths to prompt files (.txt)
--prompt-prefix <path> | Prefix prepended to every prompt
--prompt-suffix <path> | Suffix appended to every prompt
-r, --providers <name or path...> | Provider names or paths to custom API caller modules
--remote | Force remote inference wherever possible (used for red teams)
--repeat <number> | Number of times to run each test
--share | Create a shareable URL
--suggest-prompts <number> | Generate N new prompts and append them to the prompt list
--table | Output table in CLI
--table-cell-max-length <number> | Truncate console table cells to this length
-t, --tests <path> | Path to CSV with test cases
--var <key=value> | Set a variable in key=value format
-v, --vars <path> | Path to CSV with test cases (alias for --tests)
--verbose | Show debug logs
-w, --watch | Watch for changes in config and re-run

The eval command will return exit code 100 when there is at least 1 test case failure. It will return exit code 1 for any other error. The exit code for failed tests can be overridden with environment variable PROMPTFOO_FAILED_TEST_EXIT_CODE.
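In a CI script, the exit codes above can be mapped to a status before deciding whether to fail the build. The following is a minimal sketch, not part of promptfoo: describe_eval_exit is a hypothetical helper name, and it assumes that PROMPTFOO_FAILED_TEST_EXIT_CODE, when set, holds the same value that was in effect when promptfoo eval ran.

```shell
# describe_eval_exit is a hypothetical helper, not a promptfoo command.
# $1 is the exit code returned by `promptfoo eval`.
describe_eval_exit() {
  # Use the overridden failure code if one is set, otherwise the documented default of 100.
  failed_code="${PROMPTFOO_FAILED_TEST_EXIT_CODE:-100}"
  if [ "$1" -eq 0 ]; then
    echo "passed"
  elif [ "$1" -eq "$failed_code" ]; then
    echo "test-failures"
  else
    echo "error"
  fi
}

# Typical use (commented out so the sketch has no side effects):
#   promptfoo eval --no-progress-bar --no-table
#   describe_eval_exit "$?"
```

Separating test failures from other errors lets a pipeline treat a failing assertion differently from, say, a missing configuration file.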

promptfoo init [directory]

Initialize a new project with dummy files.

Option | Description
directory | Directory to create files in
--no-interactive | Do not run in interactive mode

promptfoo view

Start a browser UI for visualization of results.

Option | Description
-p, --port <number> | Port number for the local server
-y, --yes | Skip confirmation and auto-open the URL

If you've used PROMPTFOO_CONFIG_DIR to override the promptfoo output directory, run promptfoo view [directory].

promptfoo share

Create a URL that can be shared online.

Option | Description
-y, --yes | Skip confirmation before creating shareable URL

promptfoo cache

Manage cache.

Option | Description
clear | Clear the cache

promptfoo feedback <message>

Send feedback to the promptfoo developers.

Option | Description
message | Feedback message

promptfoo list

List various resources like evaluations, prompts, and datasets.

Subcommand | Description
evals | List evaluations
prompts | List prompts
datasets | List datasets

Option | Description
-n | Show the first n records, sorted by descending date of creation

promptfoo show <id>

Show details of a specific resource.

Option | Description
eval <id> | Show details of a specific evaluation
prompt <id> | Show details of a specific prompt
dataset <id> | Show details of a specific dataset

promptfoo delete <id>

Deletes a specific resource.

Option | Description
eval <id> | Delete an evaluation by id

promptfoo import <filepath>

Import an eval file from JSON format.

promptfoo export <evalId>

Export an eval record to JSON format. To export the most recent eval, pass latest as the evalId.

Option | Description
-o, --output <filepath> | File to write. Writes to stdout by default.
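As an illustration of how export and import pair up, here is a small sketch that moves the most recent eval through a JSON file. run_cmd is a hypothetical dry-run wrapper (not part of promptfoo) that only prints each command unless RUN=1 is set, and latest_eval.json is a placeholder file name.

```shell
# run_cmd is a hypothetical wrapper: it prints the command unless RUN=1 is set,
# so the sketch can be tried without touching any eval data.
run_cmd() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

# Export the most recent eval to a file (latest_eval.json is a placeholder name).
run_cmd promptfoo export latest -o latest_eval.json

# Import that file, for example on another machine.
run_cmd promptfoo import latest_eval.json
```

Setting RUN=1 executes the commands for real. Without -o, the export is written to stdout instead of a file.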

Environment variables

These general-purpose environment variables are supported:

Name | Description | Default
FORCE_COLOR | Set to 0 to disable terminal colors for printed outputs |
PROMPTFOO_ASSERTIONS_MAX_CONCURRENCY | How many assertions to run at a time | 3
PROMPTFOO_CONFIG_DIR | Directory that stores eval history | ~/.promptfoo
PROMPTFOO_DISABLE_AJV_STRICT_MODE | If set, disables AJV strict mode for JSON schema validation |
PROMPTFOO_DISABLE_CONVERSATION_VAR | Prevents the _conversation variable from being set |
PROMPTFOO_DISABLE_ERROR_LOG | Prevents error logs from being written to a file |
PROMPTFOO_DISABLE_JSON_AUTOESCAPE | If set, disables smart variable substitution within JSON prompts |
PROMPTFOO_DISABLE_REF_PARSER | Prevents JSON schema dereferencing |
PROMPTFOO_DISABLE_TEMPLATING | Disable Nunjucks rendering |
PROMPTFOO_DISABLE_VAR_EXPANSION | Prevents Array-type vars from being expanded into multiple test cases |
PROMPTFOO_FAILED_TEST_EXIT_CODE | Override the exit code when there is at least 1 test case failure or when the pass rate is below PROMPTFOO_PASS_RATE_THRESHOLD | 100
PROMPTFOO_LOG_DIR | Directory to write error logs. |
PROMPTFOO_PASS_RATE_THRESHOLD | Set a minimum pass rate threshold (as a percentage). If not set, defaults to 0 failures | 0
PROMPTFOO_REQUIRE_JSON_PROMPTS | By default the chat completion provider will wrap non-JSON messages in a single user message. Setting this envar to true disables that behavior. |
PROMPTFOO_STRIP_PROMPT_TEXT | Strip prompt text from results to reduce memory usage | false
PROMPTFOO_STRIP_RESPONSE_OUTPUT | Strip model response outputs from results to reduce memory usage | false
PROMPTFOO_STRIP_TEST_VARS | Strip test variables from results to reduce memory usage | false
PROMPTFOO_STRIP_GRADING_RESULT | Strip grading results from results to reduce memory usage | false
PROMPTFOO_STRIP_METADATA | Strip metadata from results to reduce memory usage | false

Tip: promptfoo will load environment variables from the .env file in your current working directory.
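To make the .env behavior concrete, the sketch below writes a small .env file into a scratch directory using variables from the table above. The values are illustrative assumptions rather than recommendations, and env_dir is a name chosen only for this example.

```shell
# Write an example .env file into a scratch directory.
# The values are illustrative, not recommendations.
env_dir="$(mktemp -d)"
cat > "$env_dir/.env" <<'EOF'
PROMPTFOO_PASS_RATE_THRESHOLD=90
PROMPTFOO_FAILED_TEST_EXIT_CODE=2
PROMPTFOO_ASSERTIONS_MAX_CONCURRENCY=3
EOF

# Show what was written.
cat "$env_dir/.env"
```

Per the tip above, running promptfoo with that directory as the working directory should load these values. To load a file from a different location, the eval command accepts --env-file <path>.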

promptfoo generate dataset

BETA: Generate synthetic test cases based on existing prompts and variables.

Option | Description | Default
-c, --config <path> | Path to the configuration file | promptfooconfig.yaml
-w, --write | Write the generated test cases directly to the config file | false
-i, --instructions <text> | Custom instructions for test case generation |
-o, --output <path> | Path to write the generated test cases | stdout
--numPersonas <number> | Number of personas to generate | 5
--numTestCasesPerPersona <number> | Number of test cases per persona | 3

For example, this command will modify your default config file (usually promptfooconfig.yaml) with new test cases:

promptfoo generate dataset -w

This command will generate test cases for a specific config and write them to a file, while following special instructions:

promptfoo generate dataset -c my_config.yaml -o new_tests.yaml -i 'All test cases for {{location}} must be European cities'

promptfoo redteam init

Initialize a red teaming project.

Option | Description | Default
[directory] | Directory to initialize the project in. |
--no-gui | Do not open the browser UI |

Example:

promptfoo redteam init my_project

Danger: Adversarial testing produces offensive, toxic, and harmful test inputs, and may cause your system to produce harmful outputs.

For more detail, see red team configuration.

promptfoo redteam setup

Start browser UI and open to red team setup.

Option | Description | Default
[configDirectory] | Directory containing configuration files |
-p, --port <number> | Port number for the local server | 15500
--filter-description <pattern> | Filter evals by description using regex pattern |
--env-file, --env-path <path> | Path to .env file |

promptfoo redteam run

Run the complete red teaming process (init, generate, and evaluate).

Option | Description | Default
-c, --config [path] | Path to configuration file | promptfooconfig.yaml
-o, --output [path] | Path to output file for generated tests | redteam.yaml
--no-cache | Do not read or write results to disk cache | false
--env-file, --env-path <path> | Path to .env file |
-j, --max-concurrency <number> | Maximum number of concurrent API calls |
--delay <number> | Delay in milliseconds between API calls |
--remote | Force remote inference wherever possible | false
--force | Force generation even if no changes are detected | false

Example:

promptfoo redteam run -c custom_config.yaml -o custom_output.yaml

promptfoo redteam generate

Generate adversarial test cases to challenge your prompts and models.

Option | Description | Default
-c, --config <path> | Path to configuration file | promptfooconfig.yaml
-o, --output <path> | Path to write the generated test cases | redteam.yaml
-w, --write | Write the generated test cases directly to the config file | false
--purpose <purpose> | High-level description of the system's purpose | Inferred from config
--provider <provider> | Provider to use for generating adversarial tests |
--injectVar <varname> | Override the {{variable}} that represents user input in the prompt | prompt
--plugins <plugins> | Comma-separated list of plugins to use | default
--strategies <strategies> | Comma-separated list of strategies to use | default
-n, --num-tests <number> | Number of test cases to generate per plugin |
--language <language> | Specify the language for generated tests | English
--no-cache | Do not read or write results to disk cache | false
--env-file, --env-path <path> | Path to .env file |
-j, --max-concurrency <number> | Maximum number of concurrent API calls |
--delay <number> | Delay in milliseconds between plugin API calls |
--remote | Force remote inference wherever possible | false
--force | Force generation even if no changes are detected | false

For example, let's suppose we have the following promptfooconfig.yaml:

prompts:
- 'Act as a trip planner and help the user plan their trip'

providers:
- openai:gpt-4o-mini
- openai:gpt-4o

This command will generate adversarial test cases and write them to redteam.yaml.

promptfoo redteam generate

This command overrides the system purpose and the variable used to inject adversarial user input:

promptfoo redteam generate --purpose 'Travel agent that helps users plan trips' --injectVar 'message'

promptfoo redteam eval

Works the same as promptfoo eval, but defaults to loading redteam.yaml.

promptfoo redteam report

Start a browser UI and open the red teaming report.

Option | Description | Default
[directory] | Directory containing the red teaming configuration. |
-p, --port <number> | Port number for the server | 15500
--filter-description <pattern> | Filter evals by description using a regex pattern |
--env-file, --env-path <path> | Path to .env file |

Example:

promptfoo redteam report -p 8080

ASCII-only outputs

To disable terminal colors for printed outputs, set FORCE_COLOR=0 (this is supported by the chalk library).

For the eval command, you may also want to disable the progress bar and table, because they use special characters:

FORCE_COLOR=0 promptfoo eval --no-progress-bar --no-table