Iterate on LLMs faster

Measure LLM quality improvements and catch regressions

How it works

Create a test dataset

Use a representative sample of user inputs to reduce subjectivity when tuning prompts.

Set up evaluation metrics

Use built-in metrics or LLM-graded evals, or define your own custom metrics.

Select the best prompt & model

Compare prompts and model outputs side-by-side, or integrate the library into your existing test/CI workflow.
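
The three steps above can be sketched in a few lines of plain Python. This is only an illustration of the workflow, not promptfoo's own API or configuration format: the test cases, the prompt templates, the `fake_model` stand-in for an LLM call, and the `contains_expected` metric are all hypothetical names chosen for this example.

```python
from typing import Callable

# Step 1: a small test dataset (hypothetical). Each case supplies the
# template variables and a substring the output is expected to contain.
TEST_CASES = [
    {"vars": {"text": "Hello"}, "expected": "HELLO"},
    {"vars": {"text": "good morning"}, "expected": "GOOD MORNING"},
]

# Candidate prompt templates to compare (hypothetical).
PROMPTS = {
    "upper": "Uppercase this: {text}",
    "echo": "Repeat this: {text}",
}


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    instruction, _, payload = prompt.partition(": ")
    return payload.upper() if instruction.startswith("Uppercase") else payload


# Step 2: a custom metric. Returns True when the output passes.
def contains_expected(output: str, expected: str) -> bool:
    return expected in output


# Step 3: score each prompt on the whole dataset and pick the best one.
def pass_rate(
    template: str,
    cases: list[dict],
    model: Callable[[str], str],
    metric: Callable[[str, str], bool],
) -> float:
    passed = sum(
        metric(model(template.format(**case["vars"])), case["expected"])
        for case in cases
    )
    return passed / len(cases)


scores = {
    name: pass_rate(template, TEST_CASES, fake_model, contains_expected)
    for name, template in PROMPTS.items()
}
best = max(scores, key=scores.get)

print(scores)
print("best prompt:", best)
```

Because the result is a plain number per prompt, the same loop could run inside an existing test suite or CI job, with a failing assertion whenever a pass rate drops below a chosen threshold.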

promptfoo is used by LLM apps serving over 10 million users