# Select Best

The `select-best` assertion compares multiple outputs in the same test case and selects the one that best meets a specified criterion. This is useful for comparing different prompt or model variations to determine which produces the best result.
## How to use it

To use the `select-best` assertion type, add it to your test configuration like this:

```yaml
assert:
  - type: select-best
    value: 'choose the most concise and accurate response'
```
**Note:** This assertion requires multiple prompts or providers to generate different outputs to compare.
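For example, a minimal sketch that compares two models on the same prompt might look like this (the providers shown are illustrative; substitute whichever you have configured):

```yaml
prompts:
  - 'Summarize the following text: {{text}}'

providers:
  - openai:gpt-4
  - openai:gpt-4o-mini

tests:
  - vars:
      text: 'Promptfoo evaluates and compares LLM outputs.'
    assert:
      - type: select-best
        value: 'choose the most accurate and concise summary'
```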
## How it works

The `select-best` checker:

- Takes all outputs from the test case
- Evaluates each output against the specified criterion
- Selects the best output
- Returns `pass=true` for the winning output and `pass=false` for the others
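To make the pass/fail behavior concrete, here is a small sketch with two prompt variants; the grader compares the two outputs and exactly one of them passes:

```yaml
prompts:
  - 'Explain {{concept}} in one sentence'
  - 'Explain {{concept}} using an analogy'

tests:
  - vars:
      concept: 'recursion'
    assert:
      # Both outputs are judged against this criterion; the winner is
      # marked pass=true and the other output pass=false.
      - type: select-best
        value: 'choose the clearer explanation'
```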
## Example Configuration

Here's a complete example showing how to use `select-best` to compare different prompt variations:
```yaml
prompts:
  - 'Write a tweet about {{topic}}'
  - 'Write a very concise, funny tweet about {{topic}}'
  - 'Compose a tweet about {{topic}} that will go viral'

providers:
  - openai:gpt-4

tests:
  - vars:
      topic: 'artificial intelligence'
    assert:
      - type: select-best
        value: 'choose the tweet that is most likely to get high engagement'

  - vars:
      topic: 'climate change'
    assert:
      - type: select-best
        value: 'choose the tweet that best balances information and humor'
```
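Assuming this config is saved as `promptfooconfig.yaml` (the default filename promptfoo looks for), run the comparison with:

```sh
promptfoo eval
```

Each test case then produces three tweets, and the assertion marks only the winning tweet as passing.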
## Overriding the Grader

Like other model-graded assertions, you can override the default grader:

- Using the CLI:

  ```sh
  promptfoo eval --grader openai:gpt-4o-mini
  ```

- Using test options:

  ```yaml
  defaultTest:
    options:
      provider: openai:gpt-4o-mini
  ```

- Using assertion-level override:

  ```yaml
  assert:
    - type: select-best
      value: 'choose the most engaging response'
      provider: openai:gpt-4o-mini
  ```
## Customizing the Prompt

You can customize the evaluation prompt using the `rubricPrompt` property:

```yaml
defaultTest:
  options:
    rubricPrompt: |
      Here are {{ outputs | length }} responses:
      {% for output in outputs %}
      Output {{ loop.index0 }}: {{ output }}
      {% endfor %}
      Criteria: {{ criteria }}
      Analyze each output against the criteria.
      Choose the best output by responding with its index (0 to {{ outputs | length - 1 }}).
```
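In this template, `outputs` is the list of candidate outputs for the test case and `criteria` is the assertion's `value`; as the final line indicates, the grader is expected to respond with the zero-based index of the best output.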
## Further reading

See model-graded metrics for more options.