# Factuality
The `factuality` assertion evaluates the factual consistency between an LLM output and a reference answer. It uses OpenAI's public evals prompt to determine whether the output is factually consistent with the reference.
## How to use it
To use the `factuality` assertion type, add it to your test configuration like this:
```yaml
assert:
  - type: factuality
    # Specify the reference statement to check against:
    value: The Earth orbits around the Sun
```
## How it works
The factuality checker evaluates whether the completion (the LLM output) and the reference (the `value` you provide) are factually consistent. It categorizes the relationship as one of:
- (A) Output is a subset of the reference and is fully consistent with it
- (B) Output is a superset of the reference and is fully consistent with it
- (C) Output contains all the same details as the reference
- (D) Output and reference disagree
- (E) Output and reference differ, but differences don't matter for factuality
By default, options A, B, C, and E are considered passing grades, while D is considered failing.
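For intuition, here's a sketch of how a few hypothetical outputs would typically be categorized against a single reference (the outputs below are illustrative, not produced by the grader):

```yaml
assert:
  - type: factuality
    value: Sacramento is the capital of California
    # Hypothetical outputs and the category they would typically receive:
    #   "Sacramento"                                  -> (A) subset, passes
    #   "Sacramento, on the Sacramento River, is
    #    the capital of California"                   -> (B) superset, passes
    #   "The capital of California is Sacramento"     -> (C) same details, passes
    #   "Los Angeles is the capital of California"    -> (D) disagreement, fails
```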
## Example Configuration
Here's a complete example showing how to use factuality checks:
```yaml
prompts:
  - 'What is the capital of {{state}}?'

providers:
  - openai:gpt-4

tests:
  - vars:
      state: California
    assert:
      - type: factuality
        value: Sacramento is the capital of California

  - vars:
      state: New York
    assert:
      - type: factuality
        value: Albany is the capital city of New York state
```
## Customizing Score Thresholds
You can customize which factuality categories are considered passing by setting scores in your test configuration:
```yaml
defaultTest:
  options:
    factuality:
      subset: 1 # Score for category A (default: 1)
      superset: 1 # Score for category B (default: 1)
      agree: 1 # Score for category C (default: 1)
      disagree: 0 # Score for category D (default: 0)
      differButFactual: 1 # Score for category E (default: 1)
```
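For example, a stricter policy might accept only outputs that stay within the reference. This sketch assumes that, as with category D by default, a category scored 0 is treated as failing:

```yaml
defaultTest:
  options:
    factuality:
      subset: 1 # A still passes
      superset: 0 # fail outputs that add extra claims
      agree: 1 # C still passes
      disagree: 0 # D still fails
      differButFactual: 0 # fail outputs that differ from the reference
```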
## Overriding the Grader
Like other model-graded assertions, you can override the default grader:
- Using the CLI:

  ```sh
  promptfoo eval --grader openai:gpt-4o-mini
  ```

- Using test options:

  ```yaml
  defaultTest:
    options:
      provider: openai:gpt-4o-mini
  ```

- Using assertion-level override:

  ```yaml
  assert:
    - type: factuality
      value: Sacramento is the capital of California
      provider: openai:gpt-4o-mini
  ```
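If you want finer control over the grading model itself (for example, pinning its temperature), the provider can typically also be given in object form. This is a sketch that assumes the `id`/`config` provider syntax used elsewhere in promptfoo configs:

```yaml
defaultTest:
  options:
    provider:
      id: openai:gpt-4o-mini
      config:
        temperature: 0 # assumed option; keeps grading more deterministic
```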
## Customizing the Prompt
You can customize the evaluation prompt using the `rubricPrompt` property:
```yaml
defaultTest:
  options:
    rubricPrompt: |
      Input: {{input}}
      Reference: {{ideal}}
      Completion: {{completion}}

      Evaluate the factual consistency between the completion and reference.
      Choose the most appropriate option:
      (A) Completion is a subset of reference
      (B) Completion is a superset of reference
      (C) Completion and reference are equivalent
      (D) Completion and reference disagree
      (E) Completion and reference differ, but differences don't affect factuality
      Answer with a single letter (A/B/C/D/E).
```
## Further reading
- See model-graded metrics for more options
- Check out the guide on LLM factuality