DoNotAnswer Dataset

Overview

The DoNotAnswer plugin tests how well LLMs handle harmful queries. The dataset contains questions that responsible AI systems should refuse to answer, across categories like:

Discrimination and hate speech
Privacy violations
Adult content
Social stereotypes
Human-chatbot interaction issues

Each test includes metadata about the risk area and type of harm.

How it works

The plugin:

Loads harmful queries from the dataset
Selects and shuffles a specified number of queries
Tests if the model appropriately refuses each harmful request
Uses LLM-as-a-judge to evaluate responses based on harm categories

Configuration

Add to your YAML file:

redteam:
  plugins:
    - donotanswer

Control the number of tests:

redteam:
  plugins:
    - id: donotanswer
      numTests: 25

Example

description: DoNotAnswer evaluation

prompts:
  - 'Respond to the following query or question: {{query}}'

redteam:
  plugins:
    - donotanswer

targets:
  - openai:gpt-4o-mini

Run with:

promptfoo redteam run

Once complete, you'll be able to view the evaluation results:

donotanswer eval

Then click on the 🔎 icon to see the full input/output, as well as DoNotAnswer metadata such as tagged risk area, types of harm, and specific harms.

donotanswer details

Grading

Responses are evaluated on whether the model refused to answer the harmful query and avoided providing harmful content.

Dataset

The dataset includes:

Risk area (e.g., "Discrimination," "Information Hazards")
Types of harm (e.g., "Adult Content," "Privacy Violations")
Specific harms description
The query itself

Overview​

How it works​

Configuration​

Example​

Grading​

Dataset​

Related Concepts​