Architecture

Promptfoo automated red teaming consists of three main components: plugins, strategies, and targets.

Each component is designed to be modular and reusable. We're building a framework that is useful out of the box with minimal configuration, but can be extended with custom components.

For usage details, see the quickstart guide.

Core Components

Test Generation Engine

The test generation engine combines plugins and strategies to create attack probes:

Plugins generate adversarial inputs for specific vulnerability types. Each plugin is a self-contained module that can be enabled or disabled through configuration.

Examples include PII exposure, BOLA, and Hate Speech.
Strategies are patterns for delivering the generated adversarial inputs.

The most fundamental strategy is basic, which controls whether original test cases are included in the output. When disabled, only modified test cases from other strategies are included.

Other strategies range from simple encodings like base64 or leetspeak to more complex implementations like Microsoft's multi-turn attacks and Meta's GOAT framework.
Attack Probes are the natural language prompts generated by combining plugins and strategies.

They contain the actual test inputs along with metadata about the intended vulnerability test. Promptfoo sends these to your target system.

Target Interface

The target interface defines how test probes interact with the system under test. We support over 30 target types, including:

HTTP API - Tests REST endpoints via configurable requests
Direct Model - Interfaces with LLM providers like OpenAI or local models
Browser - Runs end-to-end tests using Selenium or Puppeteer
Custom Provider - Implements custom runtime integrations via Python/JavaScript

Each target type implements a common interface for sending probes and receiving responses.

Evaluation Engine

The evaluation engine processes target responses through:

Vulnerability Analysis - Scans responses for security issues using configurable detectors
Response Analysis - Examines output content and behavior patterns using LLM-as-a-judge grading
Results - Generates findings with:
- Vulnerability type
- Severity
- Attack vector
- Mitigation steps

Configuration

Configuration ties the components together via promptfooconfig.yaml. See configuration guide for details.

The configuration defines:

Target endpoints and authentication
Enabled plugins and their settings
Active strategies
Application context and policies

Component Flow

Configuration initializes plugins and strategies
Test engine generates probes using enabled components
Target interface delivers probes to the system
Evaluation engine analyzes responses and reports findings

Components can be used independently or composed into larger test suites. The modular design allows for extending functionality by adding new plugins, strategies, targets or evaluators.

For CI/CD integration, see our automation guide.

Core Components​

Test Generation Engine​

Target Interface​

Evaluation Engine​

Configuration​