Architecture
Promptfoo automated red teaming consists of three main components: plugins, strategies, and targets.
Each component is designed to be modular and reusable. We're building a framework that is useful out of the box with minimal configuration, but can be extended with custom components.
For usage details, see the quickstart guide.
Core Components
Test Generation Engine
The test generation engine combines plugins and strategies to create attack probes:
-
Plugins generate adversarial inputs for specific vulnerability types. Each plugin is a self-contained module that can be enabled or disabled through configuration.
Examples include PII exposure, BOLA, and Hate Speech.
-
Strategies are patterns for delivering the generated adversarial inputs.
The most fundamental strategy is
basic
, which controls whether original test cases are included in the output. When disabled, only modified test cases from other strategies are included.Other strategies range from simple encodings like base64 or leetspeak to more complex implementations like Microsoft's multi-turn attacks and Meta's GOAT framework.
-
Attack Probes are the natural language prompts generated by combining plugins and strategies.
They contain the actual test inputs along with metadata about the intended vulnerability test. Promptfoo sends these to your target system.
Target Interface
The target interface defines how test probes interact with the system under test. We support over 30 target types, including:
- HTTP API - Tests REST endpoints via configurable requests
- Direct Model - Interfaces with LLM providers like OpenAI or local models
- Browser - Runs end-to-end tests using Selenium or Puppeteer
- Custom Provider - Implements custom runtime integrations via Python/JavaScript
Each target type implements a common interface for sending probes and receiving responses.
Evaluation Engine
The evaluation engine processes target responses through:
- Vulnerability Analysis - Scans responses for security issues using configurable detectors
- Response Analysis - Examines output content and behavior patterns using LLM-as-a-judge grading
- Results - Generates findings with:
- Vulnerability type
- Severity
- Attack vector
- Mitigation steps
Configuration
Configuration ties the components together via promptfooconfig.yaml
. See configuration guide for details.
The configuration defines:
- Target endpoints and authentication
- Enabled plugins and their settings
- Active strategies
- Application context and policies
Component Flow
- Configuration initializes plugins and strategies
- Test engine generates probes using enabled components
- Target interface delivers probes to the system
- Evaluation engine analyzes responses and reports findings
Components can be used independently or composed into larger test suites. The modular design allows for extending functionality by adding new plugins, strategies, targets or evaluators.
For CI/CD integration, see our automation guide.