Skip to main content

Architecture

Promptfoo automated red teaming consists of three main components: plugins, strategies, and targets.

Each component is designed to be modular and reusable. We're building a framework that is useful out of the box with minimal configuration, but can be extended with custom components.

For usage details, see the quickstart guide.

Core Components

Test Generation Engine

The test generation engine combines plugins and strategies to create attack probes:

  • Plugins generate adversarial inputs for specific vulnerability types. Each plugin is a self-contained module that can be enabled or disabled through configuration.

    Examples include PII exposure, BOLA, and Hate Speech.

  • Strategies are patterns for delivering the generated adversarial inputs.

    Some are very simple, like encoding in base64 or leetspeak. Others are more complex, and implement research from sources like Microsoft's multi-turn attacks and Meta's GOAT framework.

  • Attack Probes are the natural language prompts generated by combining plugins and strategies.

    They contain the actual test inputs along with metadata about the intended vulnerability test. Promptfoo sends these to your target system.

Target Interface

The target interface defines how test probes interact with the system under test. We support over 30 target types, including:

  • HTTP API - Tests REST endpoints via configurable requests
  • Direct Model - Interfaces with LLM providers like OpenAI or local models
  • Browser - Runs end-to-end tests using Selenium or Puppeteer
  • Custom Provider - Implements custom runtime integrations via Python/JavaScript

Each target type implements a common interface for sending probes and receiving responses.

Evaluation Engine

The evaluation engine processes target responses through:

  • Vulnerability Analysis - Scans responses for security issues using configurable detectors
  • Response Analysis - Examines output content and behavior patterns using LLM-as-a-judge grading
  • Results - Generates findings with:
    • Vulnerability type
    • Severity
    • Attack vector
    • Mitigation steps

Configuration

Configuration ties the components together via promptfooconfig.yaml. See configuration guide for details.

The configuration defines:

Component Flow

  1. Configuration initializes plugins and strategies
  2. Test engine generates probes using enabled components
  3. Target interface delivers probes to the system
  4. Evaluation engine analyzes responses and reports findings

Components can be used independently or composed into larger test suites. The modular design allows for extending functionality by adding new plugins, strategies, targets or evaluators.

For CI/CD integration, see our automation guide.