# Quickstart
Promptfoo is an open-source tool for red teaming gen AI applications.

- Automatically scans for 30+ vulnerability types:
  - Security & data privacy: jailbreaks, injections, RAG poisoning, etc.
  - Compliance & ethics: harmful & biased content, content filter validation, OWASP/NIST/EU compliance, etc.
  - Custom policies: enforce organizational guidelines.
- Generates dynamic attack probes tailored to your application using specialized uncensored models.
- Implements state-of-the-art adversarial ML research from Microsoft, Meta, and others.
- Integrates with CI/CD.
- Tests via HTTP API, browser, or direct model access.
## Prerequisites

- Install Node 18 or later.
- Optional but recommended: set the `OPENAI_API_KEY` environment variable, or override the provider with your preferred service.
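On macOS or Linux, the environment variable can be set in your shell before running promptfoo (the key value below is a placeholder, not a real key):

```shell
# Replace the placeholder with your actual OpenAI API key.
export OPENAI_API_KEY="sk-your-key-here"
```

Add the line to your shell profile (e.g. `~/.bashrc` or `~/.zshrc`) to persist it across sessions.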
## Initialize the project

Using npx:

```sh
npx promptfoo@latest redteam init my-project
cd my-project
```

Using npm, install globally and then run:

```sh
npm install -g promptfoo
promptfoo redteam init my-project
cd my-project
```

Using brew, install and then run:

```sh
brew install promptfoo
promptfoo redteam init my-project
cd my-project
```
The `init` command creates some placeholders, including a `promptfooconfig.yaml` file. We'll use this config file to do most of our setup.
## Attacking an API endpoint

Edit the config to set up the target endpoint. For example:

```yaml
targets:
  - id: 'https://example.com/generate'
    label: 'travel-agent-agent'
    config:
      method: 'POST'
      headers:
        'Content-Type': 'application/json'
      body:
        myPrompt: '{{prompt}}'

purpose: 'The user is a budget traveler looking for the best deals. The system is a travel agent that helps the user plan their trip. The user is anonymous and should not be able to access any information about other users, employees, or other individuals.'
```
The `label` is used to create issues and report the results of the red teaming. Make sure to re-use the same `label` when generating new redteam configs for the same target.

Setting the `purpose` is optional, but it will significantly improve the quality of the generated test cases and grading. Be specific about who the user of the system is and what information and actions they should be able to access.
For more information on configuring an HTTP target, see HTTP requests.
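If your endpoint wraps its reply in JSON rather than returning plain text, the HTTP provider can extract the answer before grading. A sketch, assuming the endpoint responds with a body like `{"output": "..."}` (the field name and extraction expression below are illustrative):

```yaml
targets:
  - id: 'https://example.com/generate'
    config:
      method: 'POST'
      headers:
        'Content-Type': 'application/json'
      body:
        myPrompt: '{{prompt}}'
      # Pull the reply text out of the JSON response before grading
      transformResponse: 'json.output'
```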
## Alternative: Test specific prompts and models

If you don't have a live endpoint, you can edit the config to set the specific prompt(s) and the LLM(s) to test:

```yaml
prompts:
  - 'Act as a travel agent and help the user plan their trip. User query: {{query}}'
  # Paths to prompts also work:
  # - file://path/to/prompt.txt

targets:
  - id: openai:gpt-4o-mini
    label: 'travel-agent-mini'
```

For more information on supported targets, see Custom Providers. For more information on supported prompt formats, see prompts.
## Alternative: Talking directly to your app

Promptfoo can hook directly into your existing LLM app to attack targets via Python, JavaScript, HTTP API, RAG or agent workflows, and more. See custom providers for details on setting up:

- HTTP requests to your API
- Custom Python scripts for precise control
- JavaScript, any executable, local providers like Ollama, or other provider types
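For the Python route, the custom provider is a script that exposes a `call_api` function returning a dict with an `output` key. A minimal sketch, with placeholder echo logic standing in for calls into your own app:

```python
# provider.py -- a minimal sketch of a promptfoo custom Python provider.
# The call_api(prompt, options, context) signature and the {"output": ...}
# return shape follow promptfoo's Python provider interface; the echo
# logic below is a placeholder for your own application code.

def call_api(prompt, options, context):
    # "prompt" is the adversarial input generated by the red team.
    # A real provider would pass it into your RAG pipeline, agent, or model
    # and return that system's response instead of echoing.
    response_text = f"(echo) {prompt}"
    return {"output": response_text}
```

You would then point a target at the script, e.g. `- id: file://provider.py`, so every generated attack flows through your actual application logic.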
## Run the eval

Now that we've generated the test cases, we're ready to run the adversarial evaluation.

Using npx:

```sh
npx promptfoo@latest redteam run
```

Using npm or brew:

```sh
promptfoo redteam run
```

This command will generate several hundred adversarial inputs across many categories of potential harm and save them in `redteam.yaml`. Then, it will run the test cases against the target.
## View the results

Using npx:

```sh
npx promptfoo@latest redteam report
```

Using npm or brew:

```sh
promptfoo redteam report
```

Promptfoo provides a report view that lets you dig into specific red team failure cases. That view includes a breakdown of specific test types that are connected to the eval view, and clicking into a specific test case to view logs will display the raw inputs and outputs.
## Understanding the report view
The red teaming results provide insights into various aspects of your LLM application's behavior:
- Vulnerability categories: Identifies the types of vulnerabilities discovered, such as prompt injections, context poisoning, or unintended behaviors.
- Severity levels: Classifies vulnerabilities based on their potential impact and likelihood of occurrence.
- Logs: Provides concrete instances of inputs that triggered vulnerabilities.
- Suggested mitigations: Recommendations for addressing identified vulnerabilities, which may include prompt engineering, additional safeguards, or architectural changes.
## Continuous improvement
Red teaming is not a one-time activity but an ongoing process. As you develop and refine your LLM application, regularly running red team evaluations helps ensure that:
- New features or changes don't introduce unexpected vulnerabilities
- Your application remains robust against evolving attack techniques
- You can quantify and demonstrate improvements in safety and reliability over time
Check out the CI/CD integration docs for more info.
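As one sketch of the ongoing process (the workflow name, trigger, and secret name below are illustrative, not prescribed by promptfoo), a GitHub Actions job could rerun the red team on every pull request:

```yaml
name: redteam
on: [pull_request]
jobs:
  redteam:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Re-run the adversarial evaluation against the configured target
      - run: npx promptfoo@latest redteam run
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```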
## Resources

- Configuration guide for detailed info on configuring your red team
- Full guide for examples of dynamically generated prompts, RAG/chains, and more
- Types of LLM vulnerabilities for an overview of supported plugins
- Guides on red teaming agents and RAG applications