Quickstart
This guide describes how to get started with Promptfoo's gen AI red teaming tool.
Prerequisites
- Install Node 18 or later
- Optional but recommended: Set the
OPENAI_API_KEY
environment variable or override the provider with your preferred service.
Initialize the project
- npx
- npm
- brew
npx promptfoo@latest redteam init my-project
cd my-project
Install:
npm install -g promptfoo
Run:
promptfoo redteam init my-project
cd my-project
Install:
brew install promptfoo
Run:
promptfoo redteam init my-project
cd my-project
The init
command creates some placeholders, including a promptfooconfig.yaml
file. We'll use this config file to do most of our setup.
Set the prompt & models to test
Edit the config to set up the prompt(s) and the LLM(s) you want to test:
prompts:
- 'Act as a travel agent and help the user plan their trip. User query: {{query}}'
providers:
- openai:gpt-4o-mini
- anthropic:messages:claude-3.5-sonnet-20240620
Talking to your app
Promptfoo can hook directly into your existing LLM app (Python, Javascript, etc), RAG or agent workflows, or send requests to your API. See custom providers for details on setting up:
- HTTP requests to your API
- Custom Python scripts for precise control
- Javascript, any executable, local providers like ollama, or other provider types
Prompting
Your prompt may be dynamic, or maybe it's constructed entirely on the application side and you just want to pass through the adversarial input. Also note that files are accepted:
prompts:
- file://path/to/prompt.json
Learn more about prompt formats.
Generate adversarial test cases
The init
step will do this for you automatically, but in case you'd like to manually re-generate your adversarial inputs:
- npx
- npm
- brew
npx promptfoo@latest redteam generate
promptfoo redteam generate
promptfoo redteam generate
This will generate several hundred adversarial inputs across many categories of potential harm and save them in redteam.yaml
.
Changing the provider (optional)
By default we use OpenAI's gpt-4o
model to generate the adversarial inputs, but we support hundreds of other models. Learn more about setting the provider.
Run the eval
Now that we've generated the test cases, we're ready to run the adversarial evaluation.
- npx
- npm
- brew
npx promptfoo@latest eval -c redteam.yaml
promptfoo eval -c redteam.yaml
promptfoo eval -c redteam.yaml
View the results
- npx
- npm
- brew
npx promptfoo@latest view
promptfoo view
promptfoo view
Promptfoo provides a detailed eval view that lets you dig into specific red team failure cases:
Click "Vulnerability Report" in the top right corner to see the report view:
That view includes a breakdown of specific test types that are connected to the eval view:
Understanding the report view
The red teaming results provide insights into various aspects of your LLM application's behavior:
- Vulnerability categories: Identifies the types of vulnerabilities discovered, such as prompt injections, context poisoning, or unintended behaviors.
- Severity levels: Classifies vulnerabilities based on their potential impact and likelihood of occurrence.
- Logs: Provides concrete instances of inputs that triggered vulnerabilities.
- Suggested mitigations: Recommendations for addressing identified vulnerabilities, which may include prompt engineering, additional safeguards, or architectural changes.
Continuous improvement
Red teaming is not a one-time activity but an ongoing process. As you develop and refine your LLM application, regularly running red team evaluations helps ensure that:
- New features or changes don't introduce unexpected vulnerabilities
- Your application remains robust against evolving attack techniques
- You can quantify and demonstrate improvements in safety and reliability over time
Check out the CI/CD integration docs for more info.
Resources
- Configuration guide for detailed info on configuring your red team
- Full guide for info examples of dynamically generated prompts, RAG/chain, etc.
- Types of LLM vulnerabilities for an overview of supported plugins
- Guides on red teaming agents and RAGs