# Audio Jailbreaking
The Audio strategy converts prompt text into speech audio and then encodes that audio as a base64 string. This lets you test how AI systems handle audio-encoded text, which may bypass text-based content filters or elicit different behavior than the equivalent plain text.
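Conceptually, the transformation produces a base64 payload in place of the original prompt. The following is a minimal sketch of the encoding step only; the file name and `.mp3` container are assumptions, and promptfoo performs the speech synthesis remotely rather than from a local file.

```typescript
// Minimal sketch of the encoding half of the strategy: speech audio bytes
// are serialized to a base64 string so they can travel inside an otherwise
// text-only test case. The file name and .mp3 container are assumptions;
// promptfoo synthesizes the speech on its server, not locally.
import { readFileSync } from 'fs';

const audioBytes = readFileSync('prompt-speech.mp3'); // hypothetical local file
const base64Audio = audioBytes.toString('base64');

// The base64 string stands in for the original prompt text.
console.log(base64Audio.slice(0, 60) + '...');
```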
## Use Case
This strategy is useful for:
- Testing if models can extract and process text from base64-encoded audio
- Evaluating if audio-encoded text can bypass content filters that typically scan plain text
- Assessing model behavior when handling multi-modal inputs (text converted to speech format)
Use it like so in your `promptfooconfig.yaml`:
```yaml
strategies:
  - audio
```
Or with additional configuration:
```yaml
strategies:
  - id: audio
    config:
      language: fr # Use French audio (ISO 639-1 code)
```
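To verify what a transformed test case actually contains, you can decode the base64 payload back into a playable file. A minimal sketch, assuming Node.js and that the payload is a complete audio file (the `.mp3` extension is a guess; inspect the bytes to confirm the container):

```typescript
// Decode a base64 audio payload from a transformed test case so it can be
// played back for manual inspection. The output extension is an assumption.
import { writeFileSync } from 'fs';

function saveAudio(base64Audio: string, outPath = 'decoded-prompt.mp3'): void {
  writeFileSync(outPath, Buffer.from(base64Audio, 'base64'));
}
```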
This strategy relies on remote generation to perform the text-to-speech conversion. An active internet connection is required, because the conversion is implemented exclusively on the server side.
If remote generation is disabled or unavailable, the strategy will throw an error rather than fall back to any local processing.
## How It Works
The strategy performs the following operations:

1. Takes the original text from your test case
2. Sends the text to the remote service for conversion to speech audio
3. Receives the base64-encoded audio data
4. Replaces the original text in your test case with the base64-encoded audio
The resulting test case contains the same semantic content as the original but in a different format that may be processed differently by AI systems.
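The sketch below maps those four steps onto code. It is a conceptual outline only: the endpoint URL, request shape, and function names are hypothetical, since the real conversion runs on promptfoo's servers.

```typescript
// Conceptual outline of the four steps above. The endpoint, request shape,
// and names here are hypothetical; the real conversion happens server-side.
interface TestCase {
  vars: Record<string, string>;
}

async function toAudioTestCase(
  testCase: TestCase,
  injectVar: string,
  language = 'en',
): Promise<TestCase> {
  // 1. Take the original text from the test case.
  const text = testCase.vars[injectVar];

  // 2. Send it to a (hypothetical) remote text-to-speech endpoint.
  const response = await fetch('https://example.com/api/tts', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text, language }),
  });

  // 3. Receive the base64-encoded audio data.
  const { audioBase64 } = (await response.json()) as { audioBase64: string };

  // 4. Replace the original text with the base64-encoded audio.
  return { ...testCase, vars: { ...testCase.vars, [injectVar]: audioBase64 } };
}
```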
## Configuration Options
`language`
: An ISO 639-1 language code that specifies which language the text-to-speech system should use. This parameter controls the accent and pronunciation patterns of the generated audio and defaults to `en` (English). Note that it only changes the accent of the speech; it does not translate your text. If you provide English text with `language: 'fr'`, you'll get English words spoken with a French accent.
## Importance
This strategy is worth implementing because:
- It tests the robustness of content filtering mechanisms against non-plaintext formats
- It evaluates the model's ability to handle and extract information from audio data
- It can reveal inconsistencies in how models handle the same content presented in different formats
- Audio modalities may have different thresholds or processing pipelines for harmful content