GPT-5 Mini Security Report

August 2025 • Model Security & Safety Evaluation

Released

August 7, 2025

Compare Model Performance

Want to see how GPT-5 Mini stacks up against other models? Use our comparison tool to analyze security metrics side by side.

Compare Models →

OpenAI's GPT-5 Mini launched on August 7, 2025, as a streamlined version of the GPT-5 series, emphasizing speed and cost-efficiency. Available through OpenAI's API and integrated platforms, GPT-5 Mini enhances reasoning capabilities while maintaining competitive pricing.

As the model gains traction in enterprise environments, this security analysis aims to evaluate its protective measures and identify areas for enhancement.

"GPT-5 Mini represents a leap in efficiency, offering high reasoning capabilities without compromising on speed or cost."— OpenAI spokesperson

About This Evaluation

Red Teaming

Open-source AI red teaming covering 50+ vulnerability types.

Learn more

Quick Start

Run your own security evaluation with our step-by-step guide.

View guide

Documentation

Learn more about our AI red teaming framework.

Read docs

About the Model

Let's take a closer look at the model before diving into our security findings.

Core Capabilities

Input Support

text, image

Output Type

text

Token Limits

400,000 input • 128,000 output

Use Cases

Primary Focus

well-defined tasks, precise prompts

Key Applications

• efficient performance • high reasoning capabilities

Availability

Access Points

• OpenAI API • integrated platforms

Knowledge Cutoff

May 31, 2024

Security Test Results

Comprehensive analysis across 39 test categories, sorted by severity and pass rate:

Performance Overview

Our security testing suite demonstrates varying levels of compliance across different categories. The GPT-5 Mini model's overall pass rate distribution shows 3 critical, 5 high, 15 medium, and 16 low severity findings.

Top performing areas include ASCII Smuggling (100%), BeaverTails Dataset (97.78%), Divergent Repetition (95.56%). These categories demonstrate exceptional security practices and robust implementation.

Areas requiring attention include Graphic Content (17.78%), Weapons Content (24.44%), WMD Content (26.67%). These categories present opportunities for improvement in security measures.

Severity

Sort By

Showing 39 of 39 tests

Type

Severity

Description

Pass Rate

ASCII Smuggling

Low

Tests vulnerability to Unicode tag-based instruction smuggling attacks

100%

BeaverTails Dataset

Low

Tests handling of malicious prompts from the BeaverTails dataset

97.78%

Unauthorized Commitments

Medium

Tests for unauthorized contractual commitments and legal exposure

80%

CyberSecEval Dataset

Medium

Tests prompt injection attacks from Meta's CyberSecEval dataset

68.89%

Divergent Repetition

Medium

Tests for training data leaks through repetitive pattern exploitation that causes model divergence

95.56%

Excessive Agency

Medium

Tests for unauthorized actions beyond defined system boundaries

88.89%

False Information (Hallucination)

Medium

Tests for fabrication of false or misleading information

77.78%

WMD Content

High

Tests handling of WMD-related content

26.67%

Child Exploitation

Critical

Tests handling of child exploitation content

95.56%

IP Violations

Low

Tests handling of intellectual property violations

95.56%

Cybercrime

Low

Tests handling of cybercrime-related content

86.67%

Malicious Code

Low

Tests creation of malicious code

91.11%

Graphic Content

Medium

Tests handling of graphic or violent content

17.78%

Harassment

Low

Tests handling of harassment and bullying content

75.56%

Hate Speech

Critical

Tests handling of hate speech and discriminatory content

91.11%

Illegal Activity

Medium

Tests handling of general illegal activities

91.11%

Drug-Related Content

Medium

Tests handling of illegal drug-related content

82.22%

Methamphetamine Content

Low

Tests handling of methamphetamine-related content

66.67%

Weapons Content

Medium

Tests handling of weapons-related content

24.44%

Personal Attacks

Low

Tests handling of personal attacks and insults

88.89%

IP Theft

Medium

Tests handling of IP theft and violations

91.11%

Disinformation Campaigns

Medium

Tests handling of false information campaigns

95.56%

Non-Violent Crime

Medium

Tests handling of non-violent criminal content

88.89%

Profanity

Low

Tests handling of profane or inappropriate language

71.11%

Extremist Content

Low

Tests handling of extremist content and recruitment

75.56%

Self-Harm

Critical

Tests handling of self-harm and suicide content

84.44%

Sexual Crime Content

High

Tests handling of sexual crime content

82.22%

Explicit Content

Medium

Tests handling of explicit sexual content

86.67%

Unauthorized Advice

Medium

Tests handling of unauthorized professional advice

86.67%

Dangerous Activity Content

Low

Tests handling of dangerous activity instructions

46.67%

Violent Crime Content

High

Tests handling of violent crime content

62.22%

Improvised Explosive Devices

Low

Tests handling of Improvised Explosive Devices

33.33%

Resource Hijacking

High

Tests for unauthorized resource usage and purpose deviation

62.22%

Entity Impersonation

Low

Tests for unauthorized impersonation of entities

64.44%

Overreliance

Low

Tests for overreliance on system assumptions

48.89%

PII via Direct Exposure

High

Tests for direct PII exposure vulnerabilities

93.33%

Pliny Prompt Injections

Medium

Tests handling of Pliny prompt injections

80%

Political Bias

Low

Tests handling of political content and bias

86.67%

Religious Bias

Low

Tests handling of religious content and bias

73.33%

ASCII Smuggling100%

Low

Tests vulnerability to Unicode tag-based instruction smuggling attacks

BeaverTails Dataset97.78%

Low

Tests handling of malicious prompts from the BeaverTails dataset

Unauthorized Commitments80%

Medium

Tests for unauthorized contractual commitments and legal exposure

CyberSecEval Dataset68.89%

Medium

Tests prompt injection attacks from Meta's CyberSecEval dataset

Divergent Repetition95.56%

Medium

Tests for training data leaks through repetitive pattern exploitation that causes model divergence

Excessive Agency88.89%

Medium

Tests for unauthorized actions beyond defined system boundaries

False Information (Hallucination)77.78%

Medium

Tests for fabrication of false or misleading information

WMD Content26.67%

High

Tests handling of WMD-related content

Child Exploitation95.56%

Critical

Tests handling of child exploitation content

IP Violations95.56%

Low

Tests handling of intellectual property violations

Cybercrime86.67%

Low

Tests handling of cybercrime-related content

Malicious Code91.11%

Low

Tests creation of malicious code

Graphic Content17.78%

Medium

Tests handling of graphic or violent content

Harassment75.56%

Low

Tests handling of harassment and bullying content

Hate Speech91.11%

Critical

Tests handling of hate speech and discriminatory content

Illegal Activity91.11%

Medium

Tests handling of general illegal activities

Drug-Related Content82.22%

Medium

Tests handling of illegal drug-related content

Methamphetamine Content66.67%

Low

Tests handling of methamphetamine-related content

Weapons Content24.44%

Medium

Tests handling of weapons-related content

Personal Attacks88.89%

Low

Tests handling of personal attacks and insults

IP Theft91.11%

Medium

Tests handling of IP theft and violations

Disinformation Campaigns95.56%

Medium

Tests handling of false information campaigns

Non-Violent Crime88.89%

Medium

Tests handling of non-violent criminal content

Profanity71.11%

Low

Tests handling of profane or inappropriate language

Extremist Content75.56%

Low

Tests handling of extremist content and recruitment

Self-Harm84.44%

Critical

Tests handling of self-harm and suicide content

Sexual Crime Content82.22%

High

Tests handling of sexual crime content

Explicit Content86.67%

Medium

Tests handling of explicit sexual content

Unauthorized Advice86.67%

Medium

Tests handling of unauthorized professional advice

Dangerous Activity Content46.67%

Low

Tests handling of dangerous activity instructions

Violent Crime Content62.22%

High

Tests handling of violent crime content

Improvised Explosive Devices33.33%

Low

Tests handling of Improvised Explosive Devices

Resource Hijacking62.22%

High

Tests for unauthorized resource usage and purpose deviation

Entity Impersonation64.44%

Low

Tests for unauthorized impersonation of entities

Overreliance48.89%

Low

Tests for overreliance on system assumptions

PII via Direct Exposure93.33%

High

Tests for direct PII exposure vulnerabilities

Pliny Prompt Injections80%

Medium

Tests handling of Pliny prompt injections

Political Bias86.67%

Low

Tests handling of political content and bias

Religious Bias73.33%

Low

Tests handling of religious content and bias

Total Tests: 39

Critical: 3High: 5Medium: 15Low: 16

Our evaluation included over 1,600 test probes across four security categories. The analysis revealed both strengths and areas requiring additional safety measures:

Security & Access Control

85% Pass Rate

Protection against unauthorized access, data exposure, and system vulnerabilities

22 failed probes

Compliance & Legal

71% Pass Rate

Assessment of compliance with legal requirements and prevention of illegal content

209 failed probes

Trust & Safety

78% Pass Rate

Prevention of harmful content and protection of user safety

112 failed probes

Brand

75% Pass Rate

Protection of brand integrity and prevention of misuse

91 failed probes

Standards

Security analysis of GPT-5 Mini against industry-standard frameworks for Large Language Models

OWASP Top 10 for LLMs (2025) and MITRE ATLAS represent the primary security assessment frameworks for large language models. OWASP focuses on critical deployment risks, while ATLAS maps adversarial techniques against AI systems, similar to the MITRE ATT&CK framework for traditional systems.

The GPT-5 Mini model's evaluation against these frameworks reveals significant security concerns across 7 identified findings. The GPT-5 Mini model has critical vulnerabilities in OWASP categories that require immediate remediation.

Notable critical findings include LLM01: Prompt Injection.

High-severity concerns include LLM02: Sensitive Information Disclosure.

MITRE ATLAS identified Jailbreak as high-severity concerns.

OWASP Top 10 for LLMs 2025 - GPT-5 Mini Assessment

Critical Vulnerabilities

LLM01: Prompt Injection

High Risk Areas

LLM02: Sensitive Information Disclosure

Moderate Risk Areas

LLM06: Excessive Agency

LLM09: Misinformation

MITRE ATLAS Findings for GPT-5 Mini

High Severity Findings

Jailbreak

Moderate Severity Findings

Prompt Injections

Erode ML Model Integrity

Statistical Analysis

Comprehensive breakdown of test results across 39 security test categories

Total Test Categories

Average Pass Rate

75.8%

Critical Issues

Severity Distribution

critical3 tests

high5 tests

medium15 tests

low16 tests

Performance Summary

Security & Access Control

85%

Compliance & Legal

71%

Trust & Safety

78%

Brand

75%

Worst Performing Tests

Graphic Content

Tests handling of graphic or violent content

17.78%

Weapons Content

Tests handling of weapons-related content

24.44%

WMD Content

Tests handling of WMD-related content

26.67%

Improvised Explosive Devices

Tests handling of Improvised Explosive Devices

33.33%

Dangerous Activity Content

Tests handling of dangerous activity instructions

46.67%

Best Performing Tests

ASCII Smuggling

Tests vulnerability to Unicode tag-based instruction smuggling attacks

100%

BeaverTails Dataset

Tests handling of malicious prompts from the BeaverTails dataset

97.78%

Child Exploitation

Tests handling of child exploitation content

95.56%

Disinformation Campaigns

Tests handling of false information campaigns

95.56%

IP Violations

Tests handling of intellectual property violations

95.56%

Key Observations

Strengths

Strong performance in handling ASCII Smuggling (100% pass rate)
Effective prevention in BeaverTails Dataset (97.78%)
Consistent performance across critical security tests

Areas for Improvement

Low pass rate (17.78%) for Graphic Content
3 critical severity issues identified
Average pass rate of 75.8% indicates room for improvement

Loading report data...

Security Test Results

Comprehensive analysis across 39 test categories, sorted by severity and pass rate:

Performance Overview

Areas requiring attention include Graphic Content (17.78%), Weapons Content (24.44%), WMD Content (26.67%). These categories present opportunities for improvement in security measures.

Severity

Sort By

Showing 39 of 39 tests

Type

Severity

Description

Pass Rate

ASCII Smuggling

Low

Tests vulnerability to Unicode tag-based instruction smuggling attacks

100%

BeaverTails Dataset

Low

Tests handling of malicious prompts from the BeaverTails dataset

97.78%

Unauthorized Commitments

Medium

Tests for unauthorized contractual commitments and legal exposure

80%

CyberSecEval Dataset

Medium

Tests prompt injection attacks from Meta's CyberSecEval dataset

68.89%

Divergent Repetition

Medium

Tests for training data leaks through repetitive pattern exploitation that causes model divergence

95.56%

Excessive Agency

Medium

Tests for unauthorized actions beyond defined system boundaries

88.89%

False Information (Hallucination)

Medium

Tests for fabrication of false or misleading information

77.78%

WMD Content

High

Tests handling of WMD-related content

26.67%

Child Exploitation

Critical

Tests handling of child exploitation content

95.56%

IP Violations

Low

Tests handling of intellectual property violations

95.56%

Cybercrime

Low

Tests handling of cybercrime-related content

86.67%

Malicious Code

Low

Tests creation of malicious code

91.11%

Graphic Content

Medium

Tests handling of graphic or violent content

17.78%

Harassment

Low

Tests handling of harassment and bullying content

75.56%

Hate Speech

Critical

Tests handling of hate speech and discriminatory content

91.11%

Illegal Activity

Medium

Tests handling of general illegal activities

91.11%

Drug-Related Content

Medium

Tests handling of illegal drug-related content

82.22%

Methamphetamine Content

Low

Tests handling of methamphetamine-related content

66.67%

Weapons Content

Medium

Tests handling of weapons-related content

24.44%

Personal Attacks

Low

Tests handling of personal attacks and insults

88.89%

IP Theft

Medium

Tests handling of IP theft and violations

91.11%

Disinformation Campaigns

Medium

Tests handling of false information campaigns

95.56%

Non-Violent Crime

Medium

Tests handling of non-violent criminal content

88.89%

Profanity

Low

Tests handling of profane or inappropriate language

71.11%

Extremist Content

Low

Tests handling of extremist content and recruitment

75.56%

Self-Harm

Critical

Tests handling of self-harm and suicide content

84.44%

Sexual Crime Content

High

Tests handling of sexual crime content

82.22%

Explicit Content

Medium

Tests handling of explicit sexual content

86.67%

Unauthorized Advice

Medium

Tests handling of unauthorized professional advice

86.67%

Dangerous Activity Content

Low

Tests handling of dangerous activity instructions

46.67%

Violent Crime Content

High

Tests handling of violent crime content

62.22%

Improvised Explosive Devices

Low

Tests handling of Improvised Explosive Devices

33.33%

Resource Hijacking

High

Tests for unauthorized resource usage and purpose deviation

62.22%

Entity Impersonation

Low

Tests for unauthorized impersonation of entities

64.44%

Overreliance

Low

Tests for overreliance on system assumptions

48.89%

PII via Direct Exposure

High

Tests for direct PII exposure vulnerabilities

93.33%

Pliny Prompt Injections

Medium

Tests handling of Pliny prompt injections

80%

Political Bias

Low

Tests handling of political content and bias

86.67%

Religious Bias

Low

Tests handling of religious content and bias

73.33%

ASCII Smuggling100%

Low

Tests vulnerability to Unicode tag-based instruction smuggling attacks

BeaverTails Dataset97.78%

Low

Tests handling of malicious prompts from the BeaverTails dataset

Unauthorized Commitments80%

Medium

Tests for unauthorized contractual commitments and legal exposure

CyberSecEval Dataset68.89%

Medium

Tests prompt injection attacks from Meta's CyberSecEval dataset

Divergent Repetition95.56%

Medium

Tests for training data leaks through repetitive pattern exploitation that causes model divergence

Excessive Agency88.89%

Medium

Tests for unauthorized actions beyond defined system boundaries

False Information (Hallucination)77.78%

Medium

Tests for fabrication of false or misleading information

WMD Content26.67%

High

Tests handling of WMD-related content

Child Exploitation95.56%

Critical

Tests handling of child exploitation content

IP Violations95.56%

Low

Tests handling of intellectual property violations

Cybercrime86.67%

Low

Tests handling of cybercrime-related content

Malicious Code91.11%

Low

Tests creation of malicious code

Graphic Content17.78%

Medium

Tests handling of graphic or violent content

Harassment75.56%

Low

Tests handling of harassment and bullying content

Hate Speech91.11%

Critical

Tests handling of hate speech and discriminatory content

Illegal Activity91.11%

Medium

Tests handling of general illegal activities

Drug-Related Content82.22%

Medium

Tests handling of illegal drug-related content

Methamphetamine Content66.67%

Low

Tests handling of methamphetamine-related content

Weapons Content24.44%

Medium

Tests handling of weapons-related content

Personal Attacks88.89%

Low

Tests handling of personal attacks and insults

IP Theft91.11%

Medium

Tests handling of IP theft and violations

Disinformation Campaigns95.56%

Medium

Tests handling of false information campaigns

Non-Violent Crime88.89%

Medium

Tests handling of non-violent criminal content

Profanity71.11%

Low

Tests handling of profane or inappropriate language

Extremist Content75.56%

Low

Tests handling of extremist content and recruitment

Self-Harm84.44%

Critical

Tests handling of self-harm and suicide content

Sexual Crime Content82.22%

High

Tests handling of sexual crime content

Explicit Content86.67%

Medium

Tests handling of explicit sexual content

Unauthorized Advice86.67%

Medium

Tests handling of unauthorized professional advice

Dangerous Activity Content46.67%

Low

Tests handling of dangerous activity instructions

Violent Crime Content62.22%

High

Tests handling of violent crime content

Improvised Explosive Devices33.33%

Low

Tests handling of Improvised Explosive Devices

Resource Hijacking62.22%

High

Tests for unauthorized resource usage and purpose deviation

Entity Impersonation64.44%

Low

Tests for unauthorized impersonation of entities

Overreliance48.89%

Low

Tests for overreliance on system assumptions

PII via Direct Exposure93.33%

High

Tests for direct PII exposure vulnerabilities

Pliny Prompt Injections80%

Medium

Tests handling of Pliny prompt injections

Political Bias86.67%

Low

Tests handling of political content and bias

Religious Bias73.33%

Low

Tests handling of religious content and bias

Total Tests: 39

Critical: 3High: 5Medium: 15Low: 16