o4-mini Security Report
April 2025 • Model Security & Safety Evaluation
Compare Model Performance
Want to see how o4-mini stacks up against other models? Use our comparison tool to analyze security metrics side by side.
OpenAI's O4-mini model launched in October 2023, marking a significant advancement in the o-series with its optimization for fast, effective reasoning and efficient performance in coding and visual tasks.
As O4-mini gains traction among enterprises, this security analysis aims to evaluate its safety features and identify areas for improvement, ensuring secure deployment in diverse applications.
"O4-mini is our latest small o-series model, optimized for fast, effective reasoning with exceptionally efficient performance in coding and visual tasks."— OpenAI
About This Evaluation
Core Capabilities
text, image
text
200,000 tokens input • 100,000 tokens output
Use Cases
reasoning, coding, visual tasks
• enterprise AI solutions • efficient coding tasks • visual data processing
Availability
• OpenAI platforms
June 1, 2024
Security Test Results
Comprehensive analysis across 39 test categories, sorted by severity and pass rate:
Performance Overview
Our security testing suite demonstrates varying levels of compliance across different categories. The o4-mini model's overall pass rate distribution shows 3 critical, 5 high, 15 medium, and 16 low severity findings.
Top performing areas include Drug-Related Content (95.56%), Child Exploitation (93.33%), Dangerous Activity Content (93.33%). These categories demonstrate exceptional security practices and robust implementation.
Areas requiring attention include Pliny Prompt Injections (0%), Overreliance (46.67%), Entity Impersonation (46.67%). These categories present opportunities for improvement in security measures.
Our evaluation included over 1,600 test probes across four security categories. The analysis revealed both strengths and areas requiring additional safety measures:
Security & Access Control
69% Pass RateProtection against unauthorized access, data exposure, and system vulnerabilities
Compliance & Legal
84% Pass RateAssessment of compliance with legal requirements and prevention of illegal content
Trust & Safety
81% Pass RatePrevention of harmful content and protection of user safety
Brand
57% Pass RateProtection of brand integrity and prevention of misuse
OWASP Top 10 for LLMs (2025) and MITRE ATLAS represent the primary security assessment frameworks for large language models. OWASP focuses on critical deployment risks, while ATLAS maps adversarial techniques against AI systems, similar to the MITRE ATT&CK framework for traditional systems.
The o4-mini model's evaluation against these frameworks reveals several areas requiring attention across 6 identified findings. While o4-mini has no critical vulnerabilities, high-severity issues should be prioritized for improvement.
High-severity concerns include LLM02: Sensitive Information Disclosure.
MITRE ATLAS identified Jailbreak as high-severity concerns.
OWASP Top 10 for LLMs 2025 - o4-mini Assessment
High Risk Areas
Moderate Risk Areas
MITRE ATLAS Findings for o4-mini
High Severity Findings
Moderate Severity Findings
Total Test Categories
39
Average Pass Rate
75.7%
Critical Issues
3
Severity Distribution
Performance Summary
Security & Access Control
Compliance & Legal
Trust & Safety
Brand
Worst Performing Tests
Pliny Prompt Injections
Tests handling of Pliny prompt injections
Entity Impersonation
Tests for unauthorized impersonation of entities
Overreliance
Tests for overreliance on system assumptions
False Information (Hallucination)
Tests for fabrication of false or misleading information
Political Bias
Tests handling of political content and bias
Best Performing Tests
Drug-Related Content
Tests handling of illegal drug-related content
Violent Crime Content
Tests handling of violent crime content
Dangerous Activity Content
Tests handling of dangerous activity instructions
Child Exploitation
Tests handling of child exploitation content
Weapons Content
Tests handling of weapons-related content
Key Observations
Strengths
- Strong performance in handling Drug-Related Content (95.56% pass rate)
- Effective prevention in Violent Crime Content (93.33%)
- Consistent performance across critical security tests
Areas for Improvement
- Low pass rate (0%) for Pliny Prompt Injections
- 3 critical severity issues identified
- Average pass rate of 75.7% indicates room for improvement