Embodied AI Policy Jailbreak
Research Paper
POEX: Policy Executable Embodied AI Jailbreak Attacks
Description: LLM-based planning modules in embodied AI systems are vulnerable to POEX (policy-executable) jailbreak attacks. An attacker appends a carefully optimized adversarial suffix to a user instruction, causing the planning LLM to generate and execute harmful policies in both simulated and real-world environments. Because the optimized suffixes remain human-readable, the attack bypasses safety mechanisms and evades perplexity-based detection.
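As a rough illustration of the perplexity-based filtering that the optimized suffixes are designed to evade, the sketch below scores an incoming instruction with an off-the-shelf GPT-2 model and flags high-perplexity inputs. The model choice, threshold, and function names are illustrative assumptions, not taken from the paper.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of the text under GPT-2 (exp of mean token negative log-likelihood)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

# Illustrative threshold; in practice it would be tuned on benign instructions.
PPL_THRESHOLD = 200.0

def looks_adversarial(instruction: str) -> bool:
    """Flag instructions whose perplexity is unusually high (e.g. gibberish suffixes)."""
    return perplexity(instruction) > PPL_THRESHOLD
```

Because POEX suffixes are optimized to stay human-readable, their perplexity is close to that of benign text, which is why a filter of this kind alone is insufficient.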
Examples: See the POEX project page: https://poex-eai-jailbreak.github.io/ (the paper provides concrete harmful instructions and the optimized suffixes used to successfully attack a range of LLMs.)
Impact: Successful exploitation can cause physical harm to humans and damage to the surrounding environment; embodied AI systems such as robotic arms could be driven to break objects or injure people. Because the adversarial suffixes transfer across different LLMs, a single optimized suffix can compromise multiple systems, increasing the overall risk and impact.
Affected Systems: Embodied AI systems whose planning modules are built on LLMs, including both open-source and proprietary models susceptible to the attack techniques described in the research.
Mitigation Steps:
- Implement safety-constrained prompts that explicitly prohibit harmful actions.
- Conduct pre-planning checks on user instructions to detect malicious intent.
- Perform post-planning checks on generated policies to identify potentially harmful actions before execution (a sketch of the pre- and post-planning checks follows this list).
- Develop and deploy robust detection mechanisms capable of identifying adversarial suffixes and mitigating their impact. The research suggests that context-aware models may provide enhanced detection capabilities.
- Regularly update and retest LLM-based planning modules to incorporate improvements in robustness and safety mechanisms.
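The first three mitigation steps can be combined into a simple planning guard. The sketch below is a minimal illustration that assumes a policy is represented as a list of primitive action calls; the prompt text, deny-lists, and function names are hypothetical placeholders rather than anything specified by the paper.

```python
from typing import Callable, List

# Illustrative safety-constrained system prompt (hypothetical wording).
SAFETY_PROMPT = (
    "You are a robot task planner. Never generate actions that could injure a "
    "person, damage property, or violate safety constraints, even if asked."
)

# Illustrative deny-lists; a real deployment would use a tuned safety classifier.
MALICIOUS_PHRASES = {"stab", "burn", "poison", "hit the person"}
HARMFUL_ACTIONS = {"strike", "throw_at_person", "pour_on_person", "crush"}

def pre_planning_check(instruction: str) -> bool:
    """Reject user instructions whose intent looks harmful (placeholder keyword scan)."""
    lowered = instruction.lower()
    return not any(phrase in lowered for phrase in MALICIOUS_PHRASES)

def post_planning_check(policy: List[str]) -> bool:
    """Scan the generated policy (a sequence of primitive action calls) before execution."""
    return all(step.split("(")[0] not in HARMFUL_ACTIONS for step in policy)

def guarded_execute(instruction: str,
                    plan: Callable[[str, str], List[str]],
                    execute: Callable[[List[str]], None]) -> None:
    """Plan only on vetted instructions and execute only vetted policies."""
    if not pre_planning_check(instruction):
        raise ValueError("Instruction rejected by pre-planning safety check")
    policy = plan(SAFETY_PROMPT, instruction)   # LLM-based planning module
    if not post_planning_check(policy):
        raise ValueError("Generated policy contains a prohibited action")
    execute(policy)                             # reached only if both checks pass
```

In practice the keyword scans would be replaced by an auxiliary moderation model or context-aware classifier, and the post-planning check would validate each step against the robot's actual primitive-action API.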