LLM-Driven Motion Adversarial Attack
Research Paper
Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion
Description: The ALERT-Motion framework demonstrates a vulnerability in text-to-motion (T2M) models: an attacker can craft subtly modified text prompts (adversarial prompts) that cause the model to generate motions that diverge significantly from the benign prompt's intent while closely matching a target motion specified by the attacker. The attack uses a large language model (LLM) to generate these adversarial prompts autonomously, evading simple keyword-based detection mechanisms. The vulnerability stems from the model's insufficient robustness to prompts that read much like benign text yet produce perceptually different motions.
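At a high level, the attack operates as a black-box, query-based search: the LLM proposes prompt rewrites, the victim T2M model generates motions for each candidate, and a motion-similarity objective decides which rewrite to keep. The sketch below illustrates only this loop and is not the authors' implementation; `t2m_model`, `motion_similarity`, and `llm_rewrite` are hypothetical placeholders for the victim model, a motion-embedding similarity metric, and the LLM refinement step, and the real ALERT-Motion pipeline is more elaborate than this.

```python
# Minimal sketch of an LLM-driven, black-box adversarial prompt search.
# All callables are hypothetical placeholders, not APIs from the paper:
#   t2m_model(prompt)              -> generated motion (e.g., joint-position array)
#   motion_similarity(m_a, m_b)    -> scalar in [0, 1], higher = more similar
#   llm_rewrite(prompt, feedback)  -> list of candidate prompt variants
def attack(benign_prompt, target_motion, t2m_model, motion_similarity,
           llm_rewrite, iterations=20, n_candidates=8):
    best_prompt = benign_prompt
    best_score = motion_similarity(t2m_model(best_prompt), target_motion)

    for _ in range(iterations):
        # Ask the LLM for subtle rewrites of the current best prompt,
        # conditioned on how close the last motion was to the target.
        feedback = f"current similarity to target motion: {best_score:.3f}"
        candidates = llm_rewrite(best_prompt, feedback)[:n_candidates]

        for prompt in candidates:
            score = motion_similarity(t2m_model(prompt), target_motion)
            if score > best_score:
                best_prompt, best_score = prompt, score  # keep the closest so far

    return best_prompt, best_score
```

Because the search only needs model queries and a similarity score, no gradient access to the victim model is required, which is what makes the attack practical against deployed T2M systems.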
Examples: See Figures 3 and 4 of the paper for adversarial prompts and the resulting motions generated by the ALERT-Motion framework against the MDM and MLD models. The adversarial prompts read much like the benign prompts, yet the generated motions differ substantially from the benign motion and closely match the attacker-specified target motion.
Impact: Successful exploitation of this vulnerability allows an attacker to manipulate the output of a T2M model to generate harmful or unintended motions. This could lead to the generation of violent or explicit content in animation, or the creation of unsafe controller commands for robots, potentially causing physical harm. The autonomous nature of ALERT-Motion makes the attack more difficult to defend against.
Affected Systems: Text-to-motion (T2M) models, including but not limited to MLD and MDM, which are susceptible to adversarial attacks based on subtle semantic variations in text prompts. Systems using these models for animation, robotics control, or other applications may be affected.
Mitigation Steps:
- Increase the diversity and size of the training datasets used for T2M models to improve robustness against adversarial attacks.
- Develop and implement detection mechanisms that go beyond simple keyword filtering to flag semantically adversarial prompts, for example by checking prompt-motion consistency (see the sketch after this list).
- Investigate and apply adversarial training techniques to harden T2M models against this class of attack (see the adversarial training sketch below).
- Implement robust post-processing filters that detect and mitigate abnormal outputs from the T2M model, for example using motion similarity comparison (the consistency-check sketch below also covers this case).
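One way the second and fourth mitigation steps could be combined is a prompt-motion consistency check. The sketch below assumes a shared text-motion embedding space; `encode_text` and `encode_motion` are hypothetical encoders, not components described in the paper, and the threshold is illustrative only.

```python
import numpy as np

# Sketch of a prompt-motion consistency filter (mitigation steps 2 and 4).
# encode_text / encode_motion are hypothetical encoders into a shared
# embedding space (e.g., a CLIP-style text-motion model); they are
# assumptions, not components specified by the paper.
def is_consistent(prompt, motion, encode_text, encode_motion, threshold=0.6):
    t = encode_text(prompt)
    m = encode_motion(motion)
    cos = float(np.dot(t, m) / (np.linalg.norm(t) * np.linalg.norm(m) + 1e-8))
    # Low similarity suggests the generated motion does not match what the
    # prompt appears to describe -- a possible sign of an adversarial prompt.
    return cos >= threshold
```

In practice the threshold would be calibrated on benign prompt-motion pairs so that normal generations are rarely rejected.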
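Similarly, the adversarial training step could take the shape of the loop below. This is a sketch under stated assumptions rather than a defense evaluated in the paper: `generate_adversarial_prompt` and `t2m_training_step` are hypothetical stand-ins for an adversarial prompt generator and the model's ordinary training step.

```python
import random

# Sketch of adversarial training for a T2M model (mitigation step 3).
# generate_adversarial_prompt and t2m_training_step are hypothetical
# stand-ins; the paper does not prescribe a specific defense recipe.
def adversarial_training_epoch(model, dataset, generate_adversarial_prompt,
                               t2m_training_step, adv_ratio=0.3):
    for prompt, reference_motion in dataset:
        if random.random() < adv_ratio:
            # Replace a fraction of prompts with adversarially rewritten ones
            # while keeping the original motion as supervision, so the model
            # learns to ignore the perturbation.
            prompt = generate_adversarial_prompt(model, prompt, reference_motion)
        t2m_training_step(model, prompt, reference_motion)
```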