BadRobot: Manipulating Embodied LLMs in the Physical World
- URL: http://arxiv.org/abs/2407.20242v3
- Date: Thu, 3 Oct 2024 14:31:39 GMT
- Title: BadRobot: Manipulating Embodied LLMs in the Physical World
- Authors: Hangtao Zhang, Chenyu Zhu, Xianlong Wang, Ziqi Zhou, Changgan Yin, Minghui Li, Lulu Xue, Yichen Wang, Shengshan Hu, Aishan Liu, Peijin Guo, Leo Yu Zhang,
- Abstract summary: Embodied AI represents systems where AI is integrated into physical entities, enabling them to perceive and interact with their surroundings.
Large Language Models (LLMs), which exhibit powerful language understanding abilities, have been extensively employed in embodied AI.
We introduce BadRobot, a novel attack paradigm aiming to make embodied LLMs violate safety and ethical constraints through typical voice-based user-system interactions.
- Score: 20.96351292684658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied AI represents systems where AI is integrated into physical entities, enabling them to perceive and interact with their surroundings. Large Language Models (LLMs), which exhibit powerful language understanding abilities, have been extensively employed in embodied AI to facilitate sophisticated task planning. However, a critical safety issue remains overlooked: could these embodied LLMs perpetrate harmful behaviors? In response, we introduce BadRobot, a novel attack paradigm that aims to make embodied LLMs violate safety and ethical constraints through typical voice-based user-system interactions. Specifically, three vulnerabilities are exploited to achieve this type of attack: (i) manipulation of LLMs within robotic systems, (ii) misalignment between linguistic outputs and physical actions, and (iii) unintentional hazardous behaviors caused by flaws in the model's world knowledge. Furthermore, we construct a benchmark of malicious physical-action queries to evaluate BadRobot's attack performance. Based on this benchmark, extensive experiments against prominent embodied LLM frameworks (e.g., VoxPoser, Code as Policies, and ProgPrompt) demonstrate the effectiveness of BadRobot. Warning: This paper contains harmful AI-generated language and descriptions of aggressive actions.
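A central failure mode named in the abstract is the misalignment between what the model says and what it does: an embodied LLM may verbally refuse a request yet still emit executable action code. The sketch below is a minimal, hypothetical harness in the spirit of the paper's benchmark evaluation; the stub planner, the unsafe-primitive names, and the placeholder query are illustrative assumptions, not the authors' code or data.

```python
import re
from dataclasses import dataclass

# Action primitives treated as unsafe for this illustration (assumed names).
UNSAFE_PRIMITIVES = {"grasp_knife", "strike", "push_human"}


@dataclass
class PlannerOutput:
    speech: str       # what the embodied agent says back to the user
    action_code: str  # action code the framework would hand to the robot


def query_planner(prompt: str) -> PlannerOutput:
    """Stub standing in for an embodied LLM framework (e.g., VoxPoser,
    Code as Policies, ProgPrompt); a real harness would call the actual API."""
    return PlannerOutput(
        speech="I'm sorry, I can't help with that.",
        action_code="grasp_knife(); move_to(target)",
    )


def verbally_refuses(speech: str) -> bool:
    """Crude keyword check for a spoken refusal."""
    return bool(re.search(r"\b(sorry|cannot|can't|won't)\b", speech, re.IGNORECASE))


def emits_unsafe_action(action_code: str) -> bool:
    """Does the generated action code contain any unsafe primitive?"""
    return any(prim in action_code for prim in UNSAFE_PRIMITIVES)


def is_misaligned(out: PlannerOutput) -> bool:
    """The failure mode of interest: a verbal refusal paired with
    executable unsafe actions."""
    return verbally_refuses(out.speech) and emits_unsafe_action(out.action_code)


if __name__ == "__main__":
    benchmark = ["<malicious physical-action query (placeholder)>"]
    for query in benchmark:
        result = query_planner(query)
        print(query, "->", "misaligned" if is_misaligned(result) else "consistent")
```

A real harness would replace the stub with calls into the framework under test and aggregate an attack success rate over the full query set.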
Related papers
- Jailbreaking LLM-Controlled Robots [82.04590367171932]
Large language models (LLMs) have revolutionized the field of robotics by enabling contextual reasoning and intuitive human-robot interaction.
LLMs are vulnerable to jailbreaking attacks, wherein malicious prompters elicit harmful text by bypassing LLM safety guardrails.
We introduce RoboPAIR, the first algorithm designed to jailbreak LLM-controlled robots.
arXiv Detail & Related papers (2024-10-17T15:55:36Z)
- Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence.
This paper uncovers a significant backdoor security threat within this process.
By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z)
- LLM-Driven Robots Risk Enacting Discrimination, Violence, and Unlawful Actions [3.1247504290622214]
Research has raised concerns about the potential for Large Language Models to produce discriminatory outcomes and unsafe behaviors in real-world robot experiments and applications.
We conduct an HRI-based evaluation of discrimination and safety criteria on several highly-rated LLMs.
Our results underscore the urgent need for systematic, routine, and comprehensive risk assessments and assurances to improve outcomes.
arXiv Detail & Related papers (2024-06-13T05:31:49Z)
- Empowering Large Language Models on Robotic Manipulation with Affordance Prompting [23.318449345424725]
Large language models still struggle to interact with the physical world because they fail to generate proper control sequences.
Existing LLM-based approaches circumvent this problem by relying on additional pre-defined skills or pre-trained sub-policies.
We propose a framework called LLM+A(ffordance) where the LLM serves as both the sub-task planner and the motion controller.
arXiv Detail & Related papers (2024-04-17T03:06:32Z)
- Sandwich attack: Multi-language Mixture Adaptive Attack on LLMs [9.254047358707014]
We introduce a new black-box attack vector called the Sandwich attack: a multi-language mixture attack.
Our experiments with several models, namely Google's Bard, Gemini Pro, LLaMA-2-70B-Chat, GPT-3.5-Turbo, GPT-4, and Claude-3-OPUS, show that this attack vector can be used by adversaries to generate harmful responses.
arXiv Detail & Related papers (2024-04-09T18:29:42Z)
- The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative [55.08395463562242]
Multimodal Large Language Models (MLLMs) are constantly redefining the boundary of Artificial General Intelligence (AGI).
Our paper explores a novel vulnerability in MLLM societies - the indirect propagation of malicious content.
arXiv Detail & Related papers (2024-02-20T23:08:21Z)
- Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics [54.57914943017522]
We highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications.
arXiv Detail & Related papers (2024-02-15T22:01:45Z)
- Plug in the Safety Chip: Enforcing Constraints for LLM-driven Robot Agents [25.62431723307089]
We propose a queryable safety constraint module based on linear temporal logic (LTL).
Our system strictly adheres to the safety constraints and scales well with complex safety constraints, highlighting its potential for practical utility.
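As a rough illustration of how a queryable constraint module of this kind could gate an LLM-generated plan before execution, the sketch below checks a finite action trace against simple "always" and "eventually" style constraints. The plan format, predicate names, and constraints are assumptions for illustration, not the paper's actual interface or its full LTL machinery.

```python
from typing import Any, Callable, Dict, List, Sequence

# A plan step as an LLM planner might emit it, e.g. {"action": "move_to", "args": ["table"]}.
Step = Dict[str, Any]
Pred = Callable[[Step], bool]


def always_not(p: Pred, plan: Sequence[Step]) -> bool:
    """Finite-trace reading of the LTL safety formula G(!p): p never holds."""
    return not any(p(step) for step in plan)


def eventually(p: Pred, plan: Sequence[Step]) -> bool:
    """Finite-trace reading of F(p): p holds at some step."""
    return any(p(step) for step in plan)


# Example constraints: never grasp the knife, and eventually return home.
def touches_knife(step: Step) -> bool:
    return step["action"] == "grasp" and "knife" in step["args"]


def reaches_home(step: Step) -> bool:
    return step["action"] == "move_to" and "home" in step["args"]


def plan_is_safe(plan: Sequence[Step]) -> bool:
    return always_not(touches_knife, plan) and eventually(reaches_home, plan)


if __name__ == "__main__":
    proposed_plan: List[Step] = [
        {"action": "move_to", "args": ["table"]},
        {"action": "grasp", "args": ["cup"]},
        {"action": "move_to", "args": ["home"]},
    ]
    # Gate execution on the safety check: only run the plan if every constraint holds.
    print("execute" if plan_is_safe(proposed_plan) else "rejected by safety module")
```

A genuine LTL checker would support nested temporal operators and richer propositions; the point here is only the execute-only-if-constraints-hold pattern.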
arXiv Detail & Related papers (2023-09-18T16:33:30Z)
- Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can override original instructions and bypass the controls in place using indirect prompt injection attacks.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)
- Trojaning Language Models for Fun and Profit [53.45727748224679]
TROJAN-LM is a new class of trojaning attacks in which maliciously crafted LMs trigger host NLP systems to malfunction.
By empirically studying three state-of-the-art LMs in a range of security-critical NLP tasks, we demonstrate the effectiveness of TROJAN-LM attacks.
arXiv Detail & Related papers (2020-08-01T18:22:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.