BadRobot: Manipulating Embodied LLMs in the Physical World
- URL: http://arxiv.org/abs/2407.20242v3
- Date: Thu, 3 Oct 2024 14:31:39 GMT
- Title: BadRobot: Manipulating Embodied LLMs in the Physical World
- Authors: Hangtao Zhang, Chenyu Zhu, Xianlong Wang, Ziqi Zhou, Changgan Yin, Minghui Li, Lulu Xue, Yichen Wang, Shengshan Hu, Aishan Liu, Peijin Guo, Leo Yu Zhang,
- Abstract summary: Embodied AI represents systems where AI is integrated into physical entities, enabling them to perceive and interact with their surroundings.
Large Language Model (LLM), which exhibits powerful language understanding abilities, has been extensively employed in embodied AI.
We introduce BadRobot, a novel attack paradigm aiming to make embodied LLMs violate safety and ethical constraints through typical voice-based user-system interactions.
- Score: 20.96351292684658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied AI represents systems where AI is integrated into physical entities, enabling them to perceive and interact with their surroundings. Large Language Model (LLM), which exhibits powerful language understanding abilities, has been extensively employed in embodied AI by facilitating sophisticated task planning. However, a critical safety issue remains overlooked: could these embodied LLMs perpetrate harmful behaviors? In response, we introduce BadRobot, a novel attack paradigm aiming to make embodied LLMs violate safety and ethical constraints through typical voice-based user-system interactions. Specifically, three vulnerabilities are exploited to achieve this type of attack: (i) manipulation of LLMs within robotic systems, (ii) misalignment between linguistic outputs and physical actions, and (iii) unintentional hazardous behaviors caused by world knowledge's flaws. Furthermore, we construct a benchmark of various malicious physical action queries to evaluate BadRobot's attack performance. Based on this benchmark, extensive experiments against existing prominent embodied LLM frameworks (e.g., Voxposer, Code as Policies, and ProgPrompt) demonstrate the effectiveness of our BadRobot. Warning: This paper contains harmful AI-generated language and aggressive actions.
Related papers
- Jailbreaking LLM-Controlled Robots [82.04590367171932]
Large language models (LLMs) have revolutionized the field of robotics by enabling contextual reasoning and intuitive human-robot interaction.
LLMs are vulnerable to jailbreaking attacks, wherein malicious prompters elicit harmful text by bypassing LLM safety guardrails.
We introduce RoboPAIR, the first algorithm designed to jailbreak LLM-controlled robots.
arXiv Detail & Related papers (2024-10-17T15:55:36Z) - Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence.
This paper uncovers a significant backdoor security threat within this process.
By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z) - LLM-Driven Robots Risk Enacting Discrimination, Violence, and Unlawful Actions [3.1247504290622214]
Research has raised concerns about the potential for Large Language Models to produce discriminatory outcomes and unsafe behaviors in real-world robot experiments and applications.
We conduct an HRI-based evaluation of discrimination and safety criteria on several highly-rated LLMs.
Our results underscore the urgent need for systematic, routine, and comprehensive risk assessments and assurances to improve outcomes.
arXiv Detail & Related papers (2024-06-13T05:31:49Z) - Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-based Decision-Making Systems [27.316115171846953]
Large Language Models (LLMs) have shown significant promise in real-world decision-making tasks for embodied AI.
LLMs are fine-tuned to leverage their inherent common sense and reasoning abilities while being tailored to specific applications.
This fine-tuning process introduces considerable safety and security vulnerabilities, especially in safety-critical cyber-physical systems.
arXiv Detail & Related papers (2024-05-27T17:59:43Z) - Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics [54.57914943017522]
We highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications.
arXiv Detail & Related papers (2024-02-15T22:01:45Z) - Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity [0.659529078336196]
Large language models (LLMs) exhibit advanced reasoning skills, enabling robots to comprehend natural language instructions and strategically plan high-level actions.
LLMs hallucination may result in robots confidently executing plans that are misaligned with user goals or even unsafe in critical scenarios.
We propose introspective planning, a systematic approach that align LLM's uncertainty with the inherent ambiguity of the task.
arXiv Detail & Related papers (2024-02-09T16:40:59Z) - Attack Prompt Generation for Red Teaming and Defending Large Language
Models [70.157691818224]
Large language models (LLMs) are susceptible to red teaming attacks, which can induce LLMs to generate harmful content.
We propose an integrated approach that combines manual and automatic methods to economically generate high-quality attack prompts.
arXiv Detail & Related papers (2023-10-19T06:15:05Z) - Plug in the Safety Chip: Enforcing Constraints for LLM-driven Robot
Agents [25.62431723307089]
We propose a queryable safety constraint module based on linear temporal logic (LTL)
Our system strictly adheres to the safety constraints and scales well with complex safety constraints, highlighting its potential for practical utility.
arXiv Detail & Related papers (2023-09-18T16:33:30Z) - Not what you've signed up for: Compromising Real-World LLM-Integrated
Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can override original instructions and employed controls using Prompt Injection attacks.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z) - Trojaning Language Models for Fun and Profit [53.45727748224679]
TROJAN-LM is a new class of trojaning attacks in which maliciously crafted LMs trigger host NLP systems to malfunction.
By empirically studying three state-of-the-art LMs in a range of security-critical NLP tasks, we demonstrate that TROJAN-LM possesses the following properties.
arXiv Detail & Related papers (2020-08-01T18:22:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.