Jailbreaking LLM-Controlled Robots
- URL: http://arxiv.org/abs/2410.13691v2
- Date: Sat, 09 Nov 2024 20:00:07 GMT
- Title: Jailbreaking LLM-Controlled Robots
- Authors: Alexander Robey, Zachary Ravichandran, Vijay Kumar, Hamed Hassani, George J. Pappas,
- Abstract summary: Large language models (LLMs) have revolutionized the field of robotics by enabling contextual reasoning and intuitive human-robot interaction.
LLMs are vulnerable to jailbreaking attacks, wherein malicious prompters elicit harmful text by bypassing LLM safety guardrails.
We introduce RoboPAIR, the first algorithm designed to jailbreak LLM-controlled robots.
- Score: 82.04590367171932
- License:
- Abstract: The recent introduction of large language models (LLMs) has revolutionized the field of robotics by enabling contextual reasoning and intuitive human-robot interaction in domains as varied as manipulation, locomotion, and self-driving vehicles. When viewed as a stand-alone technology, LLMs are known to be vulnerable to jailbreaking attacks, wherein malicious prompters elicit harmful text by bypassing LLM safety guardrails. To assess the risks of deploying LLMs in robotics, in this paper, we introduce RoboPAIR, the first algorithm designed to jailbreak LLM-controlled robots. Unlike existing, textual attacks on LLM chatbots, RoboPAIR elicits harmful physical actions from LLM-controlled robots, a phenomenon we experimentally demonstrate in three scenarios: (i) a white-box setting, wherein the attacker has full access to the NVIDIA Dolphins self-driving LLM, (ii) a gray-box setting, wherein the attacker has partial access to a Clearpath Robotics Jackal UGV robot equipped with a GPT-4o planner, and (iii) a black-box setting, wherein the attacker has only query access to the GPT-3.5-integrated Unitree Robotics Go2 robot dog. In each scenario and across three new datasets of harmful robotic actions, we demonstrate that RoboPAIR, as well as several static baselines, finds jailbreaks quickly and effectively, often achieving 100% attack success rates. Our results reveal, for the first time, that the risks of jailbroken LLMs extend far beyond text generation, given the distinct possibility that jailbroken robots could cause physical damage in the real world. Indeed, our results on the Unitree Go2 represent the first successful jailbreak of a deployed commercial robotic system. Addressing this emerging vulnerability is critical for ensuring the safe deployment of LLMs in robotics. Additional media is available at: https://robopair.org
Related papers
- How Can LLMs and Knowledge Graphs Contribute to Robot Safety? A Few-Shot Learning Approach [8.15784886699733]
Large Language Models (LLMs) are transforming the robotics domain by enabling robots to comprehend and execute natural language instructions.
This paper outlines a safety layer that verifies the code generated by ChatGPT before executing it to control a drone in a simulated environment.
arXiv Detail & Related papers (2024-12-16T02:28:34Z) - TrojanRobot: Physical-World Backdoor Attacks Against VLM-based Robotic Manipulation [22.313765935846046]
We propose textitTrojanRobot, a highly stealthy and broadly effective robotic backdoor attack in the physical world.
Specifically, we introduce a module-poisoning approach by embedding a backdoor module into the modular robotic policy.
We develop three types of prime attacks, ie, textitpermutation, textitstagnation, and textitintentional attacks, thus achieving finer-grained backdoors.
arXiv Detail & Related papers (2024-11-18T16:09:26Z) - $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people, and its ability to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z) - BadRobot: Jailbreaking Embodied LLMs in the Physical World [20.96351292684658]
Embodied AI represents systems where AI is integrated into physical entities.
Large Language Model (LLM) exhibits powerful language understanding abilities.
We introduce BadRobot, a novel attack paradigm aiming to make embodied LLMs violate safety and ethical constraints through typical voice-based user-system interactions.
arXiv Detail & Related papers (2024-07-16T13:13:16Z) - LLaRA: Supercharging Robot Learning Data for Vision-Language Policy [56.505551117094534]
We introduce LLaRA: Large Language and Robotics Assistant, a framework that formulates robot action policy as visuo-textual conversations.
First, we present an automated pipeline to generate conversation-style instruction tuning data for robots from existing behavior cloning datasets.
We show that a VLM finetuned with a limited amount of such datasets can produce meaningful action decisions for robotic control.
arXiv Detail & Related papers (2024-06-28T17:59:12Z) - Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration [4.2460673279562755]
Large Language Models (LLMs) are gaining popularity in the field of robotics.
This paper proposes a novel approach to enhance the performance of LLM-based autonomous manipulation through Human-Robot Collaboration (HRC)
The approach involves using a prompted GPT-4 language model to decompose high-level language commands into sequences of motions that can be executed by the robot.
arXiv Detail & Related papers (2024-06-20T08:23:49Z) - Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models [53.701148276912406]
Vision-Large-Language-models (VLMs) have great application prospects in autonomous driving.
BadVLMDriver is the first backdoor attack against VLMs for autonomous driving that can be launched in practice using physical objects.
BadVLMDriver achieves a 92% attack success rate in inducing a sudden acceleration when coming across a pedestrian holding a red balloon.
arXiv Detail & Related papers (2024-04-19T14:40:38Z) - Jailbreaking Black Box Large Language Models in Twenty Queries [97.29563503097995]
Large language models (LLMs) are vulnerable to adversarial jailbreaks.
We propose an algorithm that generates semantic jailbreaks with only black-box access to an LLM.
arXiv Detail & Related papers (2023-10-12T15:38:28Z) - Fleet-DAgger: Interactive Robot Fleet Learning with Scalable Human
Supervision [72.4735163268491]
Commercial and industrial deployments of robot fleets often fall back on remote human teleoperators during execution.
We formalize the Interactive Fleet Learning (IFL) setting, in which multiple robots interactively query and learn from multiple human supervisors.
We propose Fleet-DAgger, a family of IFL algorithms, and compare a novel Fleet-DAgger algorithm to 4 baselines in simulation.
arXiv Detail & Related papers (2022-06-29T01:23:57Z) - RoboMal: Malware Detection for Robot Network Systems [4.357338639836869]
We propose the RoboMal framework of static malware detection on binary executables to detect malware before it gets a chance to execute.
The framework is compared against widely used supervised learning models: GRU, CNN, and ANN.
Notably, the LSTM-based RoboMal model outperforms the other models with an accuracy of 85% and precision of 87% in 10-fold cross-validation.
arXiv Detail & Related papers (2022-01-20T22:11:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.