Related papers: Jailbreaking LLM-Controlled Robots

Jailbreaking LLM-Controlled Robots

URL: http://arxiv.org/abs/2410.13691v2
Date: Sat, 09 Nov 2024 20:00:07 GMT
Title: Jailbreaking LLM-Controlled Robots
Authors: Alexander Robey, Zachary Ravichandran, Vijay Kumar, Hamed Hassani, George J. Pappas,
Abstract summary: Large language models (LLMs) have revolutionized the field of robotics by enabling contextual reasoning and intuitive human-robot interaction. LLMs are vulnerable to jailbreaking attacks, wherein malicious prompters elicit harmful text by bypassing LLM safety guardrails. We introduce RoboPAIR, the first algorithm designed to jailbreak LLM-controlled robots.
Score: 82.04590367171932
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The recent introduction of large language models (LLMs) has revolutionized the field of robotics by enabling contextual reasoning and intuitive human-robot interaction in domains as varied as manipulation, locomotion, and self-driving vehicles. When viewed as a stand-alone technology, LLMs are known to be vulnerable to jailbreaking attacks, wherein malicious prompters elicit harmful text by bypassing LLM safety guardrails. To assess the risks of deploying LLMs in robotics, in this paper, we introduce RoboPAIR, the first algorithm designed to jailbreak LLM-controlled robots. Unlike existing, textual attacks on LLM chatbots, RoboPAIR elicits harmful physical actions from LLM-controlled robots, a phenomenon we experimentally demonstrate in three scenarios: (i) a white-box setting, wherein the attacker has full access to the NVIDIA Dolphins self-driving LLM, (ii) a gray-box setting, wherein the attacker has partial access to a Clearpath Robotics Jackal UGV robot equipped with a GPT-4o planner, and (iii) a black-box setting, wherein the attacker has only query access to the GPT-3.5-integrated Unitree Robotics Go2 robot dog. In each scenario and across three new datasets of harmful robotic actions, we demonstrate that RoboPAIR, as well as several static baselines, finds jailbreaks quickly and effectively, often achieving 100% attack success rates. Our results reveal, for the first time, that the risks of jailbroken LLMs extend far beyond text generation, given the distinct possibility that jailbroken robots could cause physical damage in the real world. Indeed, our results on the Unitree Go2 represent the first successful jailbreak of a deployed commercial robotic system. Addressing this emerging vulnerability is critical for ensuring the safe deployment of LLMs in robotics. Additional media is available at: https://robopair.org

Related papers

Adversarial Attacks on Robotic Vision Language Action Models [118.02118618146568]
We study adversarial attacks on vision-language-action models (VLAs)<n>Our main algorithmic contribution is the adaptation and application of LLM jailbreaking attacks to obtain complete control authority.<n>This differs significantly from LLM jailbreaking literature, as attacks in the real world do not have to be semantically linked to notions of harm.
arXiv Detail & Related papers (2025-06-03T19:43:58Z)
Safety Guardrails for LLM-Enabled Robots [82.0459036717193]
Traditional robot safety approaches do not address the novel vulnerabilities of large language models (LLMs) We propose RoboGuard, a two-stage guardrail architecture to ensure the safety of LLM-enabled robots. We show that RoboGuard reduces the execution of unsafe plans from 92% to below 2.5% without compromising performance on safe plans.
arXiv Detail & Related papers (2025-03-10T22:01:56Z)
Jailbreaking to Jailbreak [7.462595078160592]
We present a novel LLM-as-red-teamer approach in which a human jailbreaks a refusal-trained LLM to make it willing to jailbreak itself or other LLMs. Our work not only introduces a strategic approach to red teaming, drawing inspiration from human red teamers, but also highlights jailbreaking-to-jailbreak as an overlooked failure mode of the safeguard.
arXiv Detail & Related papers (2025-02-09T20:49:16Z)
How Can LLMs and Knowledge Graphs Contribute to Robot Safety? A Few-Shot Learning Approach [8.15784886699733]
Large Language Models (LLMs) are transforming the robotics domain by enabling robots to comprehend and execute natural language instructions. This paper outlines a safety layer that verifies the code generated by ChatGPT before executing it to control a drone in a simulated environment.
arXiv Detail & Related papers (2024-12-16T02:28:34Z)
TrojanRobot: Backdoor Attacks Against Robotic Manipulation in the Physical World [22.313765935846046]
We propose a backdoor attack specifically targeting robotic manipulation and, for the first time, implementing backdoor attack in the physical world. By embedding a backdoor visual language model into the visual perception module within the robotic system, we successfully mislead the robotic arm's operation in the physical world.
arXiv Detail & Related papers (2024-11-18T16:09:26Z)
$π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge. We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people, and its ability to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
In-Context Learning Enables Robot Action Prediction in LLMs [52.285739178561705]
We introduce RoboPrompt, a framework that enables offthe-shelf text-only Large Language Models to directly predict robot actions. RoboPrompt shows stronger performance over zero-shot and ICL baselines in simulated and real-world settings.
arXiv Detail & Related papers (2024-10-16T17:56:49Z)
h4rm3l: A language for Composable Jailbreak Attack Synthesis [48.5611060845958]
h4rm3l is a novel approach that addresses the gap with a human-readable domain-specific language. We show that h4rm3l's synthesized attacks are diverse and more successful than existing jailbreak attacks in literature.
arXiv Detail & Related papers (2024-08-09T01:45:39Z)
BadRobot: Manipulating Embodied LLMs in the Physical World [20.96351292684658]
Embodied AI represents systems where AI is integrated into physical entities, enabling them to perceive and interact with their surroundings. Large Language Model (LLM), which exhibits powerful language understanding abilities, has been extensively employed in embodied AI. We introduce BadRobot, a novel attack paradigm aiming to make embodied LLMs violate safety and ethical constraints through typical voice-based user-system interactions.
arXiv Detail & Related papers (2024-07-16T13:13:16Z)
Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration [4.2460673279562755]
Large Language Models (LLMs) are gaining popularity in the field of robotics. This paper proposes a novel approach to enhance the performance of LLM-based autonomous manipulation through Human-Robot Collaboration (HRC) The approach involves using a prompted GPT-4 language model to decompose high-level language commands into sequences of motions that can be executed by the robot.
arXiv Detail & Related papers (2024-06-20T08:23:49Z)
Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models [53.701148276912406]
Vision-Large-Language-models (VLMs) have great application prospects in autonomous driving. BadVLMDriver is the first backdoor attack against VLMs for autonomous driving that can be launched in practice using physical objects. BadVLMDriver achieves a 92% attack success rate in inducing a sudden acceleration when coming across a pedestrian holding a red balloon.
arXiv Detail & Related papers (2024-04-19T14:40:38Z)
How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs [66.05593434288625]
This paper introduces a new perspective to jailbreak large language models (LLMs) as human-like communicators. We apply a persuasion taxonomy derived from decades of social science research to generate persuasive adversarial prompts (PAP) to jailbreak LLMs. PAP consistently achieves an attack success rate of over $92%$ on Llama 2-7b Chat, GPT-3.5, and GPT-4 in $10$ trials. On the defense side, we explore various mechanisms against PAP and, found a significant gap in existing defenses.
arXiv Detail & Related papers (2024-01-12T16:13:24Z)
Jailbreaking Black Box Large Language Models in Twenty Queries [97.29563503097995]
Large language models (LLMs) are vulnerable to adversarial jailbreaks. We propose an algorithm that generates semantic jailbreaks with only black-box access to an LLM.
arXiv Detail & Related papers (2023-10-12T15:38:28Z)
Fleet-DAgger: Interactive Robot Fleet Learning with Scalable Human Supervision [72.4735163268491]
Commercial and industrial deployments of robot fleets often fall back on remote human teleoperators during execution. We formalize the Interactive Fleet Learning (IFL) setting, in which multiple robots interactively query and learn from multiple human supervisors. We propose Fleet-DAgger, a family of IFL algorithms, and compare a novel Fleet-DAgger algorithm to 4 baselines in simulation.
arXiv Detail & Related papers (2022-06-29T01:23:57Z)
RoboMal: Malware Detection for Robot Network Systems [4.357338639836869]
We propose the RoboMal framework of static malware detection on binary executables to detect malware before it gets a chance to execute. The framework is compared against widely used supervised learning models: GRU, CNN, and ANN. Notably, the LSTM-based RoboMal model outperforms the other models with an accuracy of 85% and precision of 87% in 10-fold cross-validation.
arXiv Detail & Related papers (2022-01-20T22:11:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.