Safety Guardrails for LLM-Enabled Robots
- URL: http://arxiv.org/abs/2503.07885v1
- Date: Mon, 10 Mar 2025 22:01:56 GMT
- Title: Safety Guardrails for LLM-Enabled Robots
- Authors: Zachary Ravichandran, Alexander Robey, Vijay Kumar, George J. Pappas, Hamed Hassani
- Abstract summary: Traditional robot safety approaches do not address the novel vulnerabilities of large language models (LLMs). We propose RoboGuard, a two-stage guardrail architecture to ensure the safety of LLM-enabled robots. We show that RoboGuard reduces the execution of unsafe plans from 92% to below 2.5% without compromising performance on safe plans.
- Score: 82.0459036717193
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although the integration of large language models (LLMs) into robotics has unlocked transformative capabilities, it has also introduced significant safety concerns, ranging from average-case LLM errors (e.g., hallucinations) to adversarial jailbreaking attacks, which can produce harmful robot behavior in real-world settings. Traditional robot safety approaches do not address the novel vulnerabilities of LLMs, and current LLM safety guardrails overlook the physical risks posed by robots operating in dynamic real-world environments. In this paper, we propose RoboGuard, a two-stage guardrail architecture to ensure the safety of LLM-enabled robots. RoboGuard first contextualizes pre-defined safety rules by grounding them in the robot's environment using a root-of-trust LLM, which employs chain-of-thought (CoT) reasoning to generate rigorous safety specifications, such as temporal logic constraints. RoboGuard then resolves potential conflicts between these contextual safety specifications and a possibly unsafe plan using temporal logic control synthesis, which ensures safety compliance while minimally violating user preferences. Through extensive simulation and real-world experiments that consider worst-case jailbreaking attacks, we demonstrate that RoboGuard reduces the execution of unsafe plans from 92% to below 2.5% without compromising performance on safe plans. We also demonstrate that RoboGuard is resource-efficient, robust against adaptive attacks, and significantly enhanced by enabling its root-of-trust LLM to perform CoT reasoning. These results underscore the potential of RoboGuard to mitigate the safety risks and enhance the reliability of LLM-enabled robots.
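To make the two-stage design concrete, here is a minimal Python sketch, not the authors' implementation: `trusted_llm` (any client exposing a `complete(prompt) -> str` method), the prompt wording, and the reduction of the temporal-logic stage to screening forbidden atomic propositions are all assumptions; RoboGuard itself generates LTL specifications and resolves conflicts via control synthesis.

```python
# Illustrative two-stage guardrail in the spirit of RoboGuard.
# Assumptions (not from the paper): `trusted_llm` exposes complete(prompt) -> str,
# and the temporal-logic stage is simplified to screening forbidden atomic
# propositions instead of full LTL control synthesis.

from dataclasses import dataclass


@dataclass
class SafetySpec:
    name: str
    forbidden_props: set  # atomic propositions a plan must never satisfy


def ground_safety_rules(trusted_llm, rules, semantic_map):
    """Stage 1: a root-of-trust LLM contextualizes generic safety rules in
    the robot's environment and emits machine-checkable constraints."""
    prompt = (
        "You are a safety auditor for a mobile robot.\n"
        f"Objects in the environment: {sorted(semantic_map)}\n"
        f"Safety rules: {rules}\n"
        "Reason step by step, then list every atomic proposition a plan must "
        "never make true, one per line, prefixed with 'NEVER:'."
    )
    reply = trusted_llm.complete(prompt)  # chain-of-thought happens inside the LLM
    forbidden = {
        line.split("NEVER:", 1)[1].strip()
        for line in reply.splitlines()
        if "NEVER:" in line
    }
    return [SafetySpec("grounded_rules", forbidden)]


def filter_plan(plan, specs):
    """Stage 2 (simplified): drop plan steps that violate any specification.
    RoboGuard instead synthesizes a safe plan that stays as close as possible
    to the user's request."""
    forbidden = set().union(*(s.forbidden_props for s in specs)) if specs else set()
    return [step for step in plan if step not in forbidden]
```

For example, if the grounded specifications forbid "enter(server_room)", then filter_plan(["goto(kitchen)", "enter(server_room)"], specs) returns only ["goto(kitchen)"].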
Related papers
- Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics [68.36528819227641]
This paper systematically quantifies the robustness of VLA-based robotic systems.
We introduce two untargeted attack objectives that leverage spatial foundations to destabilize robotic actions, and a targeted attack objective that manipulates the robotic trajectory.
We design an adversarial patch generation approach that places a small, colorful patch within the camera's view, effectively executing the attack in both digital and physical environments.
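For intuition, adversarial-patch optimization of the kind this line of work studies often looks like the following generic PyTorch sketch; `policy` (an image-to-action model) and the fixed patch placement are stand-ins, not the paper's actual attack.

```python
# Generic untargeted adversarial-patch optimization against a vision-conditioned
# policy (illustrative only; not the paper's method).

import torch


def paste_patch(images, patch, top=10, left=10):
    """Overlay the patch at a fixed location in a (B, 3, H, W) image batch."""
    out = images.clone()
    ph, pw = patch.shape[1:]
    out[:, :, top:top + ph, left:left + pw] = patch
    return out


def optimize_patch(policy, images, patch_size=32, steps=200, lr=0.01):
    patch = torch.rand(3, patch_size, patch_size, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    clean_actions = policy(images).detach()  # reference actions on clean inputs
    for _ in range(steps):
        patched = paste_patch(images, patch.clamp(0, 1))
        # Untargeted objective: push the patched actions away from the clean ones.
        loss = -torch.nn.functional.mse_loss(policy(patched), clean_actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return patch.detach().clamp(0, 1)
```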
arXiv Detail & Related papers (2024-11-18T01:52:20Z)
- Defining and Evaluating Physical Safety for Large Language Models [62.4971588282174]
Large Language Models (LLMs) are increasingly used to control robotic systems such as drones.
However, the risks they pose of causing physical threats and harm in real-world applications remain unexplored.
We classify the physical safety risks of drones into four categories: (1) human-targeted threats, (2) object-targeted threats, (3) infrastructure attacks, and (4) regulatory violations.
arXiv Detail & Related papers (2024-11-04T17:41:25Z)
- Jailbreaking LLM-Controlled Robots [82.04590367171932]
Large language models (LLMs) have revolutionized the field of robotics by enabling contextual reasoning and intuitive human-robot interaction.
LLMs are vulnerable to jailbreaking attacks, wherein malicious prompters elicit harmful text by bypassing LLM safety guardrails.
We introduce RoboPAIR, the first algorithm designed to jailbreak LLM-controlled robots.
arXiv Detail & Related papers (2024-10-17T15:55:36Z)
- BadRobot: Jailbreaking Embodied LLMs in the Physical World [20.96351292684658]
Embodied AI represents systems where AI is integrated into physical entities.
Large Language Models (LLMs) exhibit powerful language understanding abilities.
We introduce BadRobot, a novel attack paradigm aiming to make embodied LLMs violate safety and ethical constraints through typical voice-based user-system interactions.
arXiv Detail & Related papers (2024-07-16T13:13:16Z)
- ABNet: Attention BarrierNet for Safe and Scalable Robot Learning [58.4951884593569]
Barrier-based methods are among the dominant approaches for safe robot learning.
We propose Attention BarrierNet (ABNet), which scales incrementally to build larger foundational safe models.
We demonstrate the strength of ABNet in 2D robot obstacle avoidance, safe robot manipulation, and vision-based end-to-end autonomous driving.
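As background on the "barrier-based" idea that ABNet builds on, here is a generic control-barrier-function safety filter for a 2D single-integrator robot avoiding a circular obstacle; it illustrates the safety condition only and is unrelated to ABNet's architecture.

```python
# Control-barrier-function (CBF) safety filter for a 2D single-integrator
# robot (x_dot = u) avoiding a circular obstacle. Illustrative background
# example for barrier-based safe control, not ABNet itself.

import numpy as np


def cbf_filter(x, u_nom, center, radius, alpha=1.0):
    """Return the control closest to u_nom satisfying
    dh/dt + alpha * h(x) >= 0, with h(x) = ||x - center||^2 - radius^2."""
    d = x - center
    h = d @ d - radius ** 2      # barrier value (>= 0 means outside the obstacle)
    a = 2 * d                    # gradient of h, so dh/dt = a @ u
    b = -alpha * h               # linear constraint: a @ u >= b
    slack = a @ u_nom - b
    if slack >= 0:               # nominal control already satisfies the constraint
        return u_nom
    # Project u_nom onto the constraint boundary (minimal deviation).
    return u_nom + (-slack / (a @ a)) * a


# Example: the nominal command drives straight at the obstacle; the filter deflects it.
x = np.array([-2.0, 0.1])
u_safe = cbf_filter(x, u_nom=np.array([1.0, 0.0]), center=np.zeros(2), radius=1.0)
```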
arXiv Detail & Related papers (2024-06-18T19:37:44Z)
- Safety Control of Service Robots with LLMs and Embodied Knowledge Graphs [12.787160626087744]
We propose a novel integration of Large Language Models with Embodied Robotic Control Prompts (ERCPs) and Embodied Knowledge Graphs (EKGs).
ERCPs are designed as predefined instructions that ensure LLMs generate safe and precise responses.
EKGs provide a comprehensive knowledge base ensuring that the actions of the robot are continuously aligned with safety protocols.
arXiv Detail & Related papers (2024-05-28T05:50:25Z)
- Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-based Decision-Making Systems [27.316115171846953]
Large Language Models (LLMs) have shown significant promise in real-world decision-making tasks for embodied AI.
LLMs are fine-tuned to leverage their inherent common sense and reasoning abilities while being tailored to specific applications.
This fine-tuning process introduces considerable safety and security vulnerabilities, especially in safety-critical cyber-physical systems.
arXiv Detail & Related papers (2024-05-27T17:59:43Z)
- Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models [53.701148276912406]
Vision-Large-Language models (VLMs) hold great promise for application in autonomous driving.
BadVLMDriver is the first backdoor attack against VLMs for autonomous driving that can be launched in practice using physical objects.
BadVLMDriver achieves a 92% attack success rate in inducing sudden acceleration when the vehicle encounters a pedestrian holding a red balloon.
arXiv Detail & Related papers (2024-04-19T14:40:38Z)
- On the Vulnerability of LLM/VLM-Controlled Robotics [54.57914943017522]
We highlight vulnerabilities in robotic systems integrating large language models (LLMs) and vision-language models (VLMs) due to input modality sensitivities. Our results show that simple input perturbations reduce task execution success rates by 22.2% and 14.6% in two representative LLM/VLM-controlled robotic systems.
arXiv Detail & Related papers (2024-02-15T22:01:45Z)
- Plug in the Safety Chip: Enforcing Constraints for LLM-driven Robot Agents [25.62431723307089]
We propose a queryable safety constraint module based on linear temporal logic (LTL).
Our system strictly adheres to the safety constraints and scales well with complex safety constraints, highlighting its potential for practical utility.
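To make "queryable safety constraint module" concrete, a toy sketch restricted to constraints of the form G(!p) ("globally, never p") could look like the following; the paper's module supports richer LTL, which this stand-in does not attempt.

```python
# Toy, hand-rolled stand-in for a queryable LTL safety-constraint module.
# Only constraints of the shape G(!p) are supported, so a query is a plain
# scan of the proposed action trace.

class SafetyChip:
    def __init__(self):
        self._never = set()

    def add_never(self, proposition: str):
        """Register a constraint G(!proposition)."""
        self._never.add(proposition)

    def query(self, plan):
        """Return (is_safe, violating_steps) for a proposed action sequence."""
        violations = [step for step in plan if step in self._never]
        return (not violations, violations)


chip = SafetyChip()
chip.add_never("enter(server_room)")
ok, bad = chip.query(["goto(lobby)", "enter(server_room)", "goto(lab)"])
# ok == False, bad == ["enter(server_room)"]
```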
arXiv Detail & Related papers (2023-09-18T16:33:30Z)