Safety Not Found (404): Hidden Risks of LLM-Based Robotics Decision Making
- URL: http://arxiv.org/abs/2601.05529v2
- Date: Thu, 15 Jan 2026 05:09:03 GMT
- Title: Safety Not Found (404): Hidden Risks of LLM-Based Robotics Decision Making
- Authors: Jua Han, Jaeyoon Seo, Jungbin Min, Jean Oh, Jihie Kim
- Abstract summary: One mistake by an AI system in a safety-critical setting can cost lives. As Large Language Models (LLMs) become integral to robotics decision-making, the physical dimension of risk grows. This paper addresses the urgent need to systematically evaluate LLM performance in scenarios where even minor errors are catastrophic.
- Score: 12.400383981686801
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One mistake by an AI system in a safety-critical setting can cost lives. As Large Language Models (LLMs) become integral to robotics decision-making, the physical dimension of risk grows; a single wrong instruction can directly endanger human safety. This paper addresses the urgent need to systematically evaluate LLM performance in scenarios where even minor errors are catastrophic. Through a qualitative evaluation of a fire evacuation scenario, we identified critical failure cases in LLM-based decision-making. Based on these, we designed seven tasks for quantitative assessment, categorized into: Complete Information, Incomplete Information, and Safety-Oriented Spatial Reasoning (SOSR). Complete information tasks utilize ASCII maps to minimize interpretation ambiguity and isolate spatial reasoning from visual processing. Incomplete information tasks require models to infer missing context, testing for spatial continuity versus hallucinations. SOSR tasks use natural language to evaluate safe decision-making in life-threatening contexts. We benchmark various LLMs and Vision-Language Models (VLMs) across these tasks. Beyond aggregate performance, we analyze the implications of a 1% failure rate, highlighting how "rare" errors escalate into catastrophic outcomes. Results reveal serious vulnerabilities: several models achieved a 0% success rate in ASCII navigation, while in a simulated fire drill, models instructed robots to move toward hazardous areas instead of emergency exits. Our findings lead to a sobering conclusion: current LLMs are not ready for direct deployment in safety-critical systems. A 99% accuracy rate is dangerously misleading in robotics, as it implies one out of every hundred executions could result in catastrophic harm. We demonstrate that even state-of-the-art models cannot guarantee safety, and absolute reliance on them creates unacceptable risks.
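As a back-of-the-envelope illustration of the abstract's point about a 1% failure rate, the sketch below (ours, not code from the paper) shows how a per-execution failure probability compounds over repeated runs, assuming executions fail independently:

```python
# Illustrative sketch (not from the paper): how a "rare" 1% per-execution
# failure rate compounds when a robot repeats a safety-critical decision,
# assuming each execution fails independently with the same probability.
def prob_at_least_one_failure(per_run_failure_rate: float, n_runs: int) -> float:
    """P(at least one failure across n_runs independent executions)."""
    return 1.0 - (1.0 - per_run_failure_rate) ** n_runs

if __name__ == "__main__":
    rate = 0.01  # a model that is "99% accurate"
    for n in (1, 10, 100, 1000):
        print(f"{n:>4} executions -> P(>=1 failure) = "
              f"{prob_at_least_one_failure(rate, n):.1%}")
```

Under this independence assumption, a 99%-accurate policy is more likely than not to produce at least one catastrophic error within roughly 70 executions, which is the sense in which the abstract calls aggregate accuracy "dangerously misleading".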
Related papers
- What Breaks Embodied AI Security: LLM Vulnerabilities, CPS Flaws, or Something Else? [28.12412876058788]
Embodied AI systems are rapidly transitioning from controlled environments to safety-critical real-world deployments. Unlike disembodied AI, failures in embodied intelligence lead to irreversible physical consequences. We argue that a significant class of failures arises from embodiment-induced system-level mismatches.
arXiv Detail & Related papers (2026-02-19T13:29:00Z) - The Shadow Self: Intrinsic Value Misalignment in Large Language Model Agents [37.75212140218036]
We formalize the Loss-of-Control risk and identify the previously underexamined Intrinsic Value Misalignment (Intrinsic VM). We then introduce IMPRESS, a scenario-driven framework for systematically assessing this risk. We evaluate Intrinsic VM on 21 state-of-the-art LLM agents and find that it is a common and broadly observed safety risk across models.
arXiv Detail & Related papers (2026-01-24T07:09:50Z) - AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions [64.85086226439954]
We present SAFE, a benchmark for assessing the safety of embodied VLM agents on hazardous instructions. SAFE comprises three components: SAFE-THOR, SAFE-VERSE, and SAFE-DIAGNOSE. We uncover systematic failures in translating hazard recognition into safe planning and execution.
arXiv Detail & Related papers (2025-06-17T16:37:35Z) - ROSE: Toward Reality-Oriented Safety Evaluation of Large Language Models [60.28667314609623]
Large Language Models (LLMs) are increasingly deployed as black-box components in real-world applications. We propose Reality-Oriented Safety Evaluation (ROSE), a novel framework that uses multi-objective reinforcement learning to fine-tune an adversarial LLM.
arXiv Detail & Related papers (2025-06-17T10:55:17Z) - Better Safe Than Sorry? Overreaction Problem of Vision Language Models in Visual Emergency Recognition [12.054081112688074]
Vision-Language Models (VLMs) have shown capabilities in interpreting visual content, but their reliability in safety-critical scenarios remains insufficiently explored. We introduce VERI, a diagnostic benchmark comprising 200 synthetic images (100 contrastive pairs) and an additional 50 real-world images (25 pairs) for validation. Each emergency scene is paired with a visually similar but safe counterpart through human verification.
arXiv Detail & Related papers (2025-05-21T10:57:40Z) - Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models [63.559461750135334]
Language models (LMs) are increasingly used to build agents that can act autonomously to achieve goals. We study this "answer-or-defer" problem with an evaluation framework that systematically varies human-specified risk structures. We find that a simple skill-decomposition method, which isolates the independent skills required for answer-or-defer decision making, can consistently improve LMs' decision policies.
arXiv Detail & Related papers (2025-03-03T09:16:26Z) - SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents [58.65256663334316]
We present SafeAgentBench -- the first benchmark for safety-aware task planning of embodied LLM agents in interactive simulation environments. SafeAgentBench includes: (1) an executable, diverse, and high-quality dataset of 750 tasks, rigorously curated to cover 10 potential hazards and 3 task types; (2) SafeAgentEnv, a universal embodied environment with a low-level controller, supporting multi-agent execution with 17 high-level actions for 9 state-of-the-art baselines; and (3) reliable evaluation methods from both execution and semantic perspectives.
arXiv Detail & Related papers (2024-12-17T18:55:58Z) - EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction. Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results. However, the deployment of these agents in physical environments presents significant safety challenges. This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z) - BadRobot: Jailbreaking Embodied LLMs in the Physical World [20.96351292684658]
Embodied AI represents systems where AI is integrated into physical entities. Large Language Models (LLMs) exhibit powerful language understanding abilities. We introduce BadRobot, a novel attack paradigm aiming to make embodied LLMs violate safety and ethical constraints through typical voice-based user-system interactions.
arXiv Detail & Related papers (2024-07-16T13:13:16Z) - LLM-Driven Robots Risk Enacting Discrimination, Violence, and Unlawful Actions [3.1247504290622214]
Research has raised concerns about the potential for Large Language Models to produce discriminatory outcomes and unsafe behaviors in real-world robot experiments and applications.
We conduct an HRI-based evaluation of discrimination and safety criteria on several highly-rated LLMs.
Our results underscore the urgent need for systematic, routine, and comprehensive risk assessments and assurances to improve outcomes.
arXiv Detail & Related papers (2024-06-13T05:31:49Z) - On the Vulnerability of LLM/VLM-Controlled Robotics [54.57914943017522]
We highlight vulnerabilities in robotic systems integrating large language models (LLMs) and vision-language models (VLMs) due to input modality sensitivities. Our results show that simple input perturbations reduce task execution success rates by 22.2% and 14.6% in two representative LLM/VLM-controlled robotic systems.
arXiv Detail & Related papers (2024-02-15T22:01:45Z) - Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity [0.659529078336196]
Large language models (LLMs) exhibit advanced reasoning skills, enabling robots to comprehend natural language instructions and strategically plan high-level actions. LLM hallucinations may result in robots confidently executing plans that are misaligned with user goals or even unsafe in critical scenarios. We propose introspective planning, a systematic approach that aligns an LLM's uncertainty with the inherent ambiguity of the task.
arXiv Detail & Related papers (2024-02-09T16:40:59Z) - ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models [65.79770974145983]
ASSERT, Automated Safety Scenario Red Teaming, consists of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection.
We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance.
We find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios and error rates of up to 19% absolute error in zero-shot adversarial settings.
arXiv Detail & Related papers (2023-10-14T17:10:28Z)