Related papers: A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents

URL: http://arxiv.org/abs/2504.14650v1
Date: Sun, 20 Apr 2025 15:12:14 GMT
Title: A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents
Authors: Yuting Huang, Leilei Ding, Zhipeng Tang, Tianfu Wang, Xinrui Lin, Wuyang Zhang, Mingxiao Ma, Yanyong Zhang
Abstract summary: Large Language Models (LLMs) exhibit substantial promise in enhancing task-planning capabilities within embodied agents. We present Safe-BeAl, an integrated framework for the measurement (SafePlan-Bench) and alignment (Safe-Align) of LLM-based embodied agents' behaviors. Our empirical analysis reveals that even in the absence of adversarial inputs or malicious intent, LLM-based agents can exhibit unsafe behaviors.
Score: 13.225168384790257
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) exhibit substantial promise in enhancing task-planning capabilities within embodied agents due to their advanced reasoning and comprehension. However, the systemic safety of these agents remains an underexplored frontier. In this study, we present Safe-BeAl, an integrated framework for the measurement (SafePlan-Bench) and alignment (Safe-Align) of LLM-based embodied agents' behaviors. SafePlan-Bench establishes a comprehensive benchmark for evaluating task-planning safety, encompassing 2,027 daily tasks and corresponding environments distributed across 8 distinct hazard categories (e.g., Fire Hazard). Our empirical analysis reveals that even in the absence of adversarial inputs or malicious intent, LLM-based agents can exhibit unsafe behaviors. To mitigate these hazards, we propose Safe-Align, a method designed to integrate physical-world safety knowledge into LLM-based embodied agents while maintaining task-specific performance. Experiments across a variety of settings demonstrate that Safe-BeAl provides comprehensive safety validation, improving safety by 8.55% to 15.22% compared to embodied agents based on GPT-4 while ensuring successful task completion.

Related papers

SafeMind: Benchmarking and Mitigating Safety Risks in Embodied LLM Agents [7.975014390527644]
Embodied agents powered by large language models (LLMs) inherit advanced planning capabilities; however, their direct interaction with the physical world exposes them to safety vulnerabilities. We present SafeMindBench, a multimodal benchmark with 5,558 samples spanning four task categories (Instr-Risk, Env-Risk, Order-Fix, Req-Align) across high-risk scenarios such as sabotage, harm, privacy, and illegal behavior. We introduce SafeMindAgent, a modular Planner-Executor architecture integrated with three cascaded safety modules, which incorporate safety constraints into the reasoning process.
arXiv Detail & Related papers (2025-09-30T07:24:04Z)
Safety Compliance: Rethinking LLM Safety Reasoning through the Lens of Compliance [49.50518009960314]
Existing safety methods rely on ad hoc taxonomies and lack rigorous, systematic protection. We develop a new benchmark for safety compliance by generating realistic LLM safety scenarios seeded with legal statutes. Our experiments demonstrate that the Compliance Reasoner achieves superior performance on the new benchmark.
arXiv Detail & Related papers (2025-09-26T12:11:29Z)
AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions [76.74726258534142]
We propose AGENTSAFE, the first benchmark for evaluating the safety of embodied VLM agents under hazardous instructions. AGENTSAFE simulates realistic agent-environment interactions within a simulation sandbox. The benchmark includes 45 adversarial scenarios, 1,350 hazardous tasks, and 8,100 hazardous instructions.
arXiv Detail & Related papers (2025-06-17T16:37:35Z)
Safety Aware Task Planning via Large Language Models in Robotics [22.72668275829238]
This paper introduces SAFER (Safety-Aware Framework for Execution in Robotics), a multi-LLM framework designed to embed safety awareness into robotic task planning. Our framework integrates safety feedback at multiple stages of execution, enabling real-time risk assessment, proactive error correction, and transparent safety evaluation.
arXiv Detail & Related papers (2025-03-19T21:41:10Z)
Probabilistic Shielding for Safe Reinforcement Learning [51.35559820893218]
In real-life scenarios, a Reinforcement Learning (RL) agent must often also behave in a safe manner, including at training time. We present a new, scalable method, which enjoys strict formal guarantees for Safe RL. We show that our approach provides a strict formal safety guarantee that the agent stays safe at training and test time.
arXiv Detail & Related papers (2025-03-09T17:54:33Z)
AgentGuard: Repurposing Agentic Orchestrator for Safety Evaluation of Tool Orchestration [0.3222802562733787]
AgentGuard is a framework to autonomously discover and validate unsafe tool use. It generates safety constraints to confine the behaviors of agents, achieving a baseline safety guarantee. The framework operates through four phases: identifying unsafe tool use, validating it in real-world execution, generating safety constraints, and validating constraint efficacy.
arXiv Detail & Related papers (2025-02-13T23:00:33Z)
Agent-SafetyBench: Evaluating the Safety of LLM Agents [72.92604341646691]
We introduce Agent-SafetyBench, a comprehensive benchmark to evaluate the safety of large language models (LLMs). Agent-SafetyBench encompasses 349 interaction environments and 2,000 test cases, evaluating 8 categories of safety risks and covering 10 common failure modes frequently encountered in unsafe interactions. Our evaluation of 16 popular LLM agents reveals a concerning result: none of the agents achieves a safety score above 60%.
arXiv Detail & Related papers (2024-12-19T02:35:15Z)
SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents [42.69984822098671]
Existing benchmarks predominantly overlook critical safety risks, focusing solely on planning performance. We present SafeAgentBench, the first benchmark for safety-aware task planning of embodied LLM agents in interactive simulation environments. SafeAgentBench includes: (1) an executable, diverse, and high-quality dataset of 750 tasks, rigorously curated to cover 10 potential hazards and 3 task types; (2) SafeAgentEnv, a universal embodied environment with a low-level controller, supporting multi-agent execution with 17 high-level actions for 8 state-of-the-art baselines; and (3) reliable evaluation methods from both execution and semantic perspectives.
arXiv Detail & Related papers (2024-12-17T18:55:58Z)
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs [75.85283891591678]
Artificial Intelligence (AI) is revolutionizing scientific research, yet its growing integration into laboratory environments presents critical safety challenges. Large language models (LLMs) increasingly assist in tasks ranging from procedural guidance to autonomous experiment orchestration. Such overreliance is especially hazardous in high-stakes laboratory settings, where failures in hazard identification or risk assessment can result in severe accidents. We propose the Laboratory Safety Benchmark (LabSafety Bench), a comprehensive framework that evaluates LLMs and vision language models (VLMs) on their ability to identify potential hazards, assess risks, and predict the consequences of unsafe actions in lab environments.
arXiv Detail & Related papers (2024-10-18T05:21:05Z)
Multimodal Situational Safety [73.63981779844916]
We present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety. For an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context. We develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs.
arXiv Detail & Related papers (2024-10-08T16:16:07Z)
TrustAgent: Towards Safe and Trustworthy LLM-based Agents [50.33549510615024]
This paper presents TrustAgent, an Agent-Constitution-based agent framework focused on improving the safety of LLM-based agents. The proposed framework ensures strict adherence to the Agent Constitution through three strategic components: a pre-planning strategy that injects safety knowledge into the model before plan generation, an in-planning strategy that enhances safety during plan generation, and a post-planning strategy that ensures safety through post-planning inspection.
arXiv Detail & Related papers (2024-02-02T17:26:23Z)
Safety Assessment of Chinese Large Language Models [51.83369778259149]
Large language models (LLMs) may generate insulting and discriminatory content, reflect incorrect social values, and may be used for malicious purposes. To promote the deployment of safe, responsible, and ethical AI, we release SafetyPrompts including 100k augmented prompts and responses by LLMs.
arXiv Detail & Related papers (2023-04-20T16:27:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.