SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning
- URL: http://arxiv.org/abs/2503.03480v2
- Date: Sat, 31 May 2025 14:22:50 GMT
- Title: SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning
- Authors: Borong Zhang, Yuhao Zhang, Jiaming Ji, Yingshan Lei, Josef Dai, Yuanpei Chen, Yaodong Yang,
- Abstract summary: Vision-language-action models (VLAs) show potential as generalist robot policies.<n>These models pose extreme safety challenges during real-world deployment, including the risk of harm to the environment, the robot itself, and humans.<n>We address this by exploring an integrated safety approach (ISA), systematically modeling safety requirements, then actively eliciting diverse unsafe behaviors.
- Score: 10.844235123282056
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Vision-language-action models (VLAs) show potential as generalist robot policies. However, these models pose extreme safety challenges during real-world deployment, including the risk of harm to the environment, the robot itself, and humans. How can safety constraints be explicitly integrated into VLAs? We address this by exploring an integrated safety approach (ISA), systematically modeling safety requirements, then actively eliciting diverse unsafe behaviors, effectively constraining VLA policies via safe reinforcement learning, and rigorously assuring their safety through targeted evaluations. Leveraging the constrained Markov decision process (CMDP) paradigm, ISA optimizes VLAs from a min-max perspective against elicited safety risks. Thus, policies aligned through this comprehensive approach achieve the following key features: (I) effective safety-performance trade-offs, this exploration yields an 83.58% safety improvement compared to the current state-of-the-art method, while also maintaining task performance (+3.85%). (II) strong safety assurance, with the ability to mitigate long-tail risks and handle extreme failure scenarios. (III) robust generalization of learned safety behaviors to various out-of-distribution perturbations. Our data, models and newly proposed benchmark environment are available at https://pku-safevla.github.io.
Related papers
- Towards provable probabilistic safety for scalable embodied AI systems [79.31011047593492]
Embodied AI systems are increasingly prevalent across various applications.<n> Ensuring their safety in complex operating environments remains a major challenge.<n>This Perspective offers a pathway toward safer, large-scale adoption of embodied AI systems in safety-critical applications.
arXiv Detail & Related papers (2025-06-05T15:46:25Z) - HoliSafe: Holistic Safety Benchmarking and Modeling with Safety Meta Token for Vision-Language Model [52.72318433518926]
Existing safety-tuning datasets and benchmarks only partially consider how image-text interactions can yield harmful content.<n>We introduce a holistic safety dataset and benchmark, HoliSafe, that spans all five safe/unsafe image-text combinations.<n>We propose SafeLLaVA, a novel VLM augmented with a learnable safety meta token and a dedicated safety head.
arXiv Detail & Related papers (2025-06-05T07:26:34Z) - Shape it Up! Restoring LLM Safety during Finetuning [66.46166656543761]
Finetuning large language models (LLMs) enables user-specific customization but introduces critical safety risks.<n>We propose dynamic safety shaping (DSS), a framework that uses fine-grained safety signals to reinforce learning from safe segments of a response while suppressing unsafe content.<n>We present STAR-DSS, guided by STAR scores, that robustly mitigates finetuning risks and delivers substantial safety improvements across diverse threats, datasets, and model families.
arXiv Detail & Related papers (2025-05-22T18:05:16Z) - VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving [1.9242820889313577]
Reinforcement learning (RL)-based autonomous driving policy learning faces critical limitations.<n>RL often fail to capture the true semantic meaning of "safety" in complex driving contexts.<n>We propose VL-SAFE, a world model-based safe RL framework with Vision-Language model (VLM)-as-safety-guidance paradigm.
arXiv Detail & Related papers (2025-05-22T08:29:59Z) - Vulnerability Mitigation for Safety-Aligned Language Models via Debiasing [12.986006070964772]
Safety alignment is an essential research topic for real-world AI applications.<n>Our study first identified the difficulty of eliminating such vulnerabilities without sacrificing the model's helpfulness.<n>Our method could enhance the model's helpfulness while maintaining safety, thus improving the trade-off-front.
arXiv Detail & Related papers (2025-02-04T09:31:54Z) - Defining and Evaluating Physical Safety for Large Language Models [62.4971588282174]
Large Language Models (LLMs) are increasingly used to control robotic systems such as drones.
Their risks of causing physical threats and harm in real-world applications remain unexplored.
We classify the physical safety risks of drones into four categories: (1) human-targeted threats, (2) object-targeted threats, (3) infrastructure attacks, and (4) regulatory violations.
arXiv Detail & Related papers (2024-11-04T17:41:25Z) - ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning [48.536695794883826]
We present ActSafe, a novel model-based RL algorithm for safe and efficient exploration.<n>We show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time.<n>In addition, we propose a practical variant of ActSafe that builds on latest model-based RL advancements.
arXiv Detail & Related papers (2024-10-12T10:46:02Z) - How Does Vision-Language Adaptation Impact the Safety of Vision Language Models? [27.46416187893547]
Vision-Language adaptation (VL adaptation) transforms Large Language Models (LLMs) into Large Vision-Language Models (LVLMs)
Despite potential harmfulness due to weakened safety measures, in-depth analysis on the effects of VL adaptation on safety remains under-explored.
arXiv Detail & Related papers (2024-10-10T03:12:03Z) - SCANS: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering [56.92068213969036]
Safety alignment is indispensable for Large Language Models (LLMs) to defend threats from malicious instructions.<n>Recent researches reveal safety-aligned LLMs prone to reject benign queries due to the exaggerated safety issue.<n>We propose a Safety-Conscious Activation Steering (SCANS) method to mitigate the exaggerated safety concerns.
arXiv Detail & Related papers (2024-08-21T10:01:34Z) - Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models [65.06446825020578]
Safety alignment is crucial to ensure that large language models (LLMs) behave in ways that align with human preferences and prevent harmful actions during inference.
We aim to measure the risks in finetuning LLMs through navigating the LLM safety landscape.
arXiv Detail & Related papers (2024-05-27T17:31:56Z) - Safeguarded Progress in Reinforcement Learning: Safe Bayesian
Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL)
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z) - Safe Deep Policy Adaptation [7.2747306035142225]
Policy adaptation based on reinforcement learning (RL) offers versatility and generalizability but presents safety and robustness challenges.
We propose SafeDPA, a novel RL and control framework that simultaneously tackles the problems of policy adaptation and safe reinforcement learning.
We provide theoretical safety guarantees of SafeDPA and show the robustness of SafeDPA against learning errors and extra perturbations.
arXiv Detail & Related papers (2023-10-08T00:32:59Z) - Towards Safer Generative Language Models: A Survey on Safety Risks,
Evaluations, and Improvements [76.80453043969209]
This survey presents a framework for safety research pertaining to large models.
We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models.
We explore the strategies for enhancing large model safety from training to deployment.
arXiv Detail & Related papers (2023-02-18T09:32:55Z) - ISAACS: Iterative Soft Adversarial Actor-Critic for Safety [0.9217021281095907]
This work introduces a novel approach enabling scalable synthesis of robust safety-preserving controllers for robotic systems.
A safety-seeking fallback policy is co-trained with an adversarial "disturbance" agent that aims to invoke the worst-case realization of model error.
While the learned control policy does not intrinsically guarantee safety, it is used to construct a real-time safety filter.
arXiv Detail & Related papers (2022-12-06T18:53:34Z) - Lyapunov-based uncertainty-aware safe reinforcement learning [0.0]
InReinforcement learning (RL) has shown a promising performance in learning optimal policies for a variety of sequential decision-making tasks.
In many real-world RL problems, besides optimizing the main objectives, the agent is expected to satisfy a certain level of safety.
We propose a Lyapunov-based uncertainty-aware safe RL model to address these limitations.
arXiv Detail & Related papers (2021-07-29T13:08:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.