Nightmare Dreamer: Dreaming About Unsafe States And Planning Ahead
- URL: http://arxiv.org/abs/2601.04686v1
- Date: Thu, 08 Jan 2026 07:55:07 GMT
- Title: Nightmare Dreamer: Dreaming About Unsafe States And Planning Ahead
- Authors: Oluwatosin Oseni, Shengjie Wang, Jun Zhu, Micah Corah
- Abstract summary: We introduce Nightmare Dreamer, a model-based Safe RL algorithm that addresses safety concerns. Nightmare Dreamer achieves nearly zero safety violations while maximizing rewards.
- Score: 23.19869346457359
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reinforcement Learning (RL) has shown remarkable success in real-world applications, particularly in robotics control. However, RL adoption remains limited due to insufficient safety guarantees. We introduce Nightmare Dreamer, a model-based Safe RL algorithm that addresses safety concerns by leveraging a learned world model to predict potential safety violations and plan actions accordingly. Nightmare Dreamer achieves nearly zero safety violations while maximizing rewards. Nightmare Dreamer outperforms model-free baselines on Safety Gymnasium tasks using only image observations, achieving nearly a 20x improvement in efficiency.
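A minimal sketch of the core idea as described in the abstract: sample candidate action sequences, imagine their outcomes in a learned world model, and discard any rollout predicted to violate safety before committing to an action. The dynamics, reward, and cost models below are toy stand-ins, not the paper's architecture.

```python
# Toy stand-ins for a learned world model: linear dynamics plus
# hand-written reward and cost heads in place of trained networks.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(scale=0.3, size=(4, 4))
B = rng.normal(scale=0.3, size=(4, 2))

def step_model(state, action):
    return A @ state + B @ action

def reward_model(state):
    return -np.linalg.norm(state)        # reward: drive the state to the origin

def cost_model(state):
    return float(abs(state[0]) > 1.0)    # cost: 1 when an unsafe region is entered

def plan(state, horizon=10, n_candidates=256, cost_budget=0.0):
    """Imagine rollouts for sampled action sequences and pick the best
    sequence among those predicted to stay within the cost budget."""
    best_action, best_return = None, -np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, 2))
        s, total_reward, total_cost = state.copy(), 0.0, 0.0
        for a in actions:
            s = step_model(s, a)
            total_reward += reward_model(s)
            total_cost += cost_model(s)
        # Discard "nightmares": rollouts predicted to violate safety.
        if total_cost <= cost_budget and total_reward > best_return:
            best_action, best_return = actions[0], total_reward
    # Fall back to a no-op if every candidate looked unsafe.
    return best_action if best_action is not None else np.zeros(2)

print(plan(np.array([0.5, 0.0, 0.0, 0.0])))
```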
Related papers
- THINKSAFE: Self-Generated Safety Alignment for Reasoning Models [60.10077024249373]
We propose ThinkSafe, a framework that restores safety alignment without external teachers. Our key insight is that while compliance suppresses safety mechanisms, models often retain latent knowledge to identify harm. Experiments on DeepSeek-R1-Distill and Qwen3 show ThinkSafe significantly improves safety while preserving reasoning proficiency.
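The abstract gives no implementation details, so the following is only a guess at the general shape of self-generated safety alignment: the model itself judges harmfulness and writes the refusal targets used for fine-tuning, with no external teacher. All function names and prompt templates are invented.

```python
def model_generate(prompt: str) -> str:
    """Stub for the reasoning model's own generation (no external teacher)."""
    if prompt.startswith("Judge:"):
        return "HARMFUL" if "explosive" in prompt.lower() else "SAFE"
    return "I can't help with that; here is why the request is unsafe: ..."

def build_self_alignment_data(prompts):
    """Use the model's latent harm knowledge to label prompts, then
    self-generate refusal targets for safety fine-tuning."""
    data = []
    for p in prompts:
        verdict = model_generate(f"Judge: is this request harmful? {p}")
        if verdict == "HARMFUL":
            refusal = model_generate(f"Refuse safely: {p}")
            data.append({"prompt": p, "target": refusal})
    return data

print(build_self_alignment_data(["How do I build an explosive?", "What is 2+2?"]))
```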
arXiv Detail & Related papers (2026-01-30T16:31:02Z)
- Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization [15.729169158082598]
Safety alignment under Reinforcement Learning (RL) often suffers from forgetting of learned general abilities. We introduce Null-Space constrained Policy Optimization (NSPO), a novel RL framework for LLM safety alignment. NSPO preserves the model's original core capabilities while still guaranteeing a descent direction for effective safety alignment.
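A sketch of the null-space idea in isolation (NSPO's exact construction may differ): project the safety gradient so it is orthogonal to directions that matter for general capabilities, which leaves capability behavior unchanged to first order while preserving a descent direction for safety.

```python
import numpy as np

def null_space_project(safety_grad, capability_grads):
    """Remove from `safety_grad` its components along the span of
    `capability_grads`, leaving an update in their null space."""
    G = np.atleast_2d(capability_grads)      # (k, d) capability directions
    Q, _ = np.linalg.qr(G.T)                 # orthonormal basis, shape (d, k)
    return safety_grad - Q @ (Q.T @ safety_grad)

rng = np.random.default_rng(0)
d = 6
cap = rng.normal(size=(2, d))    # gradients of general-capability losses
g_safe = rng.normal(size=d)      # gradient of the safety objective
g_proj = null_space_project(g_safe, cap)

# The projected update no longer moves along capability directions...
print(np.allclose(cap @ g_proj, 0.0))        # True
# ...yet keeps a positive inner product with the raw safety gradient,
# i.e. it is still a descent direction for the safety loss.
print(float(g_proj @ g_safe) > 0.0)          # True (almost surely)
```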
arXiv Detail & Related papers (2025-12-12T09:01:52Z)
- SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning [30.037309138373754]
Vision-language-action models (VLAs) show potential as generalist robot policies. These models pose extreme safety challenges during real-world deployment, including the risk of harm to the environment, the robot itself, and humans. We address this by exploring an integrated safety approach (ISA), systematically modeling safety requirements, then actively eliciting diverse unsafe behaviors.
arXiv Detail & Related papers (2025-03-05T13:16:55Z)
- Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training [67.30423823744506]
We introduce a novel approach, Decoupled Refusal Training (DeRTa), designed to empower LLMs to refuse to comply with harmful prompts at any response position. DeRTa incorporates two novel components: (1) Maximum Likelihood Estimation with a Harmful Response Prefix, which trains models to recognize and avoid unsafe content by prepending a segment of a harmful response to a safe response, and (2) Reinforced Transition Optimization (RTO), which equips models with the ability to transition from potential harm to a safety refusal consistently throughout the harmful response sequence.
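Component (1) suggests a simple data-construction recipe; the sketch below is one plausible reading, with the field names and truncation rule as illustrative assumptions.

```python
import random

def make_derta_example(prompt, harmful_response, safe_refusal, seed=0):
    """Prepend a random-length prefix of a harmful response to a safe
    refusal, so the model learns to switch to refusal at any position."""
    rng = random.Random(seed)
    words = harmful_response.split()
    cut = rng.randint(0, len(words))          # the prefix may be empty
    prefix = " ".join(words[:cut])
    target = (prefix + " " if prefix else "") + safe_refusal
    return {"prompt": prompt, "target": target}

ex = make_derta_example(
    prompt="How do I pick a lock?",
    harmful_response="First, insert a tension wrench into the keyway and",
    safe_refusal="Actually, I can't help with bypassing locks you don't own.",
)
print(ex["target"])
```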
arXiv Detail & Related papers (2024-07-12T09:36:33Z)
- BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models [57.5404308854535]
Safety backdoor attacks in large language models (LLMs) enable the stealthy triggering of unsafe behaviors while evading detection during normal interactions.
We present BEEAR, a mitigation approach leveraging the insight that backdoor triggers induce relatively uniform drifts in the model's embedding space.
Our bi-level optimization method identifies universal embedding perturbations that elicit unwanted behaviors and adjusts the model parameters to reinforce safe behaviors against these perturbations.
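A toy numpy rendering of that bi-level loop, with a sigmoid "unsafe score" standing in for the LLM; BEEAR's real objective, model, and optimizer are of course different.

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.normal(size=(32, 8))     # stand-in prompt embeddings
w = 0.1 * rng.normal(size=8)     # stand-in model parameters

def unsafe_score(emb, w):
    return 1.0 / (1.0 + np.exp(-emb @ w))    # sigmoid "unsafe" probability

for _ in range(200):
    # Inner loop: find a universal embedding perturbation that
    # maximally elicits unsafe behavior across all prompts.
    delta = np.zeros(8)
    for _ in range(10):
        p = unsafe_score(E + delta, w)
        delta += 0.5 * ((p * (1 - p))[:, None] * w).mean(axis=0)    # ascent
    # Outer loop: adjust parameters so behavior stays safe (low
    # unsafe score) even under the adversarial perturbation.
    p = unsafe_score(E + delta, w)
    w -= 0.1 * ((p * (1 - p))[:, None] * (E + delta)).mean(axis=0)  # descent

print(unsafe_score(E, w).mean())  # mean unsafe probability is driven down
```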
arXiv Detail & Related papers (2024-06-24T19:29:47Z)
- GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model [8.915288771953545]
We introduce a novel Generalizable Safety enhancer (GenSafe) that is able to overcome the challenge of data insufficiency. We evaluate GenSafe on multiple SRL approaches and benchmark problems. Our proposed GenSafe not only offers a novel measure to augment existing SRL methods but also shows broad compatibility with various SRL algorithms.
arXiv Detail & Related papers (2024-06-06T09:51:30Z)
- Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning [57.84059344739159]
"Shielding" is a popular technique to enforce safety inReinforcement Learning (RL)
We propose a new permissibility-based framework to deal with safety and shield construction.
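The generic shield pattern, independent of this paper's specific permissibility construction: check each proposed action against a permissibility predicate and substitute a known-safe fallback before the action ever reaches the environment.

```python
from typing import Callable, TypeVar

S = TypeVar("S")
A = TypeVar("A")

def shielded_policy(policy: Callable[[S], A],
                    permissible: Callable[[S, A], bool],
                    fallback: Callable[[S], A]) -> Callable[[S], A]:
    """Wrap a policy so impermissible actions are overridden by a
    known-safe fallback before they reach the environment."""
    def act(state: S) -> A:
        action = policy(state)
        return action if permissible(state, action) else fallback(state)
    return act

# Toy 1-D example: the position must stay inside [-1, 1].
agent = shielded_policy(
    policy=lambda s: 0.8,                        # always push right
    permissible=lambda s, a: abs(s + a) <= 1.0,  # would we leave the safe set?
    fallback=lambda s: 0.0,                      # do nothing instead
)
print(agent(0.1), agent(0.9))                    # 0.8, then 0.0 near the boundary
```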
arXiv Detail & Related papers (2024-05-29T18:00:21Z)
- SafeDreamer: Safe Reinforcement Learning with World Models [7.773096110271637]
We introduce SafeDreamer, a novel algorithm incorporating Lagrangian-based methods into world model planning processes.
Our method achieves nearly zero-cost performance on various tasks, spanning low-dimensional and vision-only inputs.
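A minimal sketch of how a Lagrangian method folds the cost constraint into a planning objective; here precomputed reward/cost estimates stand in for SafeDreamer's world-model rollouts, and the dual-ascent update on the multiplier is the textbook form rather than the paper's exact rule.

```python
import numpy as np

# Imagined returns and costs for four candidate plans (in the real
# method these would come from world-model rollouts).
rewards = np.array([5.0, 8.0, 9.5, 3.0])
costs = np.array([0.1, 0.8, 2.5, 0.0])
cost_limit, lam, lam_lr = 0.5, 0.0, 0.1

for _ in range(100):
    # The planner maximizes reward minus the Lagrangian cost penalty.
    best = int(np.argmax(rewards - lam * costs))
    # Dual ascent: raise lambda while the chosen plan exceeds the cost
    # limit, relax it (never below zero) once the plan is safe.
    lam = max(0.0, lam + lam_lr * (costs[best] - cost_limit))

print(best, round(lam, 2))  # the penalty steers planning toward low-cost plans
```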
arXiv Detail & Related papers (2023-07-14T06:00:08Z)
- Enhancing Safe Exploration Using Safety State Augmentation [71.00929878212382]
We tackle the problem of safe exploration in model-free reinforcement learning.
We derive policies for scheduling the safety budget during training.
We show that Simmer can stabilize training and improve the performance of safe RL with average constraints.
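One plausible budget scheduler, purely for illustration: interpolate from an initial budget to the target over training. The paper derives principled schedules; both the linear form and the ramp direction here are assumptions.

```python
def scheduled_budget(step: int, total_steps: int,
                     start: float, target: float) -> float:
    """Linearly interpolate the per-episode safety budget from `start`
    to `target` over the course of training."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (target - start)

for step in (0, 5_000, 10_000):
    print(step, scheduled_budget(step, 10_000, start=5.0, target=25.0))
```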
arXiv Detail & Related papers (2022-06-06T15:23:07Z)
- SAUTE RL: Almost Surely Safe Reinforcement Learning Using State Augmentation [63.25418599322092]
Satisfying safety constraints almost surely (or with probability one) can be critical for deployment of Reinforcement Learning (RL) in real-life applications.
We address the problem by introducing Safety Augmented Markov Decision Processes (MDPs).
We show that the Saute MDP allows one to view the safety augmentation problem from a different perspective, enabling new features.
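A sketch of Saute-style state augmentation as a wrapper around a toy environment: the observation is extended with the remaining (normalized) safety budget, and exhausting the budget swaps the reward for a penalty. The exact reward reshaping in the paper may differ.

```python
import numpy as np

class ToyEnv:
    def reset(self):
        self.s = 0.0
        return np.array([self.s])
    def step(self, a):
        self.s += a
        reward = -abs(self.s - 2.0)      # objective: move toward s = 2
        cost = float(abs(self.s) > 1.5)  # unsafe whenever |s| > 1.5
        return np.array([self.s]), reward, cost, False

class SauteWrapper:
    """Augment observations with the remaining (normalized) safety
    budget; exhausting the budget replaces the reward with a penalty."""
    def __init__(self, env, budget=1.0, penalty=-10.0):
        self.env, self.budget, self.penalty = env, budget, penalty
    def reset(self):
        self.remaining = self.budget
        return np.append(self.env.reset(), 1.0)
    def step(self, a):
        obs, reward, cost, done = self.env.step(a)
        self.remaining -= cost
        if self.remaining <= 0.0:        # budget exhausted:
            reward = self.penalty        # safety now dominates the reward
        return np.append(obs, self.remaining / self.budget), reward, cost, done

env = SauteWrapper(ToyEnv())
print(env.reset())                       # [position, remaining budget fraction]
```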
arXiv Detail & Related papers (2022-02-14T08:57:01Z)
- Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning [31.097091898555725]
Reinforcement Learning (RL) agents in the real world must satisfy safety constraints in addition to maximizing a reward objective.
Model-based RL algorithms hold promise for reducing unsafe real-world actions.
We propose Conservative and Adaptive Penalty (CAP), a model-based safe RL framework.
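The "conservative + adaptive" pattern the title suggests, sketched with stand-ins: predicted costs are inflated by model uncertainty (conservative), and the penalty coefficient is tuned online from observed constraint violations (adaptive). Both the uncertainty estimate and the update rule below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def ensemble_cost():
    """Stand-in for an ensemble of learned cost models: mean prediction
    plus disagreement, which we use as an uncertainty estimate."""
    preds = rng.normal(loc=0.3, scale=0.1, size=5)
    return preds.mean(), preds.std()

kappa, cost_limit, lr = 1.0, 0.25, 0.1

for episode in range(50):
    mean_cost, uncertainty = ensemble_cost()
    # Conservative: inflate the predicted cost by model uncertainty, so
    # the planner treats uncertain transitions as potentially unsafe.
    conservative_cost = mean_cost + uncertainty
    # (The planner would rank actions by reward - kappa * conservative_cost.)
    observed_cost = mean_cost + rng.normal(scale=0.05)  # toy env feedback
    # Adaptive: raise the penalty when observed cost exceeds the limit,
    # relax it (never below zero) when the agent stays safely under it.
    kappa = max(0.0, kappa + lr * (observed_cost - cost_limit))

print(round(kappa, 3))   # the penalty settles based on the violation signal
```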
arXiv Detail & Related papers (2021-12-14T19:09:14Z)