PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning
- URL: http://arxiv.org/abs/2508.02159v1
- Date: Mon, 04 Aug 2025 08:01:19 GMT
- Title: PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning
- Authors: Dongchi Huang, Jiaqi Wang, Yang Li, Chunhe Xia, Tianle Zhang, Kaige Zhang,
- Abstract summary: We propose a model-based safe reinforcement learning approach that leverages privileged information to enhance the agent's safety and performance. Our empirical results demonstrate that our approach significantly outperforms existing methods in terms of safety and task-centric performance.
- Score: 23.384621982394673
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Partial observability presents a significant challenge for safe reinforcement learning, as it impedes the identification of potential risks and rewards. Leveraging specific types of privileged information during training to mitigate the effects of partial observability has yielded notable empirical successes. In this paper, we propose Asymmetric Constrained Partially Observable Markov Decision Processes (ACPOMDPs) to theoretically examine the advantages of incorporating privileged information. Building upon ACPOMDPs, we propose the Privileged Information Guided Dreamer, a model-based safe reinforcement learning approach that leverages privileged information to enhance the agent's safety and performance through privileged representation alignment and an asymmetric actor-critic structure. Our empirical results demonstrate that our approach significantly outperforms existing methods in terms of safety and task-centric performance. Meanwhile, compared to alternative privileged model-based reinforcement learning methods, our approach exhibits superior performance and ease of training.
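The abstract names two mechanisms: privileged representation alignment and an asymmetric actor-critic structure. Below is a minimal, hypothetical PyTorch sketch of how such a structure could be wired, where the critic receives privileged state during training, the actor sees only the observation latent, and an alignment loss pulls the observation encoding toward the privileged one. All module names, network sizes, and the specific loss are illustrative assumptions, not the PIGDreamer implementation.

```python
# Hypothetical sketch (not the authors' code): asymmetric actor-critic with
# a privileged critic and a privileged-representation alignment loss.
import torch
import torch.nn as nn

class ObsEncoder(nn.Module):
    def __init__(self, obs_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
    def forward(self, obs):
        return self.net(obs)

class PrivEncoder(nn.Module):
    def __init__(self, state_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
    def forward(self, state):
        return self.net(state)

class Actor(nn.Module):
    """Conditioned only on the observation latent (available at deployment)."""
    def __init__(self, latent_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, act_dim))
    def forward(self, z_obs):
        return torch.tanh(self.net(z_obs))

class PrivilegedCritic(nn.Module):
    """Conditioned on the privileged latent (training time only)."""
    def __init__(self, latent_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim + act_dim, 128), nn.ReLU(), nn.Linear(128, 1))
    def forward(self, z_priv, action):
        return self.net(torch.cat([z_priv, action], dim=-1))

def alignment_loss(z_obs, z_priv):
    # Pull the observation representation toward the (detached) privileged one.
    return ((z_obs - z_priv.detach()) ** 2).mean()

# Toy forward/backward pass with random tensors standing in for a batch.
obs_dim, state_dim, latent_dim, act_dim, batch = 16, 32, 8, 4, 5
obs_enc, priv_enc = ObsEncoder(obs_dim, latent_dim), PrivEncoder(state_dim, latent_dim)
actor, critic = Actor(latent_dim, act_dim), PrivilegedCritic(latent_dim, act_dim)

obs, state = torch.randn(batch, obs_dim), torch.randn(batch, state_dim)
z_obs, z_priv = obs_enc(obs), priv_enc(state)
action = actor(z_obs)           # the actor never sees privileged state
value = critic(z_priv, action)  # the critic exploits privileged state during training
loss = alignment_loss(z_obs, z_priv) - value.mean()
loss.backward()
```

At deployment only the observation encoder and the actor would be needed, which is the usual payoff of asymmetric training under partial observability.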
Related papers
- Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing [54.44838681588145]
ExRec is a framework for personalized exercise recommendation with semantically-grounded knowledge tracing. We show that ExRec generalizes robustly to new, unseen questions and that it produces interpretable student learning trajectories.
arXiv Detail & Related papers (2025-07-15T07:54:04Z) - Guided Policy Optimization under Partial Observability [36.853129816484845]
Reinforcement Learning (RL) in partially observable environments poses significant challenges due to the complexity of learning under uncertainty. We introduce Guided Policy Optimization (GPO), a framework that co-trains a guider and a learner. We theoretically demonstrate that this learning scheme achieves optimality comparable to direct RL, thereby overcoming key limitations inherent in existing approaches.
arXiv Detail & Related papers (2025-05-21T12:01:08Z) - Advancing Embodied Agent Security: From Safety Benchmarks to Input Moderation [52.83870601473094]
Embodied agents exhibit immense potential across a multitude of domains. Existing research predominantly concentrates on the security of general large language models. This paper introduces a novel input moderation framework, meticulously designed to safeguard embodied agents.
arXiv Detail & Related papers (2025-04-22T08:34:35Z) - Probabilistic Shielding for Safe Reinforcement Learning [51.35559820893218]
In real-life scenarios, a Reinforcement Learning (RL) agent must often also behave in a safe manner, including at training time. We present a new, scalable method, which enjoys strict formal guarantees for Safe RL. We show that our approach provides a strict formal safety guarantee that the agent stays safe at training and test time.
arXiv Detail & Related papers (2025-03-09T17:54:33Z) - OpenAI o1 System Card [274.83891368890977]
The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. This report outlines the safety work carried out for the OpenAI o1 and OpenAI o1-mini models, including safety evaluations, external red teaming, and Preparedness Framework evaluations.
arXiv Detail & Related papers (2024-12-21T18:04:31Z) - Safe to Serve: Aligning Instruction-Tuned Models for Safety and Helpfulness [0.0]
Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning and text generation. LLMs can inadvertently generate unsafe or biased responses when prompted with problematic inputs. This research addresses the critical challenge of developing language models that generate both helpful and harmless content.
arXiv Detail & Related papers (2024-11-26T06:52:22Z) - Active Learning for Robust and Representative LLM Generation in Safety-Critical Scenarios [32.16984263644299]
Large Language Models (LLMs) can generate valuable data for safety measures, but often exhibit distributional biases.
We propose a novel framework that integrates active learning with clustering to guide LLM generation.
Our results show that the proposed framework produces a more representative set of safety scenarios without requiring prior knowledge of the underlying data distribution.
arXiv Detail & Related papers (2024-10-14T21:48:14Z) - Inverse-RLignment: Large Language Model Alignment from Demonstrations through Inverse Reinforcement Learning [62.05713042908654]
We introduce Alignment from Demonstrations (AfD), a novel approach leveraging high-quality demonstration data to overcome these challenges. We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals. Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model for AfD.
arXiv Detail & Related papers (2024-05-24T15:13:53Z) - Feasibility Consistent Representation Learning for Safe Reinforcement Learning [25.258227763316228]
We introduce a novel framework named Feasibility Consistent Safe Reinforcement Learning (FCSRL).
This framework combines representation learning with feasibility-oriented objectives to identify and extract safety-related information from the raw state for safe RL.
Our method is capable of learning a better safety-aware embedding and achieving superior performance than previous representation learning baselines.
arXiv Detail & Related papers (2024-05-20T01:37:21Z) - Certifying Safety in Reinforcement Learning under Adversarial
Perturbation Attacks [23.907977144668838]
We propose a partially-supervised reinforcement learning (PSRL) framework that takes advantage of an additional assumption that the true state of the POMDP is known at training time.
We present the first approach for certifying safety of PSRL policies under adversarial input perturbations, and two adversarial training approaches that make direct use of PSRL.
arXiv Detail & Related papers (2022-12-28T22:33:38Z) - SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations.
We provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z) - Privileged Information Dropout in Reinforcement Learning [56.82218103971113]
Using privileged information during training can improve the sample efficiency and performance of machine learning systems.
In this work, we investigate Privileged Information Dropout (pid) for achieving the latter, which can be applied equally to value-based and policy-based reinforcement learning algorithms.
arXiv Detail & Related papers (2020-05-19T05:32:33Z)