PrivilegedDreamer: Explicit Imagination of Privileged Information for Rapid Adaptation of Learned Policies
- URL: http://arxiv.org/abs/2502.11377v1
- Date: Mon, 17 Feb 2025 02:46:02 GMT
- Title: PrivilegedDreamer: Explicit Imagination of Privileged Information for Rapid Adaptation of Learned Policies
- Authors: Morgan Byrd, Jackson Crandell, Mili Das, Jessica Inman, Robert Wright, Sehoon Ha
- Abstract summary: We introduce PrivilegedDreamer, a model-based reinforcement learning framework that extends the existing model-based approach by incorporating an explicit parameter estimation module.
Our empirical analysis on five diverse HIP-MDP tasks demonstrates that PrivilegedDreamer outperforms state-of-the-art model-based, model-free, and domain adaptation learning algorithms.
- Score: 7.376615925443845
- Abstract: Numerous real-world control problems involve dynamics and objectives affected by unobservable hidden parameters, ranging from autonomous driving to robotic manipulation, which cause performance degradation during sim-to-real transfer. To represent these kinds of domains, we adopt hidden-parameter Markov decision processes (HIP-MDPs), which model sequential decision problems where hidden variables parameterize transition and reward functions. Existing approaches, such as domain randomization, domain adaptation, and meta-learning, simply treat the effect of hidden parameters as additional variance and often struggle to effectively handle HIP-MDP problems, especially when the rewards are parameterized by hidden variables. We introduce PrivilegedDreamer, a model-based reinforcement learning framework that extends the existing model-based approach by incorporating an explicit parameter estimation module. PrivilegedDreamer features its novel dual recurrent architecture that explicitly estimates hidden parameters from limited historical data and enables us to condition the model, actor, and critic networks on these estimated parameters. Our empirical analysis on five diverse HIP-MDP tasks demonstrates that PrivilegedDreamer outperforms state-of-the-art model-based, model-free, and domain adaptation learning algorithms. Additionally, we conduct ablation studies to justify the inclusion of each component in the proposed architecture.
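To make the HIP-MDP setting in the abstract concrete, here is a minimal Python sketch of a toy environment in which one hidden parameter shapes both the transition and the reward, followed by an explicit estimate of that parameter from a short interaction history. Everything in it is an illustrative assumption rather than the paper's method: the `ToyHipMdp` class, the point-mass dynamics, the reward, and the least-squares `estimate_mass` helper are hypothetical stand-ins, and the paper's estimator is a learned recurrent module, not a closed-form fit.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyHipMdp:
    """Hypothetical 1-D point mass; `mass` is the hidden parameter."""
    mass: float            # hidden from the agent in a HIP-MDP
    position: float = 0.0
    velocity: float = 0.0
    dt: float = 0.1

    def step(self, force):
        # Transition depends on the hidden parameter: a = F / m.
        self.velocity += (force / self.mass) * self.dt
        self.position += self.velocity * self.dt
        # Reward also depends on it: the target position equals the mass.
        reward = -abs(self.position - self.mass)
        return (self.position, self.velocity), reward


def estimate_mass(forces, delta_vs, dt):
    """Least-squares fit of m in delta_v = (dt / m) * F (illustrative only)."""
    num = sum(f * dv for f, dv in zip(forces, delta_vs))
    den = dt * sum(f * f for f in forces)
    if num == 0.0:
        raise ValueError("history carries no information about the mass")
    return den / num


# Collect a short history with random actions, then estimate the parameter.
rng = random.Random(0)
env = ToyHipMdp(mass=rng.uniform(0.5, 2.0))
forces, delta_vs = [], []
for _ in range(20):
    force = rng.uniform(-1.0, 1.0)
    v_before = env.velocity
    (_, v_after), _ = env.step(force)
    forces.append(force)
    delta_vs.append(v_after - v_before)

mass_hat = estimate_mass(forces, delta_vs, env.dt)
print(f"true mass={env.mass:.4f}, estimated mass={mass_hat:.4f}")
```

In this toy setup the estimate could then be appended to the observation that a policy or value function receives, which loosely mirrors the idea of conditioning the model, actor, and critic on estimated parameters.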
Related papers
- Unified Parameter-Efficient Unlearning for LLMs [25.195126838721492]
Large Language Models (LLMs) have revolutionized natural language processing, enabling advanced understanding and reasoning capabilities across a variety of tasks.
This raises significant privacy and security concerns, as models may inadvertently retain and disseminate sensitive or undesirable information.
We introduce a novel instance-wise unlearning framework, LLMEraser, which systematically categorizes unlearning tasks and applies precise adjustments using influence functions.
arXiv Detail & Related papers (2024-11-30T07:21:02Z)
- Self-Healing Machine Learning: A Framework for Autonomous Adaptation in Real-World Environments [50.310636905746975]
Real-world machine learning systems often encounter model performance degradation due to distributional shifts in the underlying data generating process.
Existing approaches to addressing shifts, such as concept drift adaptation, are limited by their reason-agnostic nature.
We propose self-healing machine learning (SHML) to overcome these limitations.
arXiv Detail & Related papers (2024-10-31T20:05:51Z)
- Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z)
- Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z)
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- MergeNet: Knowledge Migration across Heterogeneous Models, Tasks, and Modalities [72.05167902805405]
We present MergeNet, which learns to bridge the gap of parameter spaces of heterogeneous models.
The core mechanism of MergeNet lies in the parameter adapter, which operates by querying the source model's low-rank parameters.
MergeNet is learned alongside both models, allowing our framework to dynamically transfer and adapt knowledge relevant to the current stage.
arXiv Detail & Related papers (2024-04-20T08:34:39Z)
- Variational Inference of Parameters in Opinion Dynamics Models [9.51311391391997]
This work uses variational inference to estimate the parameters of an opinion dynamics ABM.
We transform the inference process into an optimization problem suitable for automatic differentiation.
Our approach estimates both macroscopic parameters (bounded confidence intervals and backfire thresholds) and microscopic parameters (200 categorical, agent-level roles) more accurately than simulation-based and MCMC methods.
arXiv Detail & Related papers (2024-03-08T14:45:18Z)
- ReCoRe: Regularized Contrastive Representation Learning of World Model [21.29132219042405]
We present a world model that learns invariant features using contrastive unsupervised learning and an intervention-invariant regularizer.
Our method outperforms current state-of-the-art model-based and model-free RL methods and significantly improves on out-of-distribution point navigation tasks evaluated on the iGibson benchmark.
arXiv Detail & Related papers (2023-12-14T15:53:07Z)
- Mitigate Domain Shift by Primary-Auxiliary Objectives Association for Generalizing Person ReID [39.98444065846305]
ReID models struggle in learning domain-invariant representation solely through training on an instance classification objective.
We introduce a method that guides model learning of the primary ReID instance classification objective by a concurrent auxiliary learning objective on weakly labeled pedestrian saliency detection.
Our model can be extended with the recent test-time diagram to form the PAOA+, which performs on-the-fly optimization against the auxiliary objective.
arXiv Detail & Related papers (2023-10-24T15:15:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.