Decomposed Mutual Information Optimization for Generalized Context in
Meta-Reinforcement Learning
- URL: http://arxiv.org/abs/2210.04209v1
- Date: Sun, 9 Oct 2022 09:44:23 GMT
- Title: Decomposed Mutual Information Optimization for Generalized Context in
Meta-Reinforcement Learning
- Authors: Yao Mu, Yuzheng Zhuang, Fei Ni, Bin Wang, Jianyu Chen, Jianye Hao,
Ping Luo
- Abstract summary: Multiple confounders can influence the transition dynamics, making it challenging to infer accurate context for decision-making.
This paper addresses this challenge with Decomposed Mutual INformation Optimization (DOMINO) for context learning.
Our theoretical analysis shows that DOMINO can overcome the underestimation of mutual information caused by multiple confounders.
- Score: 35.87062321504049
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adapting to the changes in transition dynamics is essential in robotic
applications. By learning a conditional policy with a compact context,
context-aware meta-reinforcement learning provides a flexible way to adjust
behavior according to dynamics changes. However, in real-world applications,
the agent may encounter complex dynamics changes. Multiple confounders can
influence the transition dynamics, making it challenging to infer accurate
context for decision-making. This paper addresses this challenge with
Decomposed Mutual INformation Optimization (DOMINO) for context learning, which
explicitly learns a disentangled context that maximizes the mutual information
between the context and historical trajectories while minimizing the
state-transition prediction error. Our theoretical analysis shows that, by
learning a disentangled context, DOMINO overcomes the underestimation of mutual
information caused by multiple confounders and reduces the number of samples
that must be collected across environments. Extensive experiments show that the
context learned by DOMINO benefits both model-based and model-free
reinforcement learning algorithms for dynamics generalization, in terms of both
sample efficiency and performance in unseen environments.
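For intuition, the objective described in the abstract can be sketched as follows: a trajectory encoder that outputs several context components, a per-component contrastive (InfoNCE-style) lower bound on the mutual information between context and trajectory, and a dynamics head trained to minimize state-transition prediction error. The module names, sizes, and the choice of an InfoNCE-style estimator below are assumptions for illustration, not the authors' reference implementation.

```python
# Minimal sketch of a decomposed mutual-information objective for context
# learning (illustrative only; not the DOMINO reference implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextEncoder(nn.Module):
    """Encodes a trajectory segment into K separate context components."""
    def __init__(self, traj_dim, num_components=4, comp_dim=8):
        super().__init__()
        self.num_components = num_components
        self.comp_dim = comp_dim
        self.backbone = nn.Sequential(
            nn.Linear(traj_dim, 128), nn.ReLU(),
            nn.Linear(128, num_components * comp_dim),
        )

    def forward(self, traj):                                   # traj: [B, traj_dim]
        z = self.backbone(traj)
        return z.view(-1, self.num_components, self.comp_dim)  # [B, K, comp_dim]

def decomposed_infonce(z, z_pos):
    """Sum of per-component InfoNCE terms: contexts from matching trajectory
    segments are positives, other segments in the batch are negatives, so each
    component's MI with the trajectory is bounded separately rather than jointly."""
    B, K, _ = z.shape
    total = 0.0
    for k in range(K):
        logits = z[:, k] @ z_pos[:, k].t()                     # [B, B] similarities
        labels = torch.arange(B)
        total = total + F.cross_entropy(logits, labels)
    return total / K

class DynamicsHead(nn.Module):
    """Predicts the next state from (state, action, concatenated context)."""
    def __init__(self, state_dim, action_dim, num_components, comp_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + num_components * comp_dim, 128),
            nn.ReLU(),
            nn.Linear(128, state_dim),
        )

    def forward(self, s, a, z):
        return self.net(torch.cat([s, a, z.flatten(1)], dim=-1))

# One illustrative update: maximize the decomposed MI bound between contexts
# inferred from two segments of the same trajectory, and minimize the
# state-transition prediction error.
B, traj_dim, state_dim, action_dim = 32, 64, 17, 6
enc = ContextEncoder(traj_dim)
dyn = DynamicsHead(state_dim, action_dim, enc.num_components, enc.comp_dim)
opt = torch.optim.Adam(list(enc.parameters()) + list(dyn.parameters()), lr=3e-4)

seg_a, seg_b = torch.randn(B, traj_dim), torch.randn(B, traj_dim)   # placeholder data
s, a = torch.randn(B, state_dim), torch.randn(B, action_dim)
s_next = torch.randn(B, state_dim)

z_a, z_b = enc(seg_a), enc(seg_b)
loss = decomposed_infonce(z_a, z_b) + F.mse_loss(dyn(s, a, z_a), s_next)
opt.zero_grad()
loss.backward()
opt.step()
```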
Related papers
- Exploring Contextual Flux in Large Language Models: A Novel Approach to Self-Modulating Semantic Networks [0.0]
Self-modulating mechanisms introduce dynamic adaptation capabilities within language models.
Contextual realignment strategies influence token embedding trajectories across extended sequences.
Self-regulation enhances text generation consistency while preserving generative flexibility.
Findings suggest that while adaptive embedding updates improve certain aspects of coherence, their impact remains contingent on model capacity and input complexity.
arXiv Detail & Related papers (2025-02-16T01:08:19Z)
- Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining [55.262510814326035]
Existing reweighting strategies primarily focus on group-level data importance.
We introduce novel algorithms for dynamic, instance-level data reweighting.
Our framework allows us to devise reweighting strategies that deprioritize redundant or uninformative data, as in the sketch below.
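As a rough illustration of instance-level reweighting, here is a minimal sketch; the softmax-over-losses rule is an assumption for illustration, not the paper's algorithm.

```python
# Illustrative sketch of dynamic, instance-level reweighting: each sample is
# weighted by a softmax over its own (detached) loss, so low-loss, likely
# redundant examples contribute less to the update. Not the paper's method.
import torch
import torch.nn.functional as F

def reweighted_loss(logits, targets, temperature=1.0):
    per_sample = F.cross_entropy(logits, targets, reduction="none")    # [B]
    weights = torch.softmax(per_sample.detach() / temperature, dim=0)  # [B], sums to 1
    return (weights * per_sample).sum()

logits = torch.randn(8, 100, requires_grad=True)   # placeholder model outputs
targets = torch.randint(0, 100, (8,))
loss = reweighted_loss(logits, targets)
loss.backward()
```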
arXiv Detail & Related papers (2025-02-10T17:57:15Z)
- On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
However, the widely used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
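A minimal sketch of gradient modulation for a two-modality late-fusion model follows; the confidence measure and scaling rule are simplified assumptions, not the exact OPM/OGM procedure.

```python
# Illustrative sketch of on-the-fly gradient modulation: the gradient of the
# currently dominant modality branch is scaled down by a confidence ratio.
import torch
import torch.nn as nn
import torch.nn.functional as F

audio_branch = nn.Linear(32, 10)    # toy uni-modal branches for a 10-class task
video_branch = nn.Linear(64, 10)
params = list(audio_branch.parameters()) + list(video_branch.parameters())
opt = torch.optim.SGD(params, lr=0.1)

x_a, x_v = torch.randn(16, 32), torch.randn(16, 64)
y = torch.randint(0, 10, (16,))

logits_a, logits_v = audio_branch(x_a), video_branch(x_v)
loss = F.cross_entropy(logits_a + logits_v, y)      # late-fusion joint loss
opt.zero_grad()
loss.backward()

# Per-modality "confidence": mean probability assigned to the true class.
with torch.no_grad():
    idx = torch.arange(16)
    conf_a = F.softmax(logits_a, dim=1)[idx, y].mean()
    conf_v = F.softmax(logits_v, dim=1)[idx, y].mean()
    k_a = (conf_v / conf_a).clamp(max=1.0)          # < 1 when audio dominates
    k_v = (conf_a / conf_v).clamp(max=1.0)          # < 1 when video dominates
    for p in audio_branch.parameters():
        p.grad.mul_(k_a)
    for p in video_branch.parameters():
        p.grad.mul_(k_v)
opt.step()
```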
arXiv Detail & Related papers (2024-10-15T13:15:50Z)
- Text-centric Alignment for Multi-Modality Learning [3.6961400222746748]
We propose the Text-centric Alignment for Multi-Modality Learning (TAMML) approach.
By leveraging the unique properties of text as a unified semantic space, TAMML demonstrates significant improvements in handling unseen, diverse, and unpredictable modality combinations.
This study contributes to the field by offering a flexible, effective solution for real-world applications where modality availability is dynamic and uncertain.
arXiv Detail & Related papers (2024-02-12T22:07:43Z)
- Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
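A toy sketch of separating context and dynamics modeling in a world model: a context vector is extracted from a reference frame and conditions only the decoder, while the recurrent latent dynamics stay context-agnostic. The architecture and sizes are assumptions for illustration, not ContextWM itself.

```python
# Illustrative separation of a context branch from latent dynamics modelling.
import torch
import torch.nn as nn

class TinyWorldModel(nn.Module):
    def __init__(self, obs_dim=128, act_dim=4, latent_dim=32, ctx_dim=16):
        super().__init__()
        self.context_enc = nn.Linear(obs_dim, ctx_dim)         # context branch
        self.obs_enc = nn.Linear(obs_dim, latent_dim)          # dynamics branch
        self.rnn = nn.GRUCell(latent_dim + act_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim + ctx_dim, obs_dim)

    def forward(self, first_obs, obs_seq, act_seq):
        ctx = self.context_enc(first_obs)                      # [B, ctx_dim]
        h = torch.zeros(obs_seq.size(1), self.rnn.hidden_size)
        recon = []
        for o, a in zip(obs_seq, act_seq):                     # time-major loop
            h = self.rnn(torch.cat([self.obs_enc(o), a], -1), h)
            recon.append(self.decoder(torch.cat([h, ctx], -1)))
        return torch.stack(recon)                              # [T, B, obs_dim]

wm = TinyWorldModel()
obs = torch.randn(10, 8, 128)                                  # T=10 steps, B=8
act = torch.randn(10, 8, 4)
recon = wm(obs[0], obs, act)
loss = ((recon - obs) ** 2).mean()                             # reconstruction loss
loss.backward()
```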
arXiv Detail & Related papers (2023-05-29T14:29:12Z)
- Meta-learning using privileged information for dynamics [66.32254395574994]
We extend the Neural ODE Process model to use additional information within the Learning Using Privileged Information setting.
We validate our extension with experiments showing improved accuracy and calibration on simulated dynamics tasks.
arXiv Detail & Related papers (2021-04-29T12:18:02Z)
- Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning [124.9856253431878]
We decompose the task of learning a global dynamics model into two stages: (a) learning a context latent vector that captures the local dynamics, then (b) predicting the next state conditioned on it.
In order to encode dynamics-specific information into the context latent vector, we introduce a novel loss function that encourages the context latent vector to be useful for predicting both forward and backward dynamics.
The proposed method achieves superior generalization ability across various simulated robotics and control tasks, compared to existing RL schemes.
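A minimal sketch of this two-stage idea: a context encoder over recent transitions, and forward/backward prediction heads conditioned on the resulting context so that the context must carry dynamics-specific information. Names and sizes are assumptions, not the paper's implementation.

```python
# Illustrative context encoder with forward and backward dynamics losses.
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, action_dim, ctx_dim, K = 11, 3, 10, 5   # K past transitions

context_enc = nn.Sequential(
    nn.Linear(K * (2 * state_dim + action_dim), 64), nn.ReLU(),
    nn.Linear(64, ctx_dim),
)
forward_head = nn.Linear(state_dim + action_dim + ctx_dim, state_dim)   # predicts s_{t+1}
backward_head = nn.Linear(state_dim + action_dim + ctx_dim, state_dim)  # predicts s_t

B = 32
past = torch.randn(B, K * (2 * state_dim + action_dim))  # placeholder (s, a, s') history
s, a = torch.randn(B, state_dim), torch.randn(B, action_dim)
s_next = torch.randn(B, state_dim)

ctx = context_enc(past)
loss = (F.mse_loss(forward_head(torch.cat([s, a, ctx], -1)), s_next)
        + F.mse_loss(backward_head(torch.cat([s_next, a, ctx], -1)), s))
loss.backward()
```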
arXiv Detail & Related papers (2020-05-14T08:10:54Z)
- Contextual Policy Transfer in Reinforcement Learning Domains via Deep Mixtures-of-Experts [24.489002406693128]
We introduce a novel mixture-of-experts formulation for learning state-dependent beliefs over source task dynamics.
We show how this model can be incorporated into standard policy reuse frameworks.
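A toy sketch of a state-dependent mixture over source-task dynamics experts; the gating and expert forms are assumptions for illustration only.

```python
# Illustrative mixture-of-experts: each expert models one source task's
# dynamics, and a state-dependent gate outputs a belief used to mix them.
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, action_dim, n_experts = 8, 2, 3

experts = nn.ModuleList(
    [nn.Linear(state_dim + action_dim, state_dim) for _ in range(n_experts)]
)
gate = nn.Linear(state_dim, n_experts)          # belief over source tasks, per state

def predict(s, a):
    belief = F.softmax(gate(s), dim=-1)                                       # [B, E]
    preds = torch.stack([e(torch.cat([s, a], -1)) for e in experts], dim=1)   # [B, E, S]
    return (belief.unsqueeze(-1) * preds).sum(dim=1), belief

s, a = torch.randn(16, state_dim), torch.randn(16, action_dim)
s_next = torch.randn(16, state_dim)
pred, belief = predict(s, a)
loss = F.mse_loss(pred, s_next)                 # fit the mixture to observed transitions
loss.backward()
```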
arXiv Detail & Related papers (2020-02-29T07:58:36Z)