Decomposed Mutual Information Optimization for Generalized Context in
Meta-Reinforcement Learning
- URL: http://arxiv.org/abs/2210.04209v1
- Date: Sun, 9 Oct 2022 09:44:23 GMT
- Title: Decomposed Mutual Information Optimization for Generalized Context in
Meta-Reinforcement Learning
- Authors: Yao Mu, Yuzheng Zhuang, Fei Ni, Bin Wang, Jianyu Chen, Jianye Hao,
Ping Luo
- Abstract summary: Multiple confounders can influence the transition dynamics, making it challenging to infer an accurate context for decision-making.
This paper addresses this challenge with Decomposed Mutual INformation Optimization (DOMINO) for context learning.
Our theoretical analysis shows that DOMINO can overcome the underestimation of the mutual information caused by multiple confounders.
- Score: 35.87062321504049
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adapting to the changes in transition dynamics is essential in robotic
applications. By learning a conditional policy with a compact context,
context-aware meta-reinforcement learning provides a flexible way to adjust
behavior according to dynamics changes. However, in real-world applications,
the agent may encounter complex dynamics changes. Multiple confounders can
influence the transition dynamics, making it challenging to infer accurate
context for decision-making. This paper addresses this challenge with
Decomposed Mutual INformation Optimization (DOMINO) for context learning, which
explicitly learns a disentangled context to maximize the mutual information
between the context and historical trajectories, while minimizing the state
transition prediction error. Our theoretical analysis shows that, by learning a
disentangled context, DOMINO overcomes the underestimation of the mutual
information caused by multiple confounders and reduces the number of samples
that need to be collected across environments. Extensive
experiments show that the context learned by DOMINO benefits both model-based
and model-free reinforcement learning algorithms for dynamics generalization in
terms of sample efficiency and performance in unseen environments.
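As a rough illustration of the objective above, the sketch below pairs a per-component InfoNCE-style lower bound on the mutual information (computed between two trajectory segments from the same environment) with a state-transition prediction term. The use of InfoNCE, and all module names and shapes, are illustrative assumptions rather than the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposedContextEncoder(nn.Module):
    """Encodes a trajectory summary into several disentangled context components."""
    def __init__(self, traj_dim, n_components=4, comp_dim=8):
        super().__init__()
        # one small encoder head per disentangled context component
        self.heads = nn.ModuleList(
            [nn.Sequential(nn.Linear(traj_dim, 64), nn.ReLU(), nn.Linear(64, comp_dim))
             for _ in range(n_components)])

    def forward(self, traj):                         # traj: (batch, traj_dim)
        return [head(traj) for head in self.heads]   # list of (batch, comp_dim)

def infonce_bound(z_query, z_key):
    """InfoNCE-style lower bound; positives are matching rows (same environment)."""
    z_query = F.normalize(z_query, dim=-1)
    z_key = F.normalize(z_key, dim=-1)
    logits = z_query @ z_key.t()                     # (batch, batch) similarities
    labels = torch.arange(z_query.size(0))
    return -F.cross_entropy(logits, labels)

def domino_style_loss(comps_a, comps_b, pred_next_state, next_state, mi_weight=1.0):
    # sum of per-component mutual-information bounds (to be maximized) ...
    mi = sum(infonce_bound(za, zb) for za, zb in zip(comps_a, comps_b))
    # ... combined with the state-transition prediction error (to be minimized)
    pred_err = F.mse_loss(pred_next_state, next_state)
    return pred_err - mi_weight * mi

Here comps_a and comps_b would be the component lists produced from two trajectory segments collected in the same environment, and pred_next_state would come from a dynamics model conditioned on the concatenated context.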
Related papers
- On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely-used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
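As a hedged illustration of what modulating each modality's optimization could look like, the sketch below scales each encoder's gradients by a coefficient derived from per-modality contribution scores. The scoring and the coefficient schedule are assumptions for illustration, not the exact OPM/OGM rules.

import torch

def modulation_coeffs(score_a, score_v, alpha=0.5):
    """score_a, score_v: scalar tensors measuring each modality's current contribution
    (e.g. softmax mass on the true class from each uni-modal branch).
    Down-weights the currently dominant modality and leaves the weaker one untouched."""
    ratio = score_a / (score_v + 1e-8)
    coeff_a = 1.0 - torch.tanh(alpha * torch.relu(ratio - 1.0))
    coeff_v = 1.0 - torch.tanh(alpha * torch.relu(1.0 / (ratio + 1e-8) - 1.0))
    return coeff_a, coeff_v

def modulate_gradients(encoder_a, encoder_v, coeff_a, coeff_v):
    """Call after loss.backward() and before optimizer.step()."""
    for p in encoder_a.parameters():
        if p.grad is not None:
            p.grad.mul_(coeff_a)
    for p in encoder_v.parameters():
        if p.grad is not None:
            p.grad.mul_(coeff_v)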
arXiv Detail & Related papers (2024-10-15T13:15:50Z)
- Text-centric Alignment for Multi-Modality Learning [3.6961400222746748]
We propose the Text-centric Alignment for Multi-Modality Learning (TAMML) approach.
By leveraging the unique properties of text as a unified semantic space, TAMML demonstrates significant improvements in handling unseen, diverse, and unpredictable modality combinations.
This study contributes to the field by offering a flexible, effective solution for real-world applications where modality availability is dynamic and uncertain.
arXiv Detail & Related papers (2024-02-12T22:07:43Z)
- Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies [13.410372954752496]
We present an investigation into how context should be incorporated into behaviour learning to improve generalisation.
We introduce a neural network architecture, the Decision Adapter, which generates the weights of an adapter module and conditions the behaviour of an agent on the context information.
We show that the Decision Adapter is a useful generalisation of a previously proposed architecture and empirically demonstrate that it results in superior generalisation performance.
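A minimal sketch of the weight-generation idea, assuming the adapter is a single context-generated linear layer applied residually to the agent's hidden features; layer sizes and placement are illustrative.

import torch
import torch.nn as nn

class DecisionAdapterSketch(nn.Module):
    def __init__(self, ctx_dim, hidden_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        # hypernetwork: context -> flattened weights and bias of the adapter layer
        self.hyper = nn.Linear(ctx_dim, hidden_dim * hidden_dim + hidden_dim)

    def forward(self, h, ctx):                  # h: (batch, hidden), ctx: (batch, ctx_dim)
        params = self.hyper(ctx)
        w = params[:, : self.hidden_dim ** 2].view(-1, self.hidden_dim, self.hidden_dim)
        b = params[:, self.hidden_dim ** 2:]
        # per-sample adapter generated from the context, applied as a residual
        adapted = torch.bmm(h.unsqueeze(1), w).squeeze(1) + b
        return h + adapted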
arXiv Detail & Related papers (2023-10-25T14:50:05Z)
- Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
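A hedged sketch of separating context and dynamics modelling, assuming a static context vector taken from a reference frame and a recurrent latent dynamics model; the actual ContextWM architecture is not specified in this summary.

import torch
import torch.nn as nn

class SeparatedWorldModelSketch(nn.Module):
    def __init__(self, obs_dim, act_dim, ctx_dim=32, state_dim=64):
        super().__init__()
        self.context_enc = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, ctx_dim))
        self.dynamics = nn.GRU(act_dim, state_dim, batch_first=True)
        self.decoder = nn.Linear(state_dim + ctx_dim, obs_dim)

    def forward(self, first_obs, actions):      # actions: (batch, T, act_dim)
        ctx = self.context_enc(first_obs)       # time-invariant context
        states, _ = self.dynamics(actions)      # time-varying latent dynamics
        ctx_tiled = ctx.unsqueeze(1).expand(-1, states.size(1), -1)
        return self.decoder(torch.cat([states, ctx_tiled], dim=-1))  # reconstructions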
arXiv Detail & Related papers (2023-05-29T14:29:12Z)
- Meta-learning using privileged information for dynamics [66.32254395574994]
We extend the Neural ODE Process model to use additional information within the Learning Using Privileged Information setting.
We validate our extension with experiments showing improved accuracy and calibration on simulated dynamics tasks.
arXiv Detail & Related papers (2021-04-29T12:18:02Z)
- Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
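As a hedged illustration, the sketch below combines exponentiated-advantage weights with a KL penalty toward a behavior prior; the paper's exact objective is not given in this summary, so the weighting scheme and names are assumptions.

import torch

def iw_policy_loss(log_pi, kl_to_prior, advantage, temperature=1.0, beta=0.1):
    """log_pi: log-probs of the taken actions under the current policy;
    kl_to_prior: per-state KL(pi(.|s) || prior(.|s));
    advantage: advantage estimates for the taken actions."""
    with torch.no_grad():
        w = torch.softmax(advantage / temperature, dim=0)  # normalized weights
    policy_term = -(w * log_pi).sum()                      # imitate high-advantage actions
    prior_term = kl_to_prior.mean()                        # stay close to the behavior prior
    return policy_term + beta * prior_term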
arXiv Detail & Related papers (2020-09-10T14:16:58Z)
- Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning [124.9856253431878]
We decompose the task of learning a global dynamics model into two stages: (a) learning a context latent vector that captures the local dynamics, then (b) predicting the next state conditioned on it.
In order to encode dynamics-specific information into the context latent vector, we introduce a novel loss function that encourages the context latent vector to be useful for predicting both forward and backward dynamics.
The proposed method achieves superior generalization ability across various simulated robotics and control tasks, compared to existing RL schemes.
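The two-stage decomposition and the forward/backward objective can be sketched as below, with network sizes and the context window treated as illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextAwareDynamicsSketch(nn.Module):
    def __init__(self, s_dim, a_dim, ctx_dim=16):
        super().__init__()
        # (a) context encoder over a window of K recent (s, a, s') transitions
        self.ctx_enc = nn.Sequential(
            nn.Linear(2 * s_dim + a_dim, 64), nn.ReLU(), nn.Linear(64, ctx_dim))
        # (b) forward and backward dynamics models conditioned on the context
        self.forward_model = nn.Linear(s_dim + a_dim + ctx_dim, s_dim)
        self.backward_model = nn.Linear(s_dim + a_dim + ctx_dim, s_dim)

    def loss(self, past_transitions, s, a, s_next):
        # past_transitions: (batch, K, 2 * s_dim + a_dim)
        ctx = self.ctx_enc(past_transitions).mean(dim=1)    # (batch, ctx_dim)
        s_next_hat = self.forward_model(torch.cat([s, a, ctx], dim=-1))
        s_hat = self.backward_model(torch.cat([s_next, a, ctx], dim=-1))
        # the context must help predict both forward and backward dynamics
        return F.mse_loss(s_next_hat, s_next) + F.mse_loss(s_hat, s)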
arXiv Detail & Related papers (2020-05-14T08:10:54Z)
- Contextual Policy Transfer in Reinforcement Learning Domains via Deep Mixtures-of-Experts [24.489002406693128]
We introduce a novel mixture-of-experts formulation for learning state-dependent beliefs over source task dynamics.
We show how this model can be incorporated into standard policy reuse frameworks.
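A hedged sketch of a state-dependent mixture-of-experts belief over source-task dynamics models, with the concrete architecture assumed for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicsMoESketch(nn.Module):
    def __init__(self, s_dim, a_dim, n_source_tasks):
        super().__init__()
        # gating network: state -> belief over which source task's dynamics apply
        self.gate = nn.Linear(s_dim, n_source_tasks)
        # one dynamics expert per source task
        self.experts = nn.ModuleList(
            [nn.Linear(s_dim + a_dim, s_dim) for _ in range(n_source_tasks)])

    def forward(self, s, a):
        belief = F.softmax(self.gate(s), dim=-1)             # (batch, n_tasks)
        preds = torch.stack(
            [e(torch.cat([s, a], dim=-1)) for e in self.experts], dim=1)
        return (belief.unsqueeze(-1) * preds).sum(dim=1)     # belief-weighted next state

Such a state-dependent belief can then be used to select or blend behaviors reused from the source tasks within standard policy-reuse frameworks.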
arXiv Detail & Related papers (2020-02-29T07:58:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.