Conceptual Reinforcement Learning for Language-Conditioned Tasks
- URL: http://arxiv.org/abs/2303.05069v1
- Date: Thu, 9 Mar 2023 07:01:06 GMT
- Title: Conceptual Reinforcement Learning for Language-Conditioned Tasks
- Authors: Shaohui Peng, Xing Hu, Rui Zhang, Jiaming Guo, Qi Yi, Ruizhi Chen,
Zidong Du, Ling Li, Qi Guo, Yunji Chen
- Abstract summary: We propose a conceptual reinforcement learning (CRL) framework to learn the concept-like joint representation for language-conditioned policy.
The key insight is that concepts are compact and invariant representations formed in human cognition by extracting similarities from numerous real-world instances.
- Score: 20.300727364957208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the broad application of deep reinforcement learning (RL),
transferring and adapting the policy to unseen but similar environments is
still a significant challenge. Recently, language-conditioned policies have
been proposed to facilitate policy transfer by learning a joint representation
of observation and text that captures the compact and invariant information
shared across environments. Existing language-conditioned RL methods often
learn the joint representation as a simple latent layer over the given
instances (episode-specific observation and text), which inevitably includes
noisy or irrelevant information and causes instance-dependent spurious
correlations, hurting both generalization performance and training efficiency.
To address this issue, we propose a conceptual reinforcement learning (CRL)
framework to learn a concept-like joint representation for the
language-conditioned policy. The key insight is that concepts are compact and
invariant representations formed in human cognition by extracting similarities
from numerous real-world instances. In CRL, we
propose a multi-level attention encoder and two mutual information constraints
for learning compact and invariant concepts. Evaluated in two challenging
environments, RTFM and Messenger, CRL significantly improves training
efficiency (by up to 70%) and generalization to new environment dynamics (by
up to 30%).
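
The abstract names a multi-level attention encoder and two mutual information constraints but gives no implementation details here. Below is a minimal, purely illustrative PyTorch sketch of that general recipe under stated assumptions: a single cross-attention level that fuses text and observation features, and one InfoNCE-style contrastive term standing in for a mutual-information constraint. The module names, dimensions, pooling, and single attention level are invented for the example and are not the authors' code.

```python
# Illustrative sketch only (assumptions, not the CRL implementation):
# cross-attention fusion of text and observation features plus an
# InfoNCE-style contrastive term used as a mutual-information constraint.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointConceptEncoder(nn.Module):
    """Fuses observation and instruction features into a joint representation."""
    def __init__(self, obs_dim=128, txt_dim=128, d_model=64, n_heads=4):
        super().__init__()
        self.obs_proj = nn.Linear(obs_dim, d_model)
        self.txt_proj = nn.Linear(txt_dim, d_model)
        # One attention level shown; the paper describes a multi-level encoder.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, obs_feats, txt_feats):
        # obs_feats: (B, N_obs, obs_dim), txt_feats: (B, N_txt, txt_dim)
        q = self.txt_proj(txt_feats)          # text tokens attend to observation
        kv = self.obs_proj(obs_feats)
        fused, _ = self.attn(q, kv, kv)       # (B, N_txt, d_model)
        return self.out(fused.mean(dim=1))    # pooled joint representation

def info_nce(z, targets, temperature=0.1):
    """Contrastive lower bound on the mutual information between the joint
    representation z and a target embedding (e.g., the instruction encoding);
    other elements of the batch serve as negatives."""
    z = F.normalize(z, dim=-1)
    targets = F.normalize(targets, dim=-1)
    logits = z @ targets.t() / temperature    # (B, B) similarity matrix
    labels = torch.arange(z.size(0), device=z.device)
    return F.cross_entropy(logits, labels)
```

In a full agent, the pooled representation would condition the policy network and a term like info_nce would be added to the RL loss with a weighting coefficient; how CRL actually instantiates its encoder and its two constraints is specified in the paper, not here.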
Related papers
- Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents [6.402396836189286]
We present a novel contrastive prompt ensemble (ConPE) framework for embodied reinforcement learning.
We devise a guided-attention-based ensemble approach with multiple visual prompts on the vision-language model to construct robust state representations.
In experiments, we show that ConPE outperforms other state-of-the-art algorithms for several embodied agent tasks.
arXiv Detail & Related papers (2024-12-16T06:53:00Z) - Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning [4.902544998453533]
We argue that understanding and utilizing contextual cues, such as the gravity level of the environment, is critical for robust generalization.
Our algorithm demonstrates improved generalization on various simulated domains, outperforming prior context-learning techniques in zero-shot settings.
arXiv Detail & Related papers (2024-04-15T07:31:48Z) - Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning [48.79569442193824]
We show that COMRL algorithms are essentially optimizing the same mutual information objective between the task variable $M$ and its latent representation $Z$ by implementing various approximate bounds.
As demonstrations, we propose a supervised and a self-supervised implementation of $I(Z; M)$, and empirically show that the corresponding optimization algorithms exhibit remarkable generalization across a broad spectrum of RL benchmarks.
This work lays the information-theoretic foundation for COMRL methods, leading to a better understanding of task representation learning in the context of reinforcement learning (a minimal contrastive sketch of the $I(Z; M)$ objective is given after this list).
arXiv Detail & Related papers (2024-02-04T09:58:42Z) - Invariant Causal Imitation Learning for Generalizable Policies [87.51882102248395]
We propose Invariant Causal Imitation Learning (ICIL) to learn an imitation policy.
ICIL learns a representation of causal features that is disentangled from the specific representations of noise variables.
We show that ICIL is effective in learning imitation policies capable of generalizing to unseen environments.
arXiv Detail & Related papers (2023-11-02T16:52:36Z) - On the Role of Emergent Communication for Social Learning in Multi-Agent
Reinforcement Learning [0.0]
Social learning uses cues from experts to align heterogeneous policies, reduce sample complexity, and solve partially observable tasks.
This paper proposes an unsupervised method based on the information bottleneck to capture both referential complexity and task-specific utility.
arXiv Detail & Related papers (2023-02-28T03:23:27Z) - Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
With rigorous theoretical guarantees, our approach enables the IB to capture the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z) - Contextualize Me -- The Case for Context in Reinforcement Learning [49.794253971446416]
Contextual Reinforcement Learning (cRL) provides a framework to model such changes in the environment, for example varying dynamics, in a principled manner.
We show how cRL contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks.
arXiv Detail & Related papers (2022-02-09T15:01:59Z) - CARL: A Benchmark for Contextual and Adaptive Reinforcement Learning [45.52724876199729]
We present CARL, a collection of well-known RL environments extended to contextual RL problems.
We provide first evidence that disentangling state representation learning from context-conditioned policy learning facilitates better generalization.
arXiv Detail & Related papers (2021-10-05T15:04:01Z) - Which Mutual-Information Representation Learning Objectives are
Sufficient for Control? [80.2534918595143]
Mutual information provides an appealing formalism for learning representations of data.
This paper formalizes the sufficiency of a state representation for learning and representing the optimal policy.
Surprisingly, we find that two of these objectives can yield insufficient representations given mild and common assumptions on the structure of the MDP.
arXiv Detail & Related papers (2021-06-14T10:12:34Z) - Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.
We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
arXiv Detail & Related papers (2020-03-12T21:03:01Z)
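
The COMRL entry above frames context-based task representation learning as maximizing the mutual information I(Z; M) between the task variable and its latent representation. As a hedged, illustrative companion (not that paper's implementation), the sketch below shows one common self-supervised surrogate: an InfoNCE-style lower bound in which two context windows drawn from the same task form a positive pair and windows from other tasks in the batch act as negatives. The GRU context encoder, dimensions, and sampling scheme are assumptions made for the example.

```python
# Illustrative sketch only: a self-supervised contrastive surrogate for
# I(Z; M) in context-based meta-RL. Encoder architecture and shapes are
# assumptions made for the example.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextEncoder(nn.Module):
    """Encodes a window of (s, a, r, s') transitions into a task latent z."""
    def __init__(self, transition_dim, z_dim=32, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(transition_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, z_dim)

    def forward(self, context):               # context: (B, T, transition_dim)
        _, h = self.rnn(context)              # h: (1, B, hidden)
        return self.head(h.squeeze(0))        # (B, z_dim)

def contrastive_task_loss(encoder, ctx_a, ctx_b, temperature=0.1):
    """InfoNCE-style lower bound on I(Z; M): ctx_a[i] and ctx_b[i] are context
    windows from the same task; ctx_b[j] (j != i) come from different tasks."""
    z_a = F.normalize(encoder(ctx_a), dim=-1)
    z_b = F.normalize(encoder(ctx_b), dim=-1)
    logits = z_a @ z_b.t() / temperature      # (B, B)
    labels = torch.arange(z_a.size(0), device=logits.device)
    return F.cross_entropy(labels.new_zeros(0).dtype == torch.long and logits or logits, labels) if False else F.cross_entropy(logits, labels)
```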