Related papers: Entropy Regularized Task Representation Learning for Offline Meta-Reinforcement Learning


URL: http://arxiv.org/abs/2412.14834v2
Date: Wed, 22 Jan 2025 13:07:07 GMT
Title: Entropy Regularized Task Representation Learning for Offline Meta-Reinforcement Learning
Authors: Mohammadreza Nakhaei, Aidan Scannell, Joni Pajarinen
Abstract summary: Offline meta-reinforcement learning aims to equip agents with the ability to rapidly adapt to new tasks by training on data from a set of different tasks. Context-based approaches utilize a history of state-action-reward transitions to infer representations of the current task, and then condition the agent, i.e., the policy and value function, on the task representations. Unfortunately, context-based approaches suffer from distribution mismatch, as the context in the offline data does not match the context at test time.
Score: 12.443661471796595
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Offline meta-reinforcement learning aims to equip agents with the ability to rapidly adapt to new tasks by training on data from a set of different tasks. Context-based approaches utilize a history of state-action-reward transitions -- referred to as the context -- to infer representations of the current task, and then condition the agent, i.e., the policy and value function, on the task representations. Intuitively, the better the task representations capture the underlying tasks, the better the agent can generalize to new tasks. Unfortunately, context-based approaches suffer from distribution mismatch, as the context in the offline data does not match the context at test time, limiting their ability to generalize to the test tasks. This leads to the task representations overfitting to the offline training data. Intuitively, the task representations should be independent of the behavior policy used to collect the offline data. To address this issue, we approximately minimize the mutual information between the distribution over the task representations and behavior policy by maximizing the entropy of behavior policy conditioned on the task representations. We validate our approach in MuJoCo environments, showing that compared to baselines, our task representations more faithfully represent the underlying tasks, leading to outperforming prior methods in both in-distribution and out-of-distribution tasks.
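The information-theoretic step described in the abstract can be sketched with a standard identity. The symbols here are assumptions introduced for illustration and are not taken from the paper: Z_\phi is the task representation produced by a context encoder with parameters \phi, S is the state, and A is the action drawn from the behavior policy. The paper's exact objective and the approximation it uses may differ.

```latex
% Standard decomposition of conditional mutual information
% (notation is illustrative, not the paper's).
\begin{align}
I(Z_\phi; A \mid S) &= H(A \mid S) - H(A \mid S, Z_\phi) \\
% H(A | S) is determined by the fixed offline data and does not depend on \phi, so
\arg\min_{\phi} I(Z_\phi; A \mid S) &= \arg\max_{\phi} H(A \mid S, Z_\phi)
\end{align}
```

Under these assumptions, making the behavior policy's actions harder to predict from the task representation is the same as reducing how much the representation encodes about the behavior policy.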

Related papers

Contextual Latent World Models for Offline Meta Reinforcement Learning [17.917947576971816]
We introduce contextual latent world models, which condition latent world models on inferred task representations and train them jointly with the context encoder. This enforces task-conditioned temporal consistency, yielding task representations that capture task-dependent dynamics. Our method learns more expressive task representations and significantly improves generalization to unseen tasks across MuJoCo, Contextual-DeepMind Control, and Meta-World benchmarks.
arXiv Detail & Related papers (2026-03-03T12:45:20Z)
Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks [4.374837991804085]
Task-Aware Virtual Training (TAVT) is a novel algorithm that captures task characteristics for both training and out-of-distribution (OOD) scenarios. Numerical results demonstrate that TAVT significantly enhances generalization to OOD tasks across various MuJoCo and MetaWorld environments.
arXiv Detail & Related papers (2025-02-05T02:31:50Z)
Disentangling Policy from Offline Task Representation Learning via Adversarial Data Augmentation [29.49883684368039]
Offline meta-reinforcement learning (OMRL) proficiently allows an agent to tackle novel tasks while relying on a static dataset. We introduce a novel algorithm to disentangle the impact of behavior policy from task representation learning.
arXiv Detail & Related papers (2024-03-12T02:38:36Z)
Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training. In the task level, we aim to find the optimal task order to minimize the total cross-task interference risk. In the instance level, we measure the difficulty of all instances per task, then divide them into the easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks [101.40633115037983]
Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions. How to select new tasks to improve the performance and generalizability of IT models remains an open question. We propose active instruction tuning based on prompt uncertainty, a novel framework to identify informative tasks, and then actively tune the models on the selected tasks.
arXiv Detail & Related papers (2023-11-01T04:40:05Z)
Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning [23.45043290237396]
MoSS is a context-based Meta-reinforcement learning algorithm based on Self-Supervised task representation learning. On MuJoCo and Meta-World benchmarks, MoSS outperforms prior work in terms of performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization.
arXiv Detail & Related papers (2023-04-29T15:46:19Z)
On Context Distribution Shift in Task Representation Learning for Offline Meta RL [7.8317653074640186]
We focus on context-based OMRL, specifically on the challenge of learning task representation for OMRL. To overcome this problem, we present a hard-sampling-based strategy to train a robust task context encoder.
arXiv Detail & Related papers (2023-04-01T16:21:55Z)
Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks. We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy. We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z)
Provable Benefits of Representational Transfer in Reinforcement Learning [59.712501044999875]
We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation. We show that given generative access to source tasks, we can discover a representation, using which subsequent linear RL techniques quickly converge to a near-optimal policy.
arXiv Detail & Related papers (2022-05-29T04:31:29Z)
Conditional Meta-Learning of Linear Representations [57.90025697492041]
Standard meta-learning for representation learning aims to find a common representation to be shared across multiple tasks. In this work we overcome this issue by inferring a conditioning function, mapping the tasks' side information into a representation tailored to the task at hand. We propose a meta-algorithm capable of leveraging this advantage in practice.
arXiv Detail & Related papers (2021-03-30T12:02:14Z)
Learning Task-oriented Disentangled Representations for Unsupervised Domain Adaptation [165.61511788237485]
Unsupervised domain adaptation (UDA) aims to address the domain-shift problem between a labeled source domain and an unlabeled target domain. We propose a dynamic task-oriented disentangling network (DTDN) to learn disentangled representations in an end-to-end fashion for UDA.
arXiv Detail & Related papers (2020-07-27T01:21:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.