Disentangling Policy from Offline Task Representation Learning via
Adversarial Data Augmentation
- URL: http://arxiv.org/abs/2403.07261v1
- Date: Tue, 12 Mar 2024 02:38:36 GMT
- Title: Disentangling Policy from Offline Task Representation Learning via
Adversarial Data Augmentation
- Authors: Chengxing Jia, Fuxiang Zhang, Yi-Chen Li, Chen-Xiao Gao, Xu-Hui Liu,
Lei Yuan, Zongzhang Zhang, Yang Yu
- Abstract summary: Offline meta-reinforcement learning (OMRL) allows an agent to tackle novel tasks while relying solely on a static dataset.
We introduce a novel algorithm to disentangle the impact of the behavior policy from task representation learning.
- Score: 29.49883684368039
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline meta-reinforcement learning (OMRL) allows an agent to
tackle novel tasks while relying solely on a static dataset. For precise and
efficient task identification, existing OMRL research suggests learning
separate task representations that can be incorporated into the policy input,
thus forming a context-based meta-policy. A major approach to training task
representations is to adopt contrastive learning using multi-task offline data.
The dataset typically encompasses interactions from various policies (i.e., the
behavior policies), thus providing a plethora of contextual information
regarding different tasks. Nonetheless, amassing data from a substantial number
of policies is not only impractical but also often unattainable in realistic
settings. Instead, we resort to a more constrained yet practical scenario,
where multi-task data collection occurs with a limited number of policies. We
observe that task representations learned by previous OMRL methods tend to
correlate spuriously with the behavior policy rather than reflecting the
essential characteristics of the task, resulting in poor
out-of-distribution generalization. To alleviate this issue, we introduce a
novel algorithm to disentangle the impact of the behavior policy from task
representation learning through a process called adversarial data augmentation.
Specifically, the objective of adversarial data augmentation is not merely to
generate data that match the offline data distribution; instead, it aims to
create adversarial examples designed to confound the learned task
representations and lead to incorrect task identification. Our experiments
show that learning from such adversarial samples significantly enhances the
robustness and effectiveness of task identification and achieves satisfactory
out-of-distribution generalization.
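The abstract describes the idea only at a high level, so below is a minimal sketch of what training a task encoder with a contrastive identification loss plus adversarial data augmentation could look like in PyTorch. The `TaskEncoder`, the prototype-based `task_id_loss`, the FGSM-style perturbation in `adversarial_augment`, and all shapes and hyperparameters are illustrative assumptions, not the paper's actual algorithm.

```python
# Illustrative sketch only: the abstract does not specify the paper's exact
# losses or architectures, so this uses a generic prototype-based contrastive
# objective and an FGSM-style perturbation as stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskEncoder(nn.Module):
    """Maps a flattened transition (s, a, r, s') to a unit-norm task embedding z."""
    def __init__(self, transition_dim: int, z_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, 128), nn.ReLU(),
            nn.Linear(128, z_dim),
        )

    def forward(self, transitions: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(transitions), dim=-1)

def task_id_loss(encoder, transitions, task_labels, temperature=0.1):
    """Cross-entropy over cosine similarities to per-task prototype embeddings,
    a simple proxy for contrastive task identification. Assumes task_labels are
    contiguous integers 0..T-1 and every task appears in the batch."""
    z = encoder(transitions)                              # (B, z_dim)
    protos = [z[task_labels == t].mean(0)
              for t in task_labels.unique(sorted=True)]
    protos = F.normalize(torch.stack(protos), dim=-1)     # (T, z_dim)
    logits = z @ protos.T / temperature                   # (B, T)
    return F.cross_entropy(logits, task_labels)

def adversarial_augment(encoder, transitions, task_labels, eps=0.05):
    """FGSM-style step: perturb transitions in the direction that *increases*
    the task-identification loss, i.e. confounds the encoder."""
    x = transitions.clone().detach().requires_grad_(True)
    loss = task_id_loss(encoder, x, task_labels)
    (grad,) = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).detach()

def train_step(encoder, optimizer, transitions, task_labels):
    """One update: learn to identify tasks from both clean and adversarial samples."""
    adv = adversarial_augment(encoder, transitions, task_labels)
    loss = (task_id_loss(encoder, transitions, task_labels)
            + task_id_loss(encoder, adv, task_labels))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full OMRL pipeline this encoder would be trained alongside a context-based meta-policy on the clean offline data, with `eps` controlling how far the perturbed transitions may drift from the dataset; those integration details are likewise assumptions here.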
Related papers
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
arXiv Detail & Related papers (2024-05-23T02:41:36Z)
- Offline Multi-task Transfer RL with Representational Penalization [26.114893629771736]
We study the problem of representation transfer in offline Reinforcement Learning (RL).
We propose an algorithm to compute pointwise uncertainty measures for the learnt representation.
arXiv Detail & Related papers (2024-02-19T21:52:44Z)
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all multi-task data for training.
At the task level, we aim to find the optimal task order that minimizes the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework in which multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can succeed on classification tasks with little or even non-overlapping annotations.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
- Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations [22.23114883485924]
We propose a novel algorithm called GENTLE for learning generalizable task representations in the face of data limitations.
GENTLE employs a Task Auto-Encoder (TAE), an encoder-decoder architecture that extracts the characteristics of tasks.
To alleviate the effect of limited behavior diversity, we construct pseudo-transitions to align the data distribution used to train TAE with the data distribution encountered during testing.
arXiv Detail & Related papers (2023-12-26T07:02:12Z)
- Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning [23.45043290237396]
MoSS is a context-based meta-reinforcement learning algorithm built on self-supervised task representation learning.
On MuJoCo and Meta-World benchmarks, MoSS outperforms prior methods in performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization.
arXiv Detail & Related papers (2023-04-29T15:46:19Z)
- Leveraging sparse and shared feature activations for disentangled representation learning [112.22699167017471]
We propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common disentangled representation.
We validate our approach on six real-world distribution-shift benchmarks and different data modalities.
arXiv Detail & Related papers (2023-04-17T01:33:24Z)
- Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks [65.23947618404046]
We introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement learning on broad data.
When faced with a novel task goal, the framework uses an affordance model to plan a sequence of lossy representations as subgoals that decomposes the original task into easier problems.
We show that our framework can be pre-trained on large-scale datasets of robot experiences from prior work and efficiently fine-tuned for novel tasks, entirely from visual inputs without any manual reward engineering.
arXiv Detail & Related papers (2022-10-12T21:46:38Z)
- Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning [21.59254848913971]
Offline meta-reinforcement learning is a reinforcement learning paradigm that learns from offline data to adapt to new tasks.
We propose a contrastive learning framework for task representations that are robust to the distribution of behavior policies during training and testing.
Experiments on a variety of offline meta-reinforcement learning benchmarks demonstrate the advantages of our method over prior methods.
arXiv Detail & Related papers (2022-06-21T14:46:47Z)
- Learning Task-oriented Disentangled Representations for Unsupervised Domain Adaptation [165.61511788237485]
Unsupervised domain adaptation (UDA) aims to address the domain-shift problem between a labeled source domain and an unlabeled target domain.
We propose a dynamic task-oriented disentangling network (DTDN) to learn disentangled representations in an end-to-end fashion for UDA.
arXiv Detail & Related papers (2020-07-27T01:21:18Z)