Curriculum Reinforcement Learning using Optimal Transport via Gradual Domain Adaptation
- URL: http://arxiv.org/abs/2210.10195v1
- Date: Tue, 18 Oct 2022 22:33:33 GMT
- Title: Curriculum Reinforcement Learning using Optimal Transport via Gradual Domain Adaptation
- Authors: Peide Huang, Mengdi Xu, Jiacheng Zhu, Laixi Shi, Fei Fang, Ding Zhao
- Abstract summary: Curriculum Reinforcement Learning (CRL) aims to create a sequence of tasks, starting from easy ones and gradually learning towards difficult tasks.
In this work, we focus on the idea of framing CRL as interpolations between a source (auxiliary) and a target task distribution.
Inspired by the insights from gradual domain adaptation in semi-supervised learning, we create a natural curriculum by breaking down the potentially large task distributional shift in CRL into smaller shifts.
- Score: 46.103426976842336
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Curriculum Reinforcement Learning (CRL) aims to create a sequence of tasks,
starting from easy ones and gradually learning towards difficult tasks. In this
work, we focus on the idea of framing CRL as interpolations between a source
(auxiliary) and a target task distribution. Although existing studies have
shown the great potential of this idea, it remains unclear how to formally
quantify and generate the movement between task distributions. Inspired by the
insights from gradual domain adaptation in semi-supervised learning, we create
a natural curriculum by breaking down the potentially large task distributional
shift in CRL into smaller shifts. We propose GRADIENT, which formulates CRL as
an optimal transport problem with a tailored distance metric between tasks.
Specifically, we generate a sequence of task distributions as a geodesic
interpolation (i.e., Wasserstein barycenter) between the source and target
distributions. Different from many existing methods, our algorithm considers a
task-dependent contextual distance metric and is capable of handling
nonparametric distributions in both continuous and discrete context settings.
In addition, we theoretically show that GRADIENT enables smooth transfer
between subsequent stages in the curriculum under certain conditions. We
conduct extensive experiments in locomotion and manipulation tasks and show
that our proposed GRADIENT achieves higher performance than baselines in terms
of learning efficiency and asymptotic performance.
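To make the geodesic-interpolation idea concrete, below is a minimal, hedged sketch for one-dimensional task contexts: in 1D, the Wasserstein-2 geodesic between two distributions can be generated by linearly interpolating their quantile functions. All names and context values here are illustrative assumptions, not the paper's implementation (GRADIENT handles nonparametric distributions under a task-dependent contextual metric).
```python
import numpy as np

def wasserstein_geodesic_1d(source_samples, target_samples, alphas, n_quantiles=100):
    """Displacement interpolation between two 1D task distributions.

    In 1D, the Wasserstein-2 geodesic is obtained by linearly interpolating
    the inverse CDFs (quantile functions); each alpha in [0, 1] yields one
    intermediate curriculum stage."""
    qs = np.linspace(0.0, 1.0, n_quantiles)
    q_src = np.quantile(source_samples, qs)  # source quantile function
    q_tgt = np.quantile(target_samples, qs)  # target quantile function
    return [(1.0 - a) * q_src + a * q_tgt for a in alphas]

# Break one large task-distribution shift into 4 smaller shifts.
rng = np.random.default_rng(0)
easy = rng.normal(0.0, 0.5, size=1000)  # e.g. nearby goals (hypothetical context)
hard = rng.normal(5.0, 1.0, size=1000)  # e.g. distant goals (hypothetical context)
for i, stage in enumerate(wasserstein_geodesic_1d(easy, hard, np.linspace(0, 1, 5))):
    print(f"stage {i}: mean goal distance ~ {stage.mean():.2f}")
```
Sampling training tasks from each intermediate stage in turn exposes the agent to a sequence of small distributional shifts instead of one large one, which is the smooth-transfer property the paper analyzes.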
Related papers
- Proximal Curriculum with Task Correlations for Deep Reinforcement Learning [25.10619062353793]
We consider curriculum design in contextual multi-task settings where the agent's final performance is measured w.r.t. a target distribution over complex tasks.
We propose a novel curriculum, ProCuRL-Target, that effectively balances the need for selecting tasks that are not too difficult for the agent while progressing the agent's learning toward the target distribution by leveraging task correlations. (A generic selection-heuristic sketch follows this entry.)
arXiv Detail & Related papers (2024-05-03T21:07:54Z)
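As a loose illustration only, and not ProCuRL-Target's actual criterion: curriculum-selection rules of this family typically score each candidate task by how learnable it currently is and how much it matters under the target distribution. All numbers and names below are invented.
```python
import numpy as np

def select_task(success_probs, target_weights):
    """Hypothetical 'zone of proximal development' heuristic: prefer tasks
    the agent solves about half the time, weighted by their probability
    under the target task distribution. Illustrative only."""
    scores = success_probs * (1.0 - success_probs) * target_weights
    return int(np.argmax(scores))

# Three candidate tasks: too easy, just right, too hard (made-up numbers).
print(select_task(np.array([0.95, 0.5, 0.05]), np.array([0.2, 0.3, 0.5])))  # -> 1
```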
- Offline Multi-task Transfer RL with Representational Penalization [26.114893629771736]
We study the problem of representation transfer in offline Reinforcement Learning (RL).
We propose an algorithm to compute pointwise uncertainty measures for the learnt representation.
arXiv Detail & Related papers (2024-02-19T21:52:44Z)
- On the Benefit of Optimal Transport for Curriculum Reinforcement Learning [32.59609255906321]
We focus on framing curricula as interpolations between task distributions.
We frame the generation of a curriculum as a constrained optimal transport problem.
Benchmarks show that this way of curriculum generation can improve upon existing CRL methods. (A toy transport-plan sketch follows this entry.)
arXiv Detail & Related papers (2023-09-25T12:31:37Z)
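A tiny sketch of the underlying optimal transport primitive, assuming the POT library (`pip install pot`); the contexts, weights, and cost are invented, and the paper's constrained formulation adds structure not shown here.
```python
import numpy as np
import ot  # POT: Python Optimal Transport

# Hypothetical discrete task contexts (e.g. obstacle heights).
src = np.array([[0.0], [0.5], [1.0]])  # easy tasks
tgt = np.array([[2.0], [2.5], [3.0]])  # hard tasks
a = np.full(3, 1 / 3)                  # uniform weights on source tasks
b = np.full(3, 1 / 3)                  # uniform weights on target tasks

M = ot.dist(src, tgt)                  # squared-Euclidean cost matrix
plan = ot.emd(a, b, M)                 # exact optimal transport plan
print(plan)                            # which source mass moves to which target task
```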
- CLUTR: Curriculum Learning via Unsupervised Task Representation Learning [130.79246770546413]
CLUTR is a novel curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization.
We show CLUTR outperforms PAIRED, a principled and popular UED method, in terms of generalization and sample efficiency in the challenging CarRacing and navigation environments.
arXiv Detail & Related papers (2022-10-19T01:45:29Z)
- Meta Reinforcement Learning with Finite Training Tasks -- a Density Estimation Approach [21.44737454610142]
In meta reinforcement learning (meta RL), an agent learns from a set of training tasks how to quickly solve a new task, drawn from the same task distribution.
The question we explore in this work is how many training tasks are required to guarantee approximately optimal behavior with high probability.
We propose a different approach: directly learn the task distribution, using density estimation techniques, and then train a policy on the learned task distribution. (A toy density-estimation sketch follows this entry.)
arXiv Detail & Related papers (2022-06-21T20:32:19Z)
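As a generic illustration of the "learn the task distribution, then sample from it" idea, and not the paper's estimator, here is a kernel density estimate over made-up task parameters (names and shapes are assumptions).
```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
train_tasks = rng.normal([0.0, 2.0], 0.7, size=(50, 2))  # 50 observed 2D task parameters

kde = gaussian_kde(train_tasks.T)  # gaussian_kde expects shape (n_dims, n_samples)
new_tasks = kde.resample(200).T    # 200 fresh tasks drawn from the estimated distribution
print(new_tasks.shape)             # (200, 2)
```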
- Provable Benefits of Representational Transfer in Reinforcement Learning [59.712501044999875]
We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation.
We show that given generative access to source tasks, we can discover a representation, using which subsequent linear RL techniques quickly converge to a near-optimal policy.
arXiv Detail & Related papers (2022-05-29T04:31:29Z)
- Deep transfer learning for partial differential equations under conditional shift with DeepONet [0.0]
We propose a novel TL framework for task-specific learning under conditional shift with a deep operator network (DeepONet).
Inspired by the conditional embedding operator theory, we measure the statistical distance between the source domain and the target feature domain.
We show that the proposed TL framework enables fast and efficient multi-task operator learning, despite significant differences between the source and target domains. (A generic domain-distance sketch follows this entry.)
arXiv Detail & Related papers (2022-04-20T23:23:38Z)
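The paper's distance is based on conditional embedding operator theory; as a stand-in illustration of measuring a statistical distance between two feature domains, here is a generic (biased) RBF-kernel MMD estimate, with all names and data invented.
```python
import numpy as np

def rbf_mmd2(X, Y, sigma=1.0):
    """Biased estimate of squared MMD between two sample sets."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(2)
source_feats = rng.normal(0.0, 1.0, size=(100, 4))  # hypothetical source features
target_feats = rng.normal(0.3, 1.0, size=(100, 4))  # hypothetical shifted target features
print(rbf_mmd2(source_feats, target_feats))         # > 0: the domains differ
```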
- Learning to Relate Depth and Semantics for Unsupervised Domain Adaptation [87.1188556802942]
We present an approach for encoding visual task relationships to improve model performance in an Unsupervised Domain Adaptation (UDA) setting.
We propose a novel Cross-Task Relation Layer (CTRL), which encodes task dependencies between the semantic and depth predictions.
Furthermore, we propose an Iterative Self-Learning (ISL) training scheme, which exploits semantic pseudo-labels to provide extra supervision on the target domain. (A generic pseudo-labelling sketch follows this entry.)
arXiv Detail & Related papers (2021-05-17T13:42:09Z)
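The ISL scheme's specifics aside, confidence-thresholded pseudo-labelling on an unlabelled target domain generally looks like this sketch (the threshold and array shapes are assumptions).
```python
import numpy as np

def pseudo_labels(probs, threshold=0.9):
    """Keep the predicted class only where the model is confident;
    mark the rest as ignore (-1) so they add no target-domain loss."""
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    labels[conf < threshold] = -1
    return labels

probs = np.array([[0.97, 0.03], [0.55, 0.45]])  # toy softmax outputs
print(pseudo_labels(probs))                     # [ 0 -1]
```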
- Dif-MAML: Decentralized Multi-Agent Meta-Learning [54.39661018886268]
We propose a cooperative multi-agent meta-learning algorithm, referred to as Diffusion-based MAML (Dif-MAML).
We show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML objective.
Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting. (A rough adapt-then-combine sketch follows this entry.)
arXiv Detail & Related papers (2020-10-06T16:51:09Z)
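As a rough, hedged sketch of the diffusion (adapt-then-combine) idea behind such decentralized schemes, and not Dif-MAML's exact update, each agent takes a local gradient step and then averages with its neighbors; names and data below are invented.
```python
import numpy as np

def adapt_then_combine(params, grads, combine_weights, lr=0.01):
    """One decentralized diffusion step: every agent (row) first takes a
    local gradient step, then averages with its neighbors using a
    row-stochastic weight matrix. Agreement emerges from repeated combining."""
    adapted = params - lr * grads   # local adaptation, per agent
    return combine_weights @ adapted  # neighborhood averaging

n_agents, dim = 4, 3
rng = np.random.default_rng(3)
params = rng.normal(size=(n_agents, dim))
grads = rng.normal(size=(n_agents, dim))
W = np.full((n_agents, n_agents), 1 / n_agents)  # fully connected, uniform weights
print(adapt_then_combine(params, grads, W))
```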
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.