Offline Multi-task Transfer RL with Representational Penalization
- URL: http://arxiv.org/abs/2402.12570v1
- Date: Mon, 19 Feb 2024 21:52:44 GMT
- Title: Offline Multi-task Transfer RL with Representational Penalization
- Authors: Avinandan Bose, Simon Shaolei Du, Maryam Fazel
- Abstract summary: We study the problem of representation transfer in offline Reinforcement Learning (RL)
We propose an algorithm to compute pointwise uncertainty measures for the learnt representation.
- Score: 26.114893629771736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of representation transfer in offline Reinforcement
Learning (RL), where a learner has access to episodic data from a number of
source tasks collected a priori, and aims to learn a shared representation to
be used in finding a good policy for a target task. Unlike in online RL, where
the agent interacts with the environment while learning a policy, in the
offline setting no such interaction is possible with either the source tasks or
the target task; multi-task offline RL can therefore suffer from incomplete
coverage.
We propose an algorithm to compute pointwise uncertainty measures for the
learnt representation, and establish a data-dependent upper bound for the
suboptimality of the learnt policy for the target task. Our algorithm leverages
the collective exploration performed across the source tasks to compensate for
poor coverage of some points by individual tasks, thereby removing the
requirement of existing offline algorithms that every task provide uniformly
good coverage for meaningful transfer. We
complement our theoretical results with empirical evaluation on a
rich-observation MDP which requires many samples for complete coverage. Our
findings illustrate the benefits of penalizing and quantifying the uncertainty
in the learnt representation.
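As a rough, hedged illustration of the idea only (not the paper's algorithm: the names phi_hat, q_estimate, beta, and the linear-feature form are assumptions made here for concreteness), a pointwise penalty of this kind can be computed from the feature covariance pooled across all source-task datasets and subtracted from the value estimate:

```python
# Illustrative sketch, not the paper's construction: pointwise uncertainty of a
# learnt representation, pooled over all source-task data, used as a pessimism
# penalty on the target task. phi_hat, q_estimate, beta are assumed names.
import numpy as np

def pooled_covariance(phi_hat, source_datasets, feature_dim, lambda_reg=1.0):
    """Feature covariance accumulated over the offline data of *all* source
    tasks, so strong coverage by some tasks can offset weak coverage by others."""
    sigma = lambda_reg * np.eye(feature_dim)
    for dataset in source_datasets:              # one offline dataset per source task
        for (s, a, _r, _s_next) in dataset:
            feat = phi_hat(s, a)                 # learnt representation evaluated at (s, a)
            sigma += np.outer(feat, feat)
    return sigma

def pointwise_uncertainty(phi_hat, sigma, s, a):
    """Elliptical, bonus-style uncertainty of the representation at one (s, a)."""
    feat = phi_hat(s, a)
    return float(np.sqrt(feat @ np.linalg.solve(sigma, feat)))

def pessimistic_q(q_estimate, phi_hat, sigma, s, a, beta=1.0):
    """Penalize the value estimate wherever the learnt representation is uncertain."""
    return q_estimate(s, a) - beta * pointwise_uncertainty(phi_hat, sigma, s, a)
```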
Related papers
- Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement Learning [13.908826484332282]
Multi-task reinforcement learning (RL) aims to find a single policy that effectively solves multiple tasks at the same time.
This paper presents a constrained formulation for multi-task RL where the goal is to maximize the average performance of the policy across tasks subject to bounds on the performance in each task.
arXiv Detail & Related papers (2024-05-03T19:43:30Z)
- Offline Multitask Representation Learning for Reinforcement Learning [86.26066704016056]
We study offline multitask representation learning in reinforcement learning (RL).
We propose a new algorithm called MORL for offline multitask representation learning.
Our theoretical results demonstrate the benefits of using the learned representation from the upstream offline task instead of directly learning the representation of the low-rank model.
arXiv Detail & Related papers (2024-03-18T08:50:30Z)
- Disentangling Policy from Offline Task Representation Learning via Adversarial Data Augmentation [29.49883684368039]
Offline meta-reinforcement learning (OMRL) allows an agent to tackle novel tasks while relying on a static dataset.
We introduce a novel algorithm to disentangle the impact of behavior policy from task representation learning.
arXiv Detail & Related papers (2024-03-12T02:38:36Z)
- Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations [22.23114883485924]
We propose a novel algorithm called GENTLE for learning generalizable task representations in the face of data limitations.
GENTLE employs a Task Auto-Encoder (TAE), an encoder-decoder architecture that extracts the characteristics of the tasks.
To alleviate the effect of limited behavior diversity, we construct pseudo-transitions to align the data distribution used to train TAE with the data distribution encountered during testing.
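A hedged sketch of such an encoder-decoder task auto-encoder is given below; the layer sizes, the decoding targets (reward and next state), and the batch-mean aggregation are illustrative assumptions, not GENTLE's actual architecture.

```python
# Hedged sketch of an encoder-decoder "task auto-encoder"; sizes and decoding
# targets are assumptions for illustration only.
import torch
import torch.nn as nn

class TaskAutoEncoder(nn.Module):
    def __init__(self, state_dim, action_dim, task_dim=8, hidden=128):
        super().__init__()
        # Encoder: one transition (s, a, r, s') -> per-transition task code
        self.encoder = nn.Sequential(
            nn.Linear(2 * state_dim + action_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, task_dim),
        )
        # Decoder: (s, a, task embedding) -> predicted reward and next state
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + action_dim + task_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + state_dim),
        )

    def forward(self, s, a, r, s_next):
        # Aggregate per-transition codes into a single task embedding for the batch.
        z = self.encoder(torch.cat([s, a, r, s_next], dim=-1)).mean(dim=0, keepdim=True)
        out = self.decoder(torch.cat([s, a, z.expand(s.shape[0], -1)], dim=-1))
        r_pred, s_next_pred = out[:, :1], out[:, 1:]
        recon_loss = nn.functional.mse_loss(r_pred, r) + nn.functional.mse_loss(s_next_pred, s_next)
        return z, recon_loss
```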
arXiv Detail & Related papers (2023-12-26T07:02:12Z)
- Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes [93.61202366677526]
We study offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose various policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy.
arXiv Detail & Related papers (2022-09-18T22:03:55Z)
- Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z)
- Provable Benefit of Multitask Representation Learning in Reinforcement Learning [46.11628795660159]
This paper theoretically characterizes the benefit of representation learning under the low-rank Markov decision process (MDP) model.
To the best of our knowledge, this is the first theoretical study that characterizes the benefit of representation learning in exploration-based reward-free multitask reinforcement learning.
arXiv Detail & Related papers (2022-06-13T04:29:02Z)
- Provable Benefits of Representational Transfer in Reinforcement Learning [59.712501044999875]
We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation.
We show that given generative access to source tasks, we can discover a representation, using which subsequent linear RL techniques quickly converge to a near-optimal policy.
arXiv Detail & Related papers (2022-05-29T04:31:29Z)
- Active Multi-Task Representation Learning [50.13453053304159]
We give the first formal study on resource task sampling by leveraging the techniques from active learning.
We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance.
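A hedged sketch of the sampling step only is shown below; the relevance estimator itself is abstracted into a caller-supplied relevance_scores argument, which is an assumption of this sketch rather than the paper's procedure.

```python
# Hedged sketch of relevance-proportional sampling from source tasks; the
# relevance scores are assumed to come from an external estimator and would be
# re-estimated between rounds.
import numpy as np

def sample_source_batch(source_datasets, relevance_scores, budget, rng=None):
    """Draw a sampling budget from each source task in proportion to its
    estimated relevance to the target task."""
    rng = rng if rng is not None else np.random.default_rng()
    weights = np.asarray(relevance_scores, dtype=float)
    weights = weights / weights.sum()
    batch = []
    for dataset, n in zip(source_datasets, rng.multinomial(budget, weights)):
        idx = rng.choice(len(dataset), size=min(n, len(dataset)), replace=False)
        batch.extend(dataset[i] for i in idx)
    return batch
```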
arXiv Detail & Related papers (2022-02-02T08:23:24Z)
- Conservative Data Sharing for Multi-Task Offline Reinforcement Learning [119.85598717477016]
We argue that a natural use case of offline RL is in settings where we can pool large amounts of data collected in various scenarios for solving different tasks.
We develop a simple technique for data-sharing in multi-task offline RL that routes data based on the improvement over the task-specific data.
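A hedged sketch of such improvement-based routing follows; the paper's criterion uses conservative value estimates, while here the scoring is abstracted into a caller-supplied value_fn, and all names are illustrative assumptions.

```python
# Hedged sketch of improvement-based data routing for multi-task offline RL;
# value_fn(task_id, transition) is an assumed, caller-supplied scorer.
def route_shared_data(task_id, task_datasets, value_fn):
    """Keep another task's transition for `task_id` only if its estimated value
    for task_id exceeds the average value of task_id's own data."""
    own = task_datasets[task_id]
    baseline = sum(value_fn(task_id, t) for t in own) / len(own)
    shared = list(own)
    for other_id, dataset in task_datasets.items():
        if other_id == task_id:
            continue
        shared.extend(t for t in dataset if value_fn(task_id, t) >= baseline)
    return shared
```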
arXiv Detail & Related papers (2021-09-16T17:34:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.