Provable Benefit of Multitask Representation Learning in Reinforcement
Learning
- URL: http://arxiv.org/abs/2206.05900v1
- Date: Mon, 13 Jun 2022 04:29:02 GMT
- Title: Provable Benefit of Multitask Representation Learning in Reinforcement
Learning
- Authors: Yuan Cheng, Songtao Feng, Jing Yang, Hong Zhang, Yingbin Liang
- Abstract summary: This paper theoretically characterizes the benefit of representation learning under the low-rank Markov decision process (MDP) model.
To the best of our knowledge, this is the first theoretical study that characterizes the benefit of representation learning in exploration-based reward-free multitask reinforcement learning.
- Score: 46.11628795660159
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While representation learning has become a powerful technique for reducing sample complexity in reinforcement learning (RL) in practice, theoretical understanding of its advantage remains limited. In this paper, we
theoretically characterize the benefit of representation learning under the
low-rank Markov decision process (MDP) model. We first study multitask low-rank
RL (as upstream training), where all tasks share a common representation, and
propose a new multitask reward-free algorithm called REFUEL. REFUEL learns both
the transition kernel and the near-optimal policy for each task, and outputs a
well-learned representation for downstream tasks. Our result demonstrates that
multitask representation learning is provably more sample-efficient than
learning each task individually, as long as the total number of tasks is above
a certain threshold. We then study downstream RL in both online and offline settings, where the agent is assigned a new task sharing the same representation as the upstream tasks. For both online and offline settings, we
develop a sample-efficient algorithm, and show that it finds a near-optimal
policy whose suboptimality gap is bounded by the sum of the estimation error of the representation learned upstream and a term that vanishes as the number of downstream samples grows. Our downstream results for online and offline
RL further capture the benefit of employing the learned representation from
upstream as opposed to learning the representation of the low-rank model
directly. To the best of our knowledge, this is the first theoretical study
that characterizes the benefit of representation learning in exploration-based
reward-free multitask RL for both upstream and downstream tasks.
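To make the shared-representation structure concrete, the sketch below builds a toy multitask low-rank MDP in which every task's transition kernel factors through one common feature map. The sizes, the Dirichlet-based construction, and all names (phi, mu, etc.) are illustrative assumptions for this sketch, not the paper's REFUEL algorithm or its actual construction.

    import numpy as np

    # Toy multitask low-rank MDP: each task t has transition kernel
    #   P_t(s' | s, a) = <phi(s, a), mu_t(s')>,
    # where the d-dimensional feature map phi is SHARED across tasks and
    # only mu_t is task-specific. Sizes and the random construction are
    # illustrative assumptions, not the paper's setup.
    rng = np.random.default_rng(0)
    num_states, num_actions, d, num_tasks = 6, 3, 4, 5

    def random_stochastic(rows, cols, rng):
        # Each row is a probability vector drawn from a Dirichlet distribution.
        return rng.dirichlet(np.ones(cols), size=rows)

    # Shared representation phi(s, a): a distribution over d latent factors.
    phi = random_stochastic(num_states * num_actions, d, rng).reshape(
        num_states, num_actions, d)

    # Task-specific factors mu_t: each latent factor emits a distribution over
    # next states, so the mixture P_t(. | s, a) is automatically a valid
    # probability distribution.
    mus = [random_stochastic(d, num_states, rng) for _ in range(num_tasks)]
    transitions = [np.einsum("sad,dn->san", phi, mu_t) for mu_t in mus]

    # Sanity checks: every P_t(. | s, a) sums to 1, and the flattened
    # (S*A x S) transition matrix has rank at most d -- the "low-rank" part.
    for P in transitions:
        assert np.allclose(P.sum(axis=-1), 1.0)
        assert np.linalg.matrix_rank(
            P.reshape(num_states * num_actions, num_states)) <= d

In this picture, the upstream reward-free phase corresponds to estimating the shared phi from exploration data pooled across all tasks, which is why the sample cost can be amortized once the number of tasks is large enough; a downstream task that shares the same phi then only needs to fit its own task-specific components.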
Related papers
- Offline Multitask Representation Learning for Reinforcement Learning [86.26066704016056]
We study offline multitask representation learning in reinforcement learning (RL).
We propose a new algorithm called MORL for offline multitask representation learning.
Our theoretical results demonstrate the benefits of using the learned representation from the upstream offline task instead of directly learning the representation of the low-rank model.
arXiv Detail & Related papers (2024-03-18T08:50:30Z) - Provable Benefits of Multi-task RL under Non-Markovian Decision Making
Processes [56.714690083118406]
In multi-task reinforcement learning (RL) under Markov decision processes (MDPs), the presence of shared latent structures has been shown to yield significant sample-efficiency benefits over single-task RL.
We investigate whether such a benefit can extend to more general sequential decision making problems, such as partially observable MDPs (POMDPs) and the even more general predictive state representations (PSRs).
We propose a provably efficient algorithm UMT-PSR for finding near-optimal policies for all PSRs, and demonstrate that the advantage of multi-task learning manifests if the joint model class of PSR
arXiv Detail & Related papers (2023-10-20T14:50:28Z) - Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL).
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
arXiv Detail & Related papers (2022-07-29T17:29:08Z) - Provable Benefits of Representational Transfer in Reinforcement Learning [59.712501044999875]
We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation.
We show that given generative access to source tasks, we can discover a representation, using which subsequent linear RL techniques quickly converge to a near-optimal policy.
arXiv Detail & Related papers (2022-05-29T04:31:29Z) - Provable and Efficient Continual Representation Learning [40.78975699391065]
In continual learning (CL), the goal is to design models that can learn a sequence of tasks without catastrophic forgetting.
We study the problem of continual representation learning where we learn an evolving representation as new tasks arrive.
We show that CL benefits if the initial tasks have a large sample size and high "representation diversity".
arXiv Detail & Related papers (2022-03-03T21:23:08Z) - Active Multi-Task Representation Learning [50.13453053304159]
We give the first formal study of source task sampling by leveraging techniques from active learning.
We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance.
arXiv Detail & Related papers (2022-02-02T08:23:24Z)