Multi-task Representation Learning for Pure Exploration in Linear
Bandits
- URL: http://arxiv.org/abs/2302.04441v2
- Date: Tue, 30 May 2023 04:13:48 GMT
- Authors: Yihan Du, Longbo Huang, Wen Sun
- Abstract summary: We study multi-task representation learning for best arm identification in linear bandits (RepBAI-LB) and best policy identification in contextual linear bandits (RepBPI-CLB).
In these two problems, all tasks share a common low-dimensional linear representation, and our goal is to leverage this feature to accelerate the best arm (policy) identification process for all tasks.
We show that by learning the common representation among tasks, our sample complexity is significantly better than that of the naive approach, which solves tasks independently.
- Score: 34.67303292713379
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the recent success of representation learning in sequential decision
making, the study of the pure exploration scenario (i.e., identify the best
option and minimize the sample complexity) is still limited. In this paper, we
study multi-task representation learning for best arm identification in linear
bandits (RepBAI-LB) and best policy identification in contextual linear bandits
(RepBPI-CLB), two popular pure exploration settings with wide applications,
e.g., clinical trials and web content optimization. In these two problems, all
tasks share a common low-dimensional linear representation, and our goal is to
leverage this feature to accelerate the best arm (policy) identification
process for all tasks. For these problems, we design computationally and
sample-efficient algorithms, DouExpDes and C-DouExpDes, which perform double
experimental designs to plan optimal sample allocations for learning the global
representation. We show that by learning the common representation among tasks,
our sample complexity is significantly better than that of the naive approach
which solves tasks independently. To the best of our knowledge, this is the
first work to demonstrate the benefits of representation learning for
multi-task pure exploration.
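As a rough illustration of the setting (a minimal sketch, not the paper's actual construction: the dimensions, variable names, and the Frank-Wolfe routine below are all assumptions), the following sets up the shared low-dimensional representation and computes a G-optimal experimental design of the general kind that sample-allocation planning builds on:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared-representation model: M tasks with parameters theta_m = B w_m,
# where B is a common d x k feature matrix with k << d, so every task
# parameter lies in the same k-dimensional subspace.
d, k, M, K = 50, 5, 20, 30
B = np.linalg.qr(rng.standard_normal((d, k)))[0]   # shared representation
W = rng.standard_normal((k, M))                    # per-task latent weights
Theta = B @ W                                      # theta_m = B w_m
arms = rng.standard_normal((K, d))                 # common arm set

def pull(arm: int, task: int) -> float:
    """One noisy reward observation for `arm` in `task`."""
    return arms[arm] @ Theta[:, task] + rng.standard_normal()

def g_optimal_design(X: np.ndarray, iters: int = 500) -> np.ndarray:
    """Frank-Wolfe sketch of a G-optimal design over a finite arm set X:
    a distribution lam keeping max_x x^T (sum_i lam_i x_i x_i^T)^{-1} x small."""
    n, dim = X.shape
    lam = np.full(n, 1.0 / n)                      # start from the uniform design
    for t in range(iters):
        A = X.T @ (lam[:, None] * X)               # weighted design matrix
        A_inv = np.linalg.inv(A + 1e-8 * np.eye(dim))
        g = np.einsum("ij,jk,ik->i", X, A_inv, X)  # x^T A^{-1} x for each arm
        step = 2.0 / (t + 2)                       # classic Frank-Wolfe step size
        lam *= 1.0 - step
        lam[int(np.argmax(g))] += step             # move mass to the most uncertain arm
    return lam

lam = g_optimal_design(arms)                       # sample-allocation weights
best_arms = np.argmax(arms @ Theta, axis=0)        # per-task identification targets
```

Sampling arms in proportion to lam controls the worst-case variance of a least-squares estimate over the arm set, which is roughly the role that planning optimal sample allocations plays in this setting.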
Related papers
- Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks [53.44714413181162]
This paper shows that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with myopic exploration design can be sample-efficient.
To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL.
arXiv Detail & Related papers (2024-03-03T22:57:44Z)
- Multi-task Representation Learning for Pure Exploration in Bilinear Bandits [13.773838574776338]
We study multi-task representation learning for the problem of pure exploration in bilinear bandits.
In bilinear bandits, an action takes the form of a pair of arms from two different entity types, and the expected reward is a bilinear function of the two arms' features (schematically, E[r] = x^T Theta y for a hidden low-rank matrix Theta).
arXiv Detail & Related papers (2023-11-01T06:30:45Z)
- Provable Benefits of Multi-task RL under Non-Markovian Decision Making Processes [56.714690083118406]
In multi-task reinforcement learning (RL) under Markov decision processes (MDPs), the presence of shared latent structures has been shown to yield significant benefits to the sample efficiency compared to single-task RL.
We investigate whether such a benefit can extend to more general sequential decision making problems, such as partially observable MDPs (POMDPs) and more general predictive state representations (PSRs).
We propose a provably efficient algorithm, UMT-PSR, for finding near-optimal policies for all PSRs, and demonstrate that the advantage of multi-task learning manifests when the joint model class of the PSRs has lower complexity than learning each task separately.
arXiv Detail & Related papers (2023-10-20T14:50:28Z)
- Active Representation Learning for General Task Space with Applications in Robotics [44.36398212117328]
We propose an algorithmic framework for active representation learning, where the learner optimally chooses which source tasks to sample from.
We provide several instantiations under this framework, from bilinear and feature-based nonlinear to general nonlinear cases.
Our algorithms outperform baselines by 20%-70% on average.
arXiv Detail & Related papers (2023-06-15T08:27:50Z)
- Provable Benefit of Multitask Representation Learning in Reinforcement Learning [46.11628795660159]
This paper theoretically characterizes the benefit of representation learning under the low-rank Markov decision process (MDP) model.
To the best of our knowledge, this is the first theoretical study that characterizes the benefit of representation learning in exploration-based reward-free multitask reinforcement learning.
arXiv Detail & Related papers (2022-06-13T04:29:02Z)
- Provable Benefits of Representational Transfer in Reinforcement Learning [59.712501044999875]
We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation.
We show that given generative access to source tasks, we can discover a representation, using which subsequent linear RL techniques quickly converge to a near-optimal policy.
arXiv Detail & Related papers (2022-05-29T04:31:29Z)
- Active Multi-Task Representation Learning [50.13453053304159]
We give the first formal study on resource task sampling by leveraging the techniques from active learning.
We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance (a rough sketch of this loop appears after this list).
arXiv Detail & Related papers (2022-02-02T08:23:24Z)
- Reinforcement Learning with Prototypical Representations [114.35801511501639]
Proto-RL is a self-supervised framework that ties representation learning with exploration through prototypical representations.
These prototypes simultaneously serve as a summarization of the exploratory experience of an agent as well as a basis for representing observations.
This enables state-of-the-art downstream policy learning on a set of difficult continuous control tasks.
arXiv Detail & Related papers (2021-02-22T18:56:34Z)
- Sequential Transfer in Reinforcement Learning with a Generative Model [48.40219742217783]
We show how to reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones.
We derive PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge.
We empirically verify our theoretical findings in simple simulated domains.
arXiv Detail & Related papers (2020-07-01T19:53:35Z)
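As a rough sketch of the estimate-relevance-then-sample loop described in the Active Multi-Task Representation Learning entry above (the least-squares relevance estimate and all names here are illustrative assumptions, not that paper's actual estimator):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-ins: M source tasks in dimension d, and a target
# task that is in fact a mixture of the first three sources.
M, d, rounds, budget = 10, 8, 5, 100
source_params = rng.standard_normal((M, d))
target_param = source_params[:3].mean(axis=0)

for r in range(rounds):
    # Estimate each source's relevance by regressing the target
    # parameter on the source parameters (an assumed stand-in).
    relevance, *_ = np.linalg.lstsq(source_params.T, target_param, rcond=None)
    probs = np.maximum(relevance, 0.0) + 1e-8      # clip to a valid distribution
    probs /= probs.sum()
    counts = rng.multinomial(budget, probs)        # samples to draw per source task
    print(f"round {r}: samples per source = {counts}")
    # A full implementation would use the new samples to refine the
    # shared representation and hence the next relevance estimate.
```

The loop allocates most of the budget to the three sources the target actually depends on; in a real instantiation the relevance estimates would be refit from collected data rather than from known parameters.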