Meta Representation Learning with Contextual Linear Bandits
- URL: http://arxiv.org/abs/2205.15100v1
- Date: Mon, 30 May 2022 13:43:53 GMT
- Title: Meta Representation Learning with Contextual Linear Bandits
- Authors: Leonardo Cella, Karim Lounici, Massimiliano Pontil
- Abstract summary: We investigate meta-learning in the setting of linear bandit tasks.
We show that if the learned representation estimates the unknown one well, then the downstream task can be learned efficiently.
- Score: 34.77618818693938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Meta-learning seeks to build algorithms that rapidly learn how to solve new
learning problems based on previous experience. In this paper we investigate
meta-learning in the setting of stochastic linear bandit tasks. We assume that
the tasks share a low dimensional representation, which has been partially
acquired from previous learning tasks. We aim to leverage this information in
order to learn a new downstream bandit task, which shares the same
representation. Our principal contribution is to show that if the learned
representation estimates the unknown one well, then the downstream task can be
learned efficiently by a greedy policy that we propose in this work. We derive
an upper bound on the regret of this policy, which is, up to logarithmic
factors, of order $r\sqrt{N}(1\vee \sqrt{d/T})$, where $N$ is the horizon of
the downstream task, $T$ is the number of training tasks, $d$ the ambient
dimension and $r \ll d$ the dimension of the representation. We highlight that
our strategy does not need to know $r$. We note that if $T > d$ then
$\sqrt{d/T} < 1$, so our bound reduces to $r\sqrt{N}$ and achieves the same
rate as optimal minimax bandit algorithms that use the true underlying
representation. Our analysis is inspired by, and builds in part upon,
previous work on meta-learning in the i.i.d. full information setting
\citep{tripuraneni2021provable,boursier2022trace}. As a separate contribution
we show how to relax certain assumptions in those works, thereby improving
their representation learning and risk analysis.
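To make the role of the learned representation concrete, here is a minimal sketch of a greedy policy that exploits a representation estimate: it projects fresh stochastic contexts through an estimated matrix `B_hat` (standing in for the $d \times r$ representation learned from the $T$ training tasks), runs ridge regression in the $r$-dimensional subspace, and always pulls the greedily best arm. The function name, context distribution, and interface are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def greedy_with_representation(B_hat, theta_star, K=20, N=1000, sigma=0.1, lam=1.0):
    """Greedy contextual linear bandit policy that exploits a learned
    representation B_hat (d x r). Illustrative sketch only."""
    d, r = B_hat.shape
    V = lam * np.eye(r)                  # regularized Gram matrix in the subspace
    b = np.zeros(r)
    cum_regret = 0.0
    for t in range(N):
        X = rng.standard_normal((K, d))  # fresh stochastic contexts each round
        Z = X @ B_hat                    # project contexts to r dimensions
        w_hat = np.linalg.solve(V, b)    # ridge estimate of the low-dim parameter
        k = int(np.argmax(Z @ w_hat))    # purely greedy choice: no exploration bonus
        reward = X[k] @ theta_star + sigma * rng.standard_normal()
        V += np.outer(Z[k], Z[k])
        b += reward * Z[k]
        cum_regret += np.max(X @ theta_star) - X[k] @ theta_star
    return cum_regret
```

Intuitively, the randomness of the contexts supplies the exploration that the greedy arm choice omits; the paper's regret bound quantifies how the quality of the representation estimate (driven by $T$) controls the resulting regret.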
Related papers
- Multi-Task Imitation Learning for Linear Dynamical Systems [50.124394757116605]
We study representation learning for efficient imitation learning over linear systems.
We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left(\frac{k n_x}{H N_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$.
arXiv Detail & Related papers (2022-12-01T00:14:35Z)
- Nearly Minimax Algorithms for Linear Bandits with Shared Representation [86.79657561369397]
We consider the setting where we play $M$ linear bandits with dimension $d$, each for $T$ rounds, and these $M$ bandit tasks share a common $k$ ($\ll d$) dimensional linear representation.
We come up with novel algorithms that achieve $\widetilde{O}\left(d\sqrt{kMT} + kM\sqrt{T}\right)$ regret bounds, which match the known minimax regret lower bound up to logarithmic factors.
arXiv Detail & Related papers (2022-03-29T15:27:13Z)
- Multi-task Representation Learning with Stochastic Linear Bandits [29.8208189270894]
We study the problem of transfer learning in the setting of linear bandit tasks.
We consider that a low dimensional linear representation is shared across the tasks, and study the benefit of learning this representation in the multi-task learning setting.
arXiv Detail & Related papers (2022-02-21T09:26:34Z)
- Provable Lifelong Learning of Representations [21.440845049501778]
We propose a provable lifelong learning algorithm that maintains and refines the internal feature representation.
We prove that for any desired accuracy on all tasks, the dimension of the representation remains close to that of the underlying representation.
arXiv Detail & Related papers (2021-10-27T00:41:23Z)
- Randomized Exploration for Reinforcement Learning with General Value Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z)
- Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z)
- Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector (a minimal sketch of this biased estimator appears after this list).
We show both theoretically and experimentally that when the number of tasks grows and the variance of the task distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
arXiv Detail & Related papers (2020-05-18T08:41:39Z)
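The bias-regularized OFUL variant in the last entry above admits a simple closed-form estimator, sketched below under assumed names: ridge regression shrunk toward a bias vector $h$ rather than toward zero.

```python
import numpy as np

def biased_ridge(X, y, h, lam=1.0):
    """Minimizer of ||y - X w||^2 + lam * ||w - h||^2.

    Setting the gradient to zero gives (X^T X + lam I) w = X^T y + lam h,
    so the solution shrinks toward the bias vector h instead of toward 0.
    Illustrative sketch; the name and interface are assumptions.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * h)
```

With $h = 0$ this reduces to the standard ridge estimate used by OFUL; a bias $h$ learned across tasks shrinks each task's estimate toward the task-distribution mean, which is where the advantage over learning tasks in isolation comes from when the task variance is small.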