Multi-task Representation Learning with Stochastic Linear Bandits
- URL: http://arxiv.org/abs/2202.10066v2
- Date: Tue, 15 Aug 2023 09:12:03 GMT
- Title: Multi-task Representation Learning with Stochastic Linear Bandits
- Authors: Leonardo Cella, Karim Lounici, Grégoire Pacreau, Massimiliano Pontil
- Abstract summary: We study the problem of transfer-learning in the setting of linear bandit tasks.
We consider that a low dimensional linear representation is shared across the tasks, and study the benefit of learning this representation in the multi-task learning setting.
- Score: 29.8208189270894
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of transfer-learning in the setting of stochastic linear
bandit tasks. We consider that a low dimensional linear representation is
shared across the tasks, and study the benefit of learning this representation
in the multi-task learning setting. Following recent results to design
stochastic bandit policies, we propose an efficient greedy policy based on
trace norm regularization. It implicitly learns a low dimensional
representation by encouraging the matrix formed by the task regression vectors
to be of low rank. Unlike previous work in the literature, our policy does not
need to know the rank of the underlying matrix. We derive an upper bound on the
multi-task regret of our policy, which is, up to logarithmic factors, of order
$\sqrt{NdT(T+d)r}$, where $T$ is the number of tasks, $r$ the rank, $d$ the
number of variables and $N$ the number of rounds per task. We show the benefit
of our strategy compared to the baseline $Td\sqrt{N}$ obtained by solving each
task independently. We also provide a lower bound to the multi-task regret.
Finally, we corroborate our theoretical findings with preliminary experiments
on synthetic data.
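A quick check of the claimed benefit: squaring the ratio of the two rates gives $\frac{NdT(T+d)r}{(Td\sqrt{N})^2} = r\left(\frac{1}{T} + \frac{1}{d}\right)$, so the multi-task policy improves on the independent-task baseline precisely when the rank $r$ is small compared with $\left(\frac{1}{T} + \frac{1}{d}\right)^{-1}$, i.e. small relative to both the number of tasks and the dimension.
The estimation step at the heart of such a policy can be sketched in a few lines. Below is a minimal proximal-gradient solver for trace-norm-regularized least squares; it is not the authors' implementation, and the step size, penalty weight `lam`, and iteration count are illustrative assumptions. The trace-norm proximal operator soft-thresholds singular values, which is what implicitly drives the $d \times T$ matrix of task regression vectors toward low rank without knowing $r$.
```python
import numpy as np

def svd_soft_threshold(B, tau):
    # Proximal operator of tau * (trace norm): shrink singular values by tau.
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def trace_norm_estimate(X_tasks, y_tasks, lam, n_iters=300):
    # X_tasks[t]: (n_t, d) contexts observed so far in task t (n_t >= 1);
    # y_tasks[t]: (n_t,) rewards. Returns a d x T estimate of the matrix of
    # task regression vectors, pushed toward low rank by lam * ||B||_*.
    d = X_tasks[0].shape[1]
    B = np.zeros((d, len(X_tasks)))
    # Step size from a crude Lipschitz bound on the task-separable squared loss.
    step = 1.0 / max(np.linalg.norm(X.T @ X, 2) for X in X_tasks)
    for _ in range(n_iters):
        # Gradient of 0.5 * sum_t ||X_t b_t - y_t||^2, one column per task.
        G = np.column_stack([X.T @ (X @ b - y)
                             for (X, y), b in zip(zip(X_tasks, y_tasks), B.T)])
        B = svd_soft_threshold(B - step * G, step * lam)
    return B
```
A greedy policy in the spirit of the abstract would then, at each round of task $t$, pull the arm $x$ maximizing $x^\top \hat{b}_t$ (the $t$-th column of the estimate) and refit after observing the reward; note that the rank $r$ never appears as an input.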
Related papers
- Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z)
- Multi-Task Imitation Learning for Linear Dynamical Systems [50.124394757116605]
We study representation learning for efficient imitation learning over linear systems.
We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left(\frac{k n_x H}{N_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$.
arXiv Detail & Related papers (2022-12-01T00:14:35Z)
- Joint Representation Training in Sequential Tasks with Shared Structure [40.1056491921582]
We introduce the Shared-MatrixRL algorithm for the setting of Multitask MatrixRL.
We show the regret on the $P$ tasks can be improved from $O(PHd\sqrt{NH})$ to $O((Hd\sqrt{rP} + HP\sqrt{rd})\sqrt{NH})$ over $N$ episodes of horizon $H$.
arXiv Detail & Related papers (2022-06-24T18:10:00Z)
- Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation [41.460894569204065]
We study lifelong reinforcement learning (RL) in a regret-minimization setting with linear contextual Markov decision processes (MDPs).
We propose an algorithm, called UCB Lifelong Value Distillation (UCBlvd), that provably achieves sublinear regret for any sequence of tasks.
arXiv Detail & Related papers (2022-06-01T06:53:28Z)
- Meta Representation Learning with Contextual Linear Bandits [34.77618818693938]
We investigate meta-learning in the setting of linear bandit tasks.
We show that if the learned representation estimates the unknown one well, then the downstream task can be efficiently learned.
arXiv Detail & Related papers (2022-05-30T13:43:53Z)
- Nearly Minimax Algorithms for Linear Bandits with Shared Representation [86.79657561369397]
We consider the setting where we play $M$ linear bandits with dimension $d$, each for $T$ rounds, and these $M$ bandit tasks share a common $k$ ($\ll d$) dimensional linear representation.
We come up with novel algorithms that achieve $\widetilde{O}\left(d\sqrt{kMT} + kM\sqrt{T}\right)$ regret bounds, which match the known minimax regret lower bound up to logarithmic factors.
arXiv Detail & Related papers (2022-03-29T15:27:13Z)
- New Tight Relaxations of Rank Minimization for Multi-Task Learning [161.23314844751556]
We propose two novel multi-task learning formulations based on two regularization terms.
We show that our methods can correctly recover the low-rank structure shared across tasks, and outperform related multi-task learning methods.
arXiv Detail & Related papers (2021-12-09T07:29:57Z)
- Sample Efficient Linear Meta-Learning by Alternating Minimization [74.40553081646995]
We study a simple alternating minimization method (MLLAM) which alternately learns the low-dimensional subspace and the regressors (a generic sketch of this alternating scheme appears after this list).
We show that for a constant subspace dimension, MLLAM obtains nearly-optimal estimation error, despite requiring only $\Omega(\log d)$ samples per task.
We propose a novel task subset selection scheme that ensures the same strong statistical guarantee as MLLAM.
arXiv Detail & Related papers (2021-05-18T06:46:48Z)
- Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector (a sketch of this biased-ridge estimator appears after this list).
We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
arXiv Detail & Related papers (2020-05-18T08:41:39Z)
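For contrast with the trace-norm approach of the main paper, here is a generic alternating-minimization sketch in the spirit of the MLLAM summary above: it alternates between per-task regressors in a projected space and the shared $d \times k$ subspace. It is a hedged illustration, not the MLLAM algorithm itself; in particular it omits the task subset selection scheme, and the initialization and iteration count are assumptions.
```python
import numpy as np

def alt_min_subspace(X_tasks, y_tasks, k, n_iters=50, seed=0):
    # Model: y_t ~ X_t @ U @ w_t with a shared column-orthonormal U (d x k).
    rng = np.random.default_rng(seed)
    d = X_tasks[0].shape[1]
    U = np.linalg.qr(rng.standard_normal((d, k)))[0]
    for _ in range(n_iters):
        # (1) Per-task low-dimensional regressors, holding the subspace fixed.
        W = [np.linalg.lstsq(X @ U, y, rcond=None)[0]
             for X, y in zip(X_tasks, y_tasks)]
        # (2) Subspace update as least squares in vec(U), using the identity
        #     X_t U w_t = (w_t^T kron X_t) vec(U) with column-major vec.
        A = np.vstack([np.kron(w[None, :], X) for X, w in zip(X_tasks, W)])
        u = np.linalg.lstsq(A, np.concatenate(y_tasks), rcond=None)[0]
        U = np.linalg.qr(u.reshape(d, k, order="F"))[0]  # re-orthonormalize
    return U
```
And the biased regularization named in the last entry reduces to ridge regression shrunk toward a bias vector $h$ rather than toward zero; a full OFUL-style policy would add confidence sets and an optimism bonus around this estimator, both omitted here.
```python
import numpy as np

def biased_ridge(X, y, h, lam):
    # argmin_w ||X w - y||^2 + lam * ||w - h||^2; setting the gradient to
    # zero gives the normal equations (X^T X + lam I) w = X^T y + lam h.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]),
                           X.T @ y + lam * h)
```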