Meta-learning with Stochastic Linear Bandits
- URL: http://arxiv.org/abs/2005.08531v1
- Date: Mon, 18 May 2020 08:41:39 GMT
- Title: Meta-learning with Stochastic Linear Bandits
- Authors: Leonardo Cella, Alessandro Lazaric, Massimiliano Pontil
- Abstract summary: We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is the squared Euclidean distance to a bias vector.
We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
- Score: 120.43000970418939
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate meta-learning procedures in the setting of stochastic linear
bandit tasks. The goal is to select a learning algorithm that works well on
average over a class of bandit tasks sampled from a task-distribution. Inspired
by recent work on learning-to-learn linear regression, we consider a class of
bandit algorithms that implement a regularized version of the well-known OFUL
algorithm, where the regularization is the squared Euclidean distance to a bias
vector. We first study the benefit of the biased OFUL algorithm in terms of
regret minimization. We then propose two strategies to estimate the bias within
the learning-to-learn setting. We show, both theoretically and experimentally,
that when the number of tasks grows and the variance of the task-distribution
is small, our strategies have a significant advantage over learning the tasks
in isolation.
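As a concrete illustration of the regularized rule described in the abstract, here is a minimal NumPy sketch of a ridge estimate biased toward a vector `h`, followed by an optimistic arm choice. The function name, the fixed confidence width `beta`, and the interface are assumptions made for illustration, not the paper's exact algorithm or constants.

```python
import numpy as np

def biased_oful_step(X, r, arms, bias, lam=1.0, beta=1.0):
    """One round of a biased-OFUL-style rule (illustrative sketch only).

    X    : (t, d) contexts played so far
    r    : (t,)   observed rewards
    arms : (K, d) candidate arm features for the current round
    bias : (d,)   bias vector h toward which the estimate is regularized
    """
    d = arms.shape[1]
    A = X.T @ X + lam * np.eye(d)  # regularized design matrix
    # Ridge estimate of  argmin ||X theta - r||^2 + lam * ||theta - bias||^2,
    # i.e. shrinkage toward the bias vector instead of toward zero.
    theta_hat = bias + np.linalg.solve(A, X.T @ (r - X @ bias))
    # Optimism in the face of uncertainty: play the arm with the largest UCB.
    A_inv = np.linalg.inv(A)
    widths = np.sqrt(np.sum((arms @ A_inv) * arms, axis=1))
    ucb = arms @ theta_hat + beta * widths
    return int(np.argmax(ucb)), theta_hat
```

When the bias vector lies close to the task parameters drawn from the task-distribution, the estimate starts near the truth with little data, which is the source of the regret advantage the abstract refers to.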
Related papers
- Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits [15.342585350280535]
We study how representation learning can improve the learning efficiency of contextual bandit problems.
We present a new algorithm based on alternating projected gradient descent (GD) and a minimization estimator.
arXiv Detail & Related papers (2024-10-02T22:30:29Z) - Learning the Positions in CountSketch [49.57951567374372]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries.
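The CountSketch construction referred to above can be written in a few lines; below is a minimal NumPy sketch with randomly drawn non-zero positions (the cited paper learns these positions instead of sampling them). The helper name and matrix sizes are illustrative assumptions.

```python
import numpy as np

def countsketch_matrix(m, n, seed=None):
    """Build an m x n CountSketch matrix with a single random +/-1 entry per column.

    Multiplying S @ A compresses the n rows of A down to m sketched rows.
    Positions are drawn uniformly at random here; the paper above learns them.
    """
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, m, size=n)        # row index of the non-zero in each column
    signs = rng.choice([-1.0, 1.0], size=n)  # random sign for each column
    S = np.zeros((m, n))
    S[rows, np.arange(n)] = signs
    return S

# Example: compress a 1000 x 50 data matrix to 100 x 50 before solving a regression.
A = np.random.default_rng(0).standard_normal((1000, 50))
S = countsketch_matrix(100, 1000, seed=1)
SA = S @ A  # sketched data, fed to the downstream optimization problem
```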
arXiv Detail & Related papers (2023-06-11T07:28:35Z) - Towards Diverse Evaluation of Class Incremental Learning: A Representation Learning Perspective [67.45111837188685]
Class incremental learning (CIL) algorithms aim to continually learn new object classes from incrementally arriving data.
We experimentally analyze neural network models trained by CIL algorithms using various evaluation protocols in representation learning.
arXiv Detail & Related papers (2022-06-16T11:44:11Z) - Meta-Learning Adversarial Bandits [49.094361442409785]
We study online learning with bandit feedback across multiple tasks, with the goal of improving average performance across tasks if they are similar according to some natural task-similarity measure.
As the first work to target the adversarial setting, we design a meta-algorithm with setting-specific guarantees for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO).
Our guarantees rely on proving that unregularized follow-the-leader combined with multiplicative weights is enough to online learn a non-smooth and non-convex sequence of functions.
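For reference, the multiplicative-weights component named above is the standard exponential-update rule sketched below. This is the generic textbook update under an assumed learning rate `eta`, not the paper's meta-algorithm.

```python
import numpy as np

def multiplicative_weights(loss_sequence, eta=0.1):
    """Generic multiplicative-weights update over a sequence of loss vectors.

    loss_sequence : iterable of (n,) arrays, one loss vector per round.
    Returns the final probability distribution over the n experts.
    """
    loss_sequence = list(loss_sequence)
    n = len(loss_sequence[0])
    w = np.ones(n) / n
    for loss in loss_sequence:
        w = w * np.exp(-eta * np.asarray(loss))  # exponentially down-weight lossy experts
        w = w / w.sum()                          # renormalize to a distribution
    return w
```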
arXiv Detail & Related papers (2022-05-27T17:40:32Z) - PAC-Bayesian Lifelong Learning For Multi-Armed Bandits [38.76324445090305]
We present a PAC-Bayesian analysis of lifelong learning.
We consider the case when each learning task is a multi-armed bandit problem.
We propose lifelong learning algorithms that use our new bounds as learning objectives.
arXiv Detail & Related papers (2022-03-07T11:23:12Z) - Multi-task Representation Learning with Stochastic Linear Bandits [29.8208189270894]
We study the problem of transfer-learning in the setting of linear bandit tasks.
We assume that a low-dimensional linear representation is shared across the tasks, and study the benefit of learning this representation in the multi-task learning setting.
arXiv Detail & Related papers (2022-02-21T09:26:34Z) - New Tight Relaxations of Rank Minimization for Multi-Task Learning [161.23314844751556]
We propose two novel multi-task learning formulations based on two regularization terms.
We show that our methods can correctly recover the low-rank structure shared across tasks, and outperform related multi-task learning methods.
arXiv Detail & Related papers (2021-12-09T07:29:57Z) - Learning-to-learn non-convex piecewise-Lipschitz functions [44.6133187924678]
We analyze the meta-learning of the initialization and step-size of learning algorithms for piecewise-Lipschitz functions, a non-convex setting with applications to both machine learning and algorithm design.
We propose a practical meta-learning procedure that learns both the initialization and the step-size of the algorithm from multiple online learning tasks.
arXiv Detail & Related papers (2021-08-19T16:22:48Z) - Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
arXiv Detail & Related papers (2020-07-17T03:15:53Z) - Estimates on Learning Rates for Multi-Penalty Distribution Regression [5.999239529678357]
We study a multi-penalty regularization algorithm for distribution regression under the framework of learning theory.
We embed the distributions into a reproducing kernel Hilbert space $\mathcal{H}_K$ associated with a Mercer kernel $K$ via the mean embedding technique.
The work also derives learning rates for distribution regression in the nonstandard setting $f_\rho \notin \mathcal{H}_K$, which is not explored in the existing literature.
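The mean-embedding step mentioned above admits a very short empirical sketch: each distribution is represented by the average of the kernel features of its sample, so inner products in $\mathcal{H}_K$ reduce to averaged kernel evaluations. The Gaussian kernel choice and helper names below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def gaussian_kernel(x, y, gamma=1.0):
    """Mercer kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def mean_embedding_inner(sample_p, sample_q, gamma=1.0):
    """Empirical estimate of the H_K inner product of the mean embeddings of P and Q:
    the average of k(x, y) over all pairs with x sampled from P and y from Q."""
    return float(np.mean([gaussian_kernel(x, y, gamma)
                          for x in sample_p for y in sample_q]))
```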
arXiv Detail & Related papers (2020-06-16T09:31:58Z)