Online Parameter-Free Learning of Multiple Low Variance Tasks
- URL: http://arxiv.org/abs/2007.05732v1
- Date: Sat, 11 Jul 2020 09:52:53 GMT
- Title: Online Parameter-Free Learning of Multiple Low Variance Tasks
- Authors: Giulia Denevi, Dimitris Stamos, Massimiliano Pontil
- Abstract summary: We propose a method to learn a common bias vector for a growing sequence of low-variance tasks.
Our approach is presented in the non-statistical setting and comes in two variants.
Experiments confirm the effectiveness of our methods in practice.
- Score: 36.08679456245112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a method to learn a common bias vector for a growing sequence of
low-variance tasks. Unlike state-of-the-art approaches, our method does not
require tuning any hyper-parameter. Our approach is presented in the
non-statistical setting and comes in two variants: the "aggressive" one
updates the bias after each datapoint, while the "lazy" one updates the bias
only at the end of each task. We derive an across-tasks regret bound for the
method. Compared to state-of-the-art approaches, the aggressive variant attains
faster rates and the lazy one recovers standard rates, but without the need to
tune hyper-parameters. We then adapt the methods to the statistical setting: the
aggressive variant becomes a multi-task learning method, the lazy one a
meta-learning method. Experiments confirm the effectiveness of our methods in
practice.
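Below is a minimal sketch of the two update schedules described in the abstract, written as plain online (sub)gradient descent with a biased regularizer. It is illustrative only: the fixed step sizes `eta` and `gamma`, the hinge loss, and the exact form of the bias update are assumptions made for readability, whereas the paper's actual method is parameter-free and uses a different, tuning-free update.

```python
import numpy as np

def hinge_subgrad(w, x, y):
    """Subgradient of the hinge loss max(0, 1 - y <w, x>) at w."""
    return -y * x if y * np.dot(w, x) < 1.0 else np.zeros_like(x)

def learn_bias(tasks, dim, variant="aggressive", eta=0.1, gamma=0.01):
    """Learn a common bias vector h over a growing sequence of tasks.

    tasks:   iterable of lists of (x, y) pairs, one list per task.
    variant: "aggressive" updates h after every datapoint,
             "lazy" updates h only at the end of each task.
    """
    h = np.zeros(dim)                        # shared bias vector
    for data in tasks:
        w = h.copy()                         # start each task at the current bias
        for x, y in data:
            g = hinge_subgrad(w, x, y)
            w -= eta * (g + (w - h))         # within-task step, pulled toward h
            if variant == "aggressive":
                h -= gamma * (h - w)         # bias update after each datapoint
        if variant == "lazy":
            h -= gamma * (h - w)             # single bias update per task
    return h
```

The only difference between the two variants is where the bias update sits relative to the inner loop, which is exactly the distinction the abstract draws between "aggressive" and "lazy".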
Related papers
- Adaptive Retention & Correction for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC).
ARC achieves average performance increases of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets, respectively.
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
- Hessian Aware Low-Rank Perturbation for Order-Robust Continual Learning [19.850893012601638]
Continual learning aims to learn a series of tasks sequentially without forgetting the knowledge acquired from the previous ones.
We propose the Hessian Aware Low-Rank Perturbation algorithm for continual learning.
arXiv Detail & Related papers (2023-11-26T01:44:01Z)
- MEAL: Stable and Active Learning for Few-Shot Prompting [26.60924937965494]
Few-shot classification has high variance both across different sets of few shots and across different finetuning runs.
We propose novel ensembling methods and show that they substantially reduce run variability.
Second, we introduce a new active learning (AL) criterion for data selection and present the first AL-based approach specifically tailored towards prompt-based learning.
arXiv Detail & Related papers (2022-11-15T18:06:53Z)
- One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient Reinforcement Learning [61.662504399411695]
We introduce a novel method that mixes multiple inner steps and enjoys a more accurate and robust meta-gradient signal.
When applied to the Snake game, the mixing meta-gradient algorithm can cut the variance by a factor of 3 while achieving similar or higher performance.
arXiv Detail & Related papers (2021-10-30T08:36:52Z)
- Faster Meta Update Strategy for Noise-Robust Deep Learning [62.08964100618873]
We introduce a novel Faster Meta Update Strategy (FaMUS) to replace the most expensive step in the meta gradient with a faster layer-wise approximation.
We show our method is able to save two-thirds of the training time while maintaining comparable, or even achieving better, generalization performance.
arXiv Detail & Related papers (2021-04-30T16:19:07Z)
- Hyperparameter Transfer Learning with Adaptive Complexity [5.695163312473305]
We propose a new multi-task Bayesian optimization (BO) method that learns a set of ordered, non-linear basis functions of increasing complexity via nested dropout and automatic relevance determination.
arXiv Detail & Related papers (2021-02-25T12:26:52Z)
- Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector (a worked form of this regularization is sketched after this entry).
We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
arXiv Detail & Related papers (2020-05-18T08:41:39Z)
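One way to read the biased regularization in the Meta-learning with Stochastic Linear Bandits entry above is as a ridge-style estimate shrunk toward the learned bias. The notation below ($h$ for the bias vector, $\lambda > 0$ for the regularization weight) is an assumption rather than the paper's own.

```latex
\hat{w}_t \;=\; \operatorname*{arg\,min}_{w \in \mathbb{R}^d}
  \sum_{s=1}^{t-1} \bigl( y_s - \langle x_s, w \rangle \bigr)^2
  \;+\; \lambda \,\lVert w - h \rVert_2^2
```

Setting $h = 0$ recovers the standard ridge estimate used inside OFUL; when the tasks have low variance around a common parameter, a well-chosen $h$ shrinks each task's estimate toward that common point, which is where the advantage over learning the tasks in isolation comes from.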
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries (a simplified form of this prototype update is sketched after this entry).
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
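A hedged sketch of the transductive prototype update described in the Meta-Learned Confidence entry above. The softmax-over-negative-distance weighting and the `temperature` parameter are generic stand-ins: the paper meta-learns the confidence rather than fixing it with a rule like this.

```python
import numpy as np

def refine_prototypes(prototypes, queries, temperature=1.0):
    """One transductive step: re-estimate each class prototype as a
    confidence-weighted mean of the support prototype and the queries.

    prototypes: (C, d) array, one initial prototype per class (from the support set).
    queries:    (Q, d) array of unlabeled query embeddings.
    """
    # squared distances of every query to every prototype, shape (Q, C)
    d2 = ((queries[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    # confidence of each query for each class: softmax over negative distance
    logits = -d2 / temperature
    conf = np.exp(logits - logits.max(axis=1, keepdims=True))
    conf /= conf.sum(axis=1, keepdims=True)
    # per class: weighted mean of the support prototype and the confident queries
    refined = np.empty_like(prototypes)
    for c in range(prototypes.shape[0]):
        w = conf[:, c]
        refined[c] = (prototypes[c] + (w[:, None] * queries).sum(0)) / (1.0 + w.sum())
    return refined
```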
- Statistical Adaptive Stochastic Gradient Methods [34.859895010071234]
We propose a statistical adaptive procedure called SALSA for automatically scheduling the learning rate (step size) in gradient methods.
SALSA first uses a smoothed line-search procedure to gradually increase the learning rate, then automatically decreases the learning rate.
The method for decreasing the learning rate is based on a new statistical test for detecting stationarity when using a constant step size (a simplified sketch of this schedule follows this entry).
arXiv Detail & Related papers (2020-02-25T00:04:16Z)
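A simplified sketch of the warm-up-then-drop scheduling pattern described in the SALSA entry above. The multiplicative warm-up stands in for the paper's smoothed line search, and the stationarity check (a running average of inner products between successive stochastic gradients turning negative, in the spirit of Pflug-type diagnostics) stands in for its statistical test; both are assumptions made for illustration.

```python
import numpy as np

class WarmupThenDropSchedule:
    """Illustrative learning-rate schedule: grow the step size during warm-up,
    then cut it by `drop_factor` whenever the iterates look stationary."""

    def __init__(self, lr=1e-3, growth=1.02, drop_factor=0.1, warmup_steps=200):
        self.lr = lr
        self.growth = growth
        self.drop_factor = drop_factor
        self.warmup_steps = warmup_steps
        self.step_count = 0
        self.prev_grad = None
        self.running_inner = 0.0  # exponential moving average of <g_t, g_{t-1}>

    def step(self, grad):
        """Return the learning rate to use after observing stochastic gradient `grad`."""
        self.step_count += 1
        if self.step_count <= self.warmup_steps:
            self.lr *= self.growth              # crude stand-in for the line search
        elif self.prev_grad is not None:
            inner = float(np.dot(grad, self.prev_grad))
            self.running_inner = 0.99 * self.running_inner + 0.01 * inner
            if self.running_inner < 0.0:        # oscillation suggests stationarity
                self.lr *= self.drop_factor
                self.running_inner = 0.0        # reset the test after each drop
        self.prev_grad = grad.copy()
        return self.lr
```

In a training loop one would call `lr = sched.step(g)` after computing each stochastic gradient `g` and then take the step `w -= lr * g`.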
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.