Meta-Learning Adversarial Bandits
- URL: http://arxiv.org/abs/2205.14128v1
- Date: Fri, 27 May 2022 17:40:32 GMT
- Title: Meta-Learning Adversarial Bandits
- Authors: Maria-Florina Balcan, Keegan Harris, Mikhail Khodak, Zhiwei Steven Wu
- Abstract summary: We study online learning with bandit feedback across multiple tasks, with the goal of improving average performance across tasks if they are similar according to some natural task-similarity measure.
As the first to target the adversarial setting, we design a unified meta-algorithm that yields setting-specific guarantees for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO).
Our guarantees rely on proving that unregularized follow-the-leader combined with multiplicative weights is enough to online learn a non-smooth and non-convex sequence of affine functions of Bregman divergences that upper-bound the regret of OMD.
- Score: 49.094361442409785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study online learning with bandit feedback across multiple tasks, with the
goal of improving average performance across tasks if they are similar
according to some natural task-similarity measure. As the first to target the
adversarial setting, we design a unified meta-algorithm that yields
setting-specific guarantees for two important cases: multi-armed bandits (MAB)
and bandit linear optimization (BLO). For MAB, the meta-algorithm tunes the
initialization, step-size, and entropy parameter of the Tsallis-entropy
generalization of the well-known Exp3 method, with the task-averaged regret
provably improving if the entropy of the distribution over estimated
optima-in-hindsight is small. For BLO, we learn the initialization, step-size,
and boundary-offset of online mirror descent (OMD) with self-concordant barrier
regularizers, showing that task-averaged regret varies directly with a measure
induced by these functions on the interior of the action space. Our adaptive
guarantees rely on proving that unregularized follow-the-leader combined with
multiplicative weights is enough to online learn a non-smooth and non-convex
sequence of affine functions of Bregman divergences that upper-bound the regret
of OMD.
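As a rough illustration of the base learner being tuned, the following NumPy sketch implements a follow-the-regularized-leader bandit algorithm with a Tsallis-entropy regularizer (the generalization of Exp3 referred to above), exposing the step-size eta and entropy parameter beta as the quantities a meta-learner could set per task. The initialization tuning, the precise estimators, and the meta-level updates of the paper are omitted, and all function names are illustrative.
```python
import numpy as np

def tsallis_ftrl_probs(cum_loss, eta, beta, tol=1e-10):
    """Play of FTRL with a Tsallis-entropy regularizer, beta in (0, 1):
       p = argmin_{p in simplex} <cum_loss, p> - (1 / (eta * (1 - beta))) * sum_i p_i**beta.
    Stationarity gives p_i = c * (cum_loss_i + lam)^(-1/(1-beta)) with
    c = (beta / (eta * (1 - beta)))^(1/(1-beta)); lam is found by bisection so sum(p) = 1."""
    c = (beta / (eta * (1.0 - beta))) ** (1.0 / (1.0 - beta))
    power = -1.0 / (1.0 - beta)
    lo = -cum_loss.min() + 1e-12            # here sum(p) is astronomically large
    hi = lo + 1.0
    while c * ((cum_loss + hi) ** power).sum() > 1.0:
        hi = lo + 2.0 * (hi - lo)           # widen until sum(p) <= 1
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if c * ((cum_loss + mid) ** power).sum() > 1.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    p = c * ((cum_loss + 0.5 * (lo + hi)) ** power)
    return p / p.sum()

def run_one_task(loss_fn, n_arms, horizon, eta=0.1, beta=0.5, seed=None):
    """Run the base learner on a single task with bandit feedback."""
    rng = np.random.default_rng(seed)
    cum_loss = np.zeros(n_arms)             # cumulative importance-weighted loss estimates
    total = 0.0
    for t in range(horizon):
        p = tsallis_ftrl_probs(cum_loss, eta, beta)
        arm = rng.choice(n_arms, p=p)
        loss = loss_fn(t, arm)              # only the played arm's loss is observed
        total += loss
        cum_loss[arm] += loss / p[arm]      # importance-weighted estimator
    return total
```
With beta = 0.5 the weights take the Tsallis-INF form p_i proportional to (cum_loss_i + lam)^{-2}, and as beta approaches 1 the regularizer approaches Shannon entropy, recovering exponential-weights (Exp3) behavior; the sketch itself only handles beta strictly inside (0, 1).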
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO)
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth probability distribution and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain stability in terms of the zero-shot generalization of VLMs; the resulting method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario.
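A minimal, purely illustrative sketch of one standard way to realize an orthogonal reparameterization of frozen pretrained weights is given below (via the Cayley transform of a skew-symmetric parameter matrix); the paper's concrete parameterization and regularization may differ.
```python
import numpy as np

def orthogonal_reparam(W_pretrained, S_params):
    """Rotate a frozen pretrained weight by a learned orthogonal matrix built from
    unconstrained parameters via the Cayley transform of a skew-symmetric matrix.
    Because Q is orthogonal, W's singular values and the pairwise angles between
    its columns are preserved."""
    S = S_params - S_params.T                          # skew-symmetric by construction
    d = S.shape[0]
    Q = np.linalg.solve(np.eye(d) + S, np.eye(d) - S)  # (I + S)^{-1} (I - S), orthogonal
    return Q @ W_pretrained
```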
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
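A minimal PyTorch sketch of the structural idea (one shared frozen projection, member-specific rank-r factors) is given below; it is an illustrative reconstruction from the summary, not the paper's implementation, and the class name and initialization scale are assumptions.
```python
import torch
import torch.nn as nn

class LoRAEnsembleLinear(nn.Module):
    """One shared, frozen projection plus a rank-r update per ensemble member:
    member m computes x @ (W + B_m A_m)^T, so only the low-rank factors differ."""
    def __init__(self, weight, n_members, rank):
        super().__init__()
        out_dim, in_dim = weight.shape
        self.weight = nn.Parameter(weight, requires_grad=False)             # shared, frozen
        self.A = nn.Parameter(0.01 * torch.randn(n_members, rank, in_dim))  # member-specific
        self.B = nn.Parameter(torch.zeros(n_members, out_dim, rank))        # zero-init: start at W

    def forward(self, x, member):
        delta = self.B[member] @ self.A[member]    # (out_dim, in_dim) low-rank update
        return x @ (self.weight + delta).T
```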
arXiv Detail & Related papers (2024-05-23T11:10:32Z) - Meta-Learning Adversarial Bandit Algorithms [55.72892209124227]
We study online meta-learning with bandit feedback.
We learn to tune online mirror descent (OMD) with self-concordant barrier regularizers.
arXiv Detail & Related papers (2023-07-05T13:52:10Z) - Convergence of ease-controlled Random Reshuffling gradient Algorithms under Lipschitz smoothness [0.0]
We consider minimizing the average of a very large number of smooth and possibly non-convex functions, and we use two widely used minibatch frameworks, Incremental Gradient (IG) and Random Reshuffling (RR), to tackle this problem.
We define ease-controlled modifications of IG/RR schemes, which require a light additional computational effort.
We compare our implementation with both a full-batch gradient method (i.e., L-BFGS) and a standard implementation of IG/RR methods, showing that the algorithms require a similar computational effort.
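For reference, a plain Random Reshuffling loop is sketched below (the baseline scheme being modified, not the paper's ease-controlled variant; all names are illustrative).
```python
import numpy as np

def random_reshuffling(grad_i, x0, n_samples, n_epochs, step, seed=0):
    """Minimize (1/n) * sum_i f_i(x) by Random Reshuffling: each epoch draws a fresh
    permutation of the sample indices and takes one gradient step per component."""
    x = x0.copy()
    rng = np.random.default_rng(seed)
    for _ in range(n_epochs):
        for i in rng.permutation(n_samples):
            x = x - step * grad_i(i, x)    # gradient of the i-th component function at x
    return x
```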
arXiv Detail & Related papers (2022-12-04T15:26:36Z) - Experimental Design for Regret Minimization in Linear Bandits [19.8309784360219]
We propose a novel experimental design-based algorithm to minimize regret in online linear and combinatorial bandits.
We provide state-of-the-art finite time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regime.
arXiv Detail & Related papers (2020-11-01T17:59:19Z) - Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector.
We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
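The per-round estimation step implied by this regularizer has a simple closed form: least squares pulled toward the bias vector. A minimal NumPy sketch follows (the helper name is illustrative and the confidence-ellipsoid machinery of OFUL is omitted).
```python
import numpy as np

def biased_ridge(X, y, h, lam):
    """Regularized least squares pulled toward a bias vector h:
       argmin_theta ||y - X @ theta||^2 + lam * ||theta - h||^2,
    which has the closed form (X^T X + lam I)^{-1} (X^T y + lam h)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * h)
```
For large lam the estimate collapses to the (meta-learned) bias h, while for small lam it reverts to ordinary least squares, which is consistent with the claim that the approach helps most when the variance of the task distribution is small.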
arXiv Detail & Related papers (2020-05-18T08:41:39Z) - PAC-Bayes meta-learning with implicit task-specific posteriors [37.32107678838193]
We introduce a new and rigorously-formulated PAC-Bayes meta-learning algorithm that solves few-shot learning.
We show that the models trained with our proposed meta-learning algorithm are well calibrated and accurate.
arXiv Detail & Related papers (2020-03-05T06:56:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.