Meta-Learning Adversarial Bandit Algorithms
- URL: http://arxiv.org/abs/2307.02295v2
- Date: Wed, 1 Nov 2023 16:15:35 GMT
- Title: Meta-Learning Adversarial Bandit Algorithms
- Authors: Mikhail Khodak, Ilya Osadchiy, Keegan Harris, Maria-Florina Balcan,
Kfir Y. Levy, Ron Meir, Zhiwei Steven Wu
- Abstract summary: We study online meta-learning with bandit feedback.
We learn to tune online mirror descent (OMD) with self-concordant barrier regularizers.
- Score: 55.72892209124227
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study online meta-learning with bandit feedback, with the goal of
improving performance across multiple tasks if they are similar according to
some natural similarity measure. As the first to target the adversarial
online-within-online partial-information setting, we design meta-algorithms
that combine outer learners to simultaneously tune the initialization and other
hyperparameters of an inner learner for two important cases: multi-armed
bandits (MAB) and bandit linear optimization (BLO). For MAB, the meta-learners
initialize and set hyperparameters of the Tsallis-entropy generalization of
Exp3, with the task-averaged regret improving if the entropy of the
optima-in-hindsight is small. For BLO, we learn to initialize and tune online
mirror descent (OMD) with self-concordant barrier regularizers, showing that
task-averaged regret varies directly with an action space-dependent measure
they induce. Our guarantees rely on proving that unregularized
follow-the-leader combined with two levels of low-dimensional hyperparameter
tuning is enough to learn a sequence of affine functions of non-Lipschitz and
sometimes non-convex Bregman divergences bounding the regret of OMD.
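As a rough illustration of the online-within-online setup in this abstract, the sketch below (not the paper's algorithm; the inner learner, loss matrices, smoothing constant, and step size are hypothetical stand-ins) runs an Exp3-style inner bandit learner on each task and applies a follow-the-leader-style outer update that starts every new task from the smoothed empirical distribution of past optima-in-hindsight; such an initialization helps exactly when that distribution has low entropy.

```python
import numpy as np

def run_exp3(losses, init, lr):
    """Inner learner for one task: `losses` is (rounds, arms), `init` the
    meta-learned starting distribution, `lr` the step size."""
    n_rounds, n_arms = losses.shape
    log_w = np.log(init)                      # start from the meta-learned prior
    total = 0.0
    rng = np.random.default_rng(0)
    for t in range(n_rounds):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        arm = rng.choice(n_arms, p=p)
        loss = losses[t, arm]
        total += loss
        est = np.zeros(n_arms)
        est[arm] = loss / p[arm]              # importance-weighted loss estimate
        log_w -= lr * est                     # multiplicative-weights / entropic OMD step
    best_arm = int(losses.sum(axis=0).argmin())
    return total, best_arm

def meta_learn(tasks, n_arms, lr=0.1, mix=0.1):
    """Outer loop: follow-the-leader on the initialization, i.e. start each new
    task from the smoothed empirical distribution of past optima-in-hindsight."""
    counts = np.ones(n_arms)                  # uniform prior before any task
    for losses in tasks:
        init = (1 - mix) * counts / counts.sum() + mix / n_arms
        total, best_arm = run_exp3(losses, init, lr)
        counts[best_arm] += 1                 # FTL-style update toward observed optima
        print(f"task loss {total:.1f}, best arm in hindsight {best_arm}")

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    tasks = [rng.random((200, 5)) for _ in range(3)]  # made-up task loss sequences
    meta_learn(tasks, n_arms=5)
```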
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO)
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization problem, where the inner loss function becomes a smooth probability distribution and the outer loss becomes an expected loss over the inner distribution.
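A toy, hypothetical illustration of that smoothing idea (the objectives, grid, and temperature below are made up, not the paper's formulation): the exact inner argmin is replaced by a Gibbs distribution over candidate inner solutions, and the outer objective becomes the expected outer loss under that distribution, which is smooth in the outer variable.

```python
import numpy as np

GRID = np.linspace(-3.0, 3.0, 301)              # candidate inner solutions

def smoothed_outer_loss(theta, temp=0.1):
    inner_loss = (GRID - theta) ** 2            # inner problem: x should track theta
    probs = np.exp(-(inner_loss - inner_loss.min()) / temp)
    probs /= probs.sum()                        # Gibbs distribution over inner candidates
    outer_loss = (GRID - 1.0) ** 2              # outer problem: want the inner solution near 1
    return float(np.dot(probs, outer_loss))     # expected loss over the inner distribution

theta, lr, eps = -2.0, 0.1, 1e-4
for _ in range(200):                            # plain gradient descent on the smoothed objective
    grad = (smoothed_outer_loss(theta + eps) - smoothed_outer_loss(theta - eps)) / (2 * eps)
    theta -= lr * grad
print(f"outer variable after training: {theta:.2f}")   # approaches 1.0
```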
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - Multi-behavior Self-supervised Learning for Recommendation [36.42241501002167]
We propose a Multi-Behavior Self-Supervised Learning (MBSSL) framework together with an adaptive optimization method.
Specifically, we devise a behavior-aware graph neural network incorporating the self-attention mechanism to capture behavior multiplicity and dependencies.
Experiments on five real-world datasets demonstrate the consistent improvements obtained by MBSSL over ten state-of-the-art (SOTA) baselines.
arXiv Detail & Related papers (2023-05-22T15:57:32Z) - Meta-Learning Adversarial Bandits [49.094361442409785]
We study online learning with bandit feedback across multiple tasks, with the goal of improving average performance across tasks if they are similar according to some natural task-similarity measure.
As the first to target the adversarial setting, we design a meta-algorithm that has setting-specific guarantees for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO).
Our guarantees rely on proving that unregularized follow-the-leader combined with multiplicative weights is enough to online learn a non-smooth and non-convex sequence of functions bounding the regret of OMD.
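A hypothetical sketch of that two-part outer learner (the Bregman-divergence-based regret bound, the step-size grid, and all constants below are illustrative stand-ins, not the paper's exact construction): follow-the-leader sets the initialization from past optima-in-hindsight, while multiplicative weights selects among a small grid of candidate step-sizes using the post-hoc value of the regret upper bound.

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def regret_bound(opt, init, lr, horizon=1000):
    # OMD-style upper bound: Bregman divergence to the optimum over lr, plus lr * T
    return kl(opt, init) / lr + lr * horizon

n_arms, eta = 5, 0.5
lr_grid = np.array([0.01, 0.03, 0.1, 0.3])
lr_weights = np.ones_like(lr_grid)                  # multiplicative-weights state
past_opts = []                                      # for follow-the-leader on the init

rng = np.random.default_rng(0)
for task in range(20):
    opt = rng.dirichlet(0.3 * np.ones(n_arms))      # made-up optimum-in-hindsight
    # FTL: initialize at the smoothed mean of past optima
    init = np.mean(past_opts, axis=0) if past_opts else np.ones(n_arms) / n_arms
    init = 0.9 * init + 0.1 / n_arms
    lr = lr_grid[rng.choice(len(lr_grid), p=lr_weights / lr_weights.sum())]
    print(f"task {task}: lr={lr}, bound={regret_bound(opt, init, lr):.1f}")
    # multiplicative-weights update on the step-size grid (full feedback, since the
    # bound can be evaluated for every candidate once the task is over)
    costs = np.array([regret_bound(opt, init, g) for g in lr_grid])
    lr_weights *= np.exp(-eta * costs / costs.max())
    past_opts.append(opt)
```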
arXiv Detail & Related papers (2022-05-27T17:40:32Z) - Learning-to-learn non-convex piecewise-Lipschitz functions [44.6133187924678]
We analyze the meta-learning of learning algorithms for piecewise-Lipschitz functions, a non-convex setting with applications to both machine learning and algorithms.
We propose a practical meta-learning procedure that learns both the initialization and the step-size of the algorithm from multiple online learning tasks.
arXiv Detail & Related papers (2021-08-19T16:22:48Z) - Fast and Scalable Adversarial Training of Kernel SVM via Doubly
Stochastic Gradients [34.98827928892501]
Adversarial attacks, which generate examples that are almost indistinguishable from natural examples, pose a serious threat to learning models.
Support vector machine (SVM) is a classical yet still important learning algorithm even in the current deep learning era.
We propose adv-SVM to improve its adversarial robustness via adversarial training, which has been demonstrated to be one of the most promising defense techniques.
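For context on the adversarial-training defence referenced above, the following hypothetical sketch trains a plain linear SVM (the paper itself handles kernel SVMs via doubly stochastic gradients; the data, radius, and step size are made up) on worst-case perturbed inputs, for which the L-infinity attack on a linear model with hinge loss has a closed form.

```python
import numpy as np

def adv_train_linear_svm(X, y, eps=0.1, lr=0.01, reg=1e-3, epochs=20):
    """Subgradient descent on the adversarial hinge loss of a linear SVM."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x_adv = X[i] - eps * y[i] * np.sign(w)      # worst-case L_inf perturbation
            margin = y[i] * (np.dot(w, x_adv) + b)
            grad_w, grad_b = reg * w, 0.0               # L2 regularization
            if margin < 1:                              # hinge-loss subgradient
                grad_w -= y[i] * x_adv
                grad_b -= y[i]
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Tiny synthetic usage example
rng = np.random.default_rng(1)
labels = rng.choice([-1.0, 1.0], size=200)
X = rng.normal(size=(200, 2)) + np.outer(labels, [1.5, 0.0])
w, b = adv_train_linear_svm(X, labels)
```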
arXiv Detail & Related papers (2021-07-21T08:15:32Z) - Large-Scale Meta-Learning with Continual Trajectory Shifting [76.29017270864308]
We show that allowing the meta-learners to take a larger number of inner gradient steps better captures the structure of heterogeneous and large-scale tasks.
In order to increase the frequency of meta-updates, we propose to estimate the required shift of the task-specific parameters.
We show that the algorithm largely outperforms the previous first-order meta-learning methods in terms of both generalization performance and convergence.
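A rough, hypothetical sketch of the trajectory-shifting idea (the quadratic task losses, step sizes, and shifting rule are illustrative, not taken from the paper): a first-order meta-update is taken every few inner steps, and ongoing task-specific parameters are shifted by the resulting meta-parameter change so their inner trajectories continue rather than restart.

```python
import numpy as np

def task_grad(theta, target):
    return 2.0 * (theta - target)               # gradient of a toy quadratic task loss

rng = np.random.default_rng(0)
targets = rng.normal(size=(8, 4))               # one made-up optimum per task
meta = np.zeros(4)
inner_lr, meta_lr, inner_steps_per_meta_update = 0.1, 0.05, 5

task_params = [meta.copy() for _ in targets]    # long-running inner trajectories
for _ in range(100):
    for k, target in enumerate(targets):
        for _ in range(inner_steps_per_meta_update):
            task_params[k] -= inner_lr * task_grad(task_params[k], target)
    # first-order (Reptile-style) meta-update toward the average task parameters
    new_meta = meta + meta_lr * (np.mean(task_params, axis=0) - meta)
    shift = new_meta - meta                     # change in the meta-parameters
    task_params = [p + shift for p in task_params]  # shift ongoing trajectories accordingly
    meta = new_meta
```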
arXiv Detail & Related papers (2021-02-14T18:36:33Z) - Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK)
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
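To illustrate the closed-form adaptation mentioned in point 2), here is a hypothetical sketch in which a generic RBF kernel stands in for the meta-model's NTK: adapting to a task's support set reduces to kernel ridge regression, which is solved analytically rather than by an iterative inner loop.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def adapt_analytically(X_support, y_support, X_query, ridge=1e-2):
    """Closed-form adaptation: solve (K + ridge*I) alpha = y on the support set,
    then predict on the query set; no inner-loop gradient steps are needed."""
    K = rbf_kernel(X_support, X_support)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X_support)), y_support)
    return rbf_kernel(X_query, X_support) @ alpha

# Toy few-shot regression task
rng = np.random.default_rng(0)
X_support = rng.uniform(-3, 3, size=(10, 1))
y_support = np.sin(X_support[:, 0])
X_query = np.linspace(-3, 3, 50)[:, None]
predictions = adapt_analytically(X_support, y_support, X_query)
```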
arXiv Detail & Related papers (2021-02-07T20:53:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.