Online Boosting with Bandit Feedback
- URL: http://arxiv.org/abs/2007.11975v1
- Date: Thu, 23 Jul 2020 12:40:57 GMT
- Title: Online Boosting with Bandit Feedback
- Authors: Nataly Brukhim and Elad Hazan
- Abstract summary: We consider the problem of online boosting for regression tasks, when only limited information is available to the learner.
We give an efficient regret minimization method that has two implications: an online boosting algorithm with noisy multi-point bandit feedback, and a new projection-free online convex optimization algorithm with stochastic gradient.
- Score: 36.33990847170534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of online boosting for regression tasks, when only
limited information is available to the learner. We give an efficient regret
minimization method that has two implications: an online boosting algorithm
with noisy multi-point bandit feedback, and a new projection-free online convex
optimization algorithm with stochastic gradient, that improves state-of-the-art
guarantees in terms of efficiency.
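To make the bandit-feedback setting concrete, here is a minimal sketch of the classical multi-point (two-point) gradient estimator plugged into projected online gradient descent over a Euclidean ball. It only illustrates the feedback model: the learner never observes gradients, only function values at queried points. The function names, constants, and the ball-projection step are assumptions for this sketch; the paper's actual algorithm is projection-free and is not reproduced here.

```python
# Illustrative sketch (not the paper's algorithm): a two-point bandit gradient
# estimator used inside online gradient descent on a Euclidean ball.
import numpy as np

def two_point_gradient(loss_fn, x, delta, rng):
    """Estimate a gradient at x from two function evaluations only."""
    u = rng.normal(size=x.shape)
    u /= np.linalg.norm(u)                      # random unit direction
    diff = loss_fn(x + delta * u) - loss_fn(x - delta * u)
    return (x.size / (2.0 * delta)) * diff * u  # classical two-point estimate

def project_to_ball(x, radius):
    """Cheap Euclidean-ball projection; the paper instead avoids projections."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

def bandit_ogd(loss_fns, dim, radius=1.0, delta=0.05, eta=0.1, seed=0):
    """Run online gradient descent using only multi-point bandit feedback."""
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    for loss_fn in loss_fns:                    # one convex loss per round
        g = two_point_gradient(loss_fn, x, delta, rng)
        x = project_to_ball(x - eta * g, radius)
    return x
```

The two-point estimate is an unbiased gradient of a smoothed version of each loss, which is why such multi-point feedback can stand in for a stochastic gradient oracle in online convex optimization.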
Related papers
- Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of the online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We also provide applications to fast-rate convergence in games and to extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z)
- Particle-based Online Bayesian Sampling [24.290436348629452]
We study an Online Particle-based Variational Inference (OPVI) algorithm that uses a set of particles to represent the approximating distribution.
To reduce the gradient error caused by the approximation, we use a sublinearly increasing batch size that reduces the variance.
Experiments show that the proposed algorithm achieves better results than naively applying existing Bayesian sampling methods in the online setting.
arXiv Detail & Related papers (2023-02-28T17:46:32Z)
- Proximal Point Imitation Learning [48.50107891696562]
We develop new algorithms with rigorous efficiency guarantees for infinite horizon imitation learning.
We leverage classical tools from optimization, in particular, the proximal-point method (PPM) and dual smoothing.
We achieve convincing empirical performance for both linear and neural network function approximation.
arXiv Detail & Related papers (2022-09-22T12:40:21Z)
- Online Agnostic Multiclass Boosting [20.22409095000365]
We give the first boosting algorithm for online agnostic multiclass classification.
Our reduction also enables the construction of algorithms for statistical agnostic, online realizable, and statistical realizable multiclass boosting.
arXiv Detail & Related papers (2022-05-30T13:59:55Z)
- Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback [98.7678704343537]
We focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms.
We derive bounds for the agent's regret in the full information and bandit feedback setting, even if the delay between choosing a decision and receiving the incurred cost is unbounded.
arXiv Detail & Related papers (2022-05-15T08:27:12Z)
- Boosting for Online Convex Optimization [64.15578413206715]
We consider the decision-making framework of online convex optimization with a large number of experts.
We define a weak learning algorithm as a mechanism that guarantees approximate regret against a base class of experts.
We give an efficient boosting algorithm that guarantees near-optimal regret against the convex hull of the base class.
arXiv Detail & Related papers (2021-02-18T12:30:49Z)
- Online Agnostic Boosting via Regret Minimization [47.19178618537368]
Boosting is a widely used machine learning approach based on the idea of aggregating weak learning rules.
We provide the first online agnostic boosting algorithm; that is, given a weak learner with only marginally-better-than-trivial regret guarantees, our algorithm boosts it to a strong learner with sublinear regret (a generic online boosting loop in this spirit is sketched after this list).
arXiv Detail & Related papers (2020-03-02T19:21:25Z)
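Several of the boosting entries above, and the main paper's regression setting, share the same high-level recipe: run N copies of a weak online learner and aggregate their predictions, feeding each copy a modified target so the ensemble improves with N. The sketch below is a generic full-information, residual-style online boosting loop for regression; it only illustrates that recipe and is not the specific algorithm of any paper listed here. The toy weak learner, shrinkage factor, and learning rate are assumptions.

```python
# Illustrative residual-style online boosting loop for regression.
# Not the bandit-feedback or agnostic algorithms of the papers above.
import numpy as np

class OnlineLeastSquaresLearner:
    """Toy weak learner: online linear regression updated by SGD."""
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)
        self.lr = lr

    def predict(self, x):
        return float(self.w @ x)

    def update(self, x, target):
        # One gradient step on the squared error to the given target.
        grad = (self.predict(x) - target) * x
        self.w -= self.lr * grad

class OnlineBooster:
    """Combine N weak learners; learner i fits the running residual."""
    def __init__(self, n_learners, dim, shrinkage=0.5):
        self.learners = [OnlineLeastSquaresLearner(dim) for _ in range(n_learners)]
        self.shrinkage = shrinkage

    def predict(self, x):
        return sum(self.shrinkage * learner.predict(x) for learner in self.learners)

    def update(self, x, y):
        residual = y
        for learner in self.learners:
            learner.update(x, residual)              # fit what is still missed
            residual -= self.shrinkage * learner.predict(x)
```

In a streaming loop one would call booster.predict(x) before the label arrives and booster.update(x, y) afterwards; the bandit-feedback setting of the main paper replaces the observed label with limited, noisy evaluations of the loss.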
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.