Boosting for Online Convex Optimization
- URL: http://arxiv.org/abs/2102.09305v1
- Date: Thu, 18 Feb 2021 12:30:49 GMT
- Title: Boosting for Online Convex Optimization
- Authors: Elad Hazan, Karan Singh
- Abstract summary: We consider the decision-making framework of online convex optimization with a large number of experts.
We define a weak learning algorithm as a mechanism that guarantees approximate regret against a base class of experts.
We give an efficient boosting algorithm that guarantees near-optimal regret against the convex hull of the base class.
- Score: 64.15578413206715
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We consider the decision-making framework of online convex optimization with
a very large number of experts. This setting is ubiquitous in contextual and
reinforcement learning problems, where the size of the policy class renders
enumeration and search within the policy class infeasible.
Instead, we consider generalizing the methodology of online boosting. We
define a weak learning algorithm as a mechanism that guarantees
multiplicatively approximate regret against a base class of experts. In this
access model, we give an efficient boosting algorithm that guarantees
near-optimal regret against the convex hull of the base class. We consider both
full and partial (a.k.a. bandit) information feedback models. We also give an
analogous efficient boosting algorithm for the i.i.d. statistical setting.
Our results simultaneously generalize online boosting and gradient boosting
guarantees to the contextual learning model, online convex optimization, and
bandit linear optimization settings.
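As a toy illustration of the expert-advice setting this abstract builds on (a minimal sketch, not the authors' boosting algorithm), the multiplicative-weights (Hedge) update maintains a distribution over a base class of experts and plays their weighted combination each round; since the loss is convex, Jensen's inequality transfers the regret guarantee against the best single expert to the combination played. All names and parameters below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 500, 5
# Each expert predicts a fixed point in [0, 1]; the loss is squared error to y_t.
experts = np.linspace(0.0, 1.0, N)
ys = 0.3 + 0.05 * rng.standard_normal(T)  # targets concentrated near 0.3

eta = np.sqrt(8 * np.log(N) / T)  # standard Hedge step size for losses in [0, 1]
w = np.ones(N) / N                # uniform initial weights over the base class
hedge_loss, expert_loss = 0.0, np.zeros(N)
for y in ys:
    pred = w @ experts                 # play the convex combination of experts
    hedge_loss += (pred - y) ** 2
    losses = (experts - y) ** 2
    expert_loss += losses
    w *= np.exp(-eta * losses)         # multiplicative-weights update
    w /= w.sum()

regret = hedge_loss - expert_loss.min()  # should grow sublinearly in T
```

Note that enumerating the experts, as done here, is exactly what becomes infeasible for the large policy classes the paper targets; the boosting approach replaces enumeration with oracle access to a weak learner.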
Related papers
- Rethinking Optimal Transport in Offline Reinforcement Learning [64.56896902186126]
In offline reinforcement learning, the data is provided by various experts and some of them can be sub-optimal.
To extract an efficient policy, it is necessary to stitch the best behaviors from the dataset.
We present an algorithm that aims to find a policy that maps states to a partial distribution of the best expert actions for each given state.
arXiv Detail & Related papers (2024-10-17T22:36:43Z) - Minimax Adaptive Boosting for Online Nonparametric Regression [10.138723409205497]
We introduce a parameter-free online gradient boosting (OGB) algorithm for adversarial online nonparametric regression.
We show that its application to chaining trees achieves minimax optimal regret when competing against Lipschitz functions.
arXiv Detail & Related papers (2024-10-04T12:30:03Z) - Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm to learn online the optimal source placement in large-scale networks.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z) - Faster Margin Maximization Rates for Generic and Adversarially Robust Optimization Methods [20.118513136686452]
First-order optimization methods tend to inherently favor certain solutions over others when minimizing an underdetermined training objective.
We present a series of state-of-the-art implicit bias rates for mirror descent and steepest descent algorithms.
Our accelerated rates are derived by leveraging the regret bounds of online learning algorithms within this game framework.
arXiv Detail & Related papers (2023-05-27T18:16:56Z) - Online Agnostic Multiclass Boosting [20.22409095000365]
We give the first boosting algorithm for online agnostic multiclass classification.
Our reduction also enables the construction of algorithms for statistical agnostic, online realizable, and statistical realizable multiclass boosting.
arXiv Detail & Related papers (2022-05-30T13:59:55Z) - Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach [65.27783264330711]
Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity.
We devise algorithms learning optimal tilt control policies from existing data.
We show that they can produce optimal tilt update policy using much fewer data samples than naive or existing rule-based learning algorithms.
arXiv Detail & Related papers (2022-01-06T18:24:30Z) - A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z) - Online Boosting with Bandit Feedback [36.33990847170534]
We consider the problem of online boosting for regression tasks, when only limited information is available to the learner.
We give an efficient regret minimization method that has two implications: an online boosting algorithm with noisy multi-point bandit feedback, and a new projection-free online convex optimization algorithm with stochastic gradient access.
arXiv Detail & Related papers (2020-07-23T12:40:57Z) - Online Agnostic Boosting via Regret Minimization [47.19178618537368]
Boosting is a widely used machine learning approach based on the idea of aggregating weak learning rules.
We provide the first online agnostic boosting algorithm: given a weak learner with only marginally-better-than-trivial regret guarantees, our algorithm boosts it to a strong learner with sublinear regret.
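The weak-to-strong aggregation idea is easiest to see in the batch (statistical) setting, which the main paper also covers. Below is a minimal sketch using regression stumps as hypothetical weak learners, each fitted greedily to the current residual (gradient boosting for squared loss); this is illustrative, not the agnostic reduction of the paper, and all names are placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=200)
y = np.sin(3 * X)  # target function the weak learners must approximate

def fit_stump(X, r):
    """Weak learner: threshold stump minimizing squared error on residual r."""
    best = (np.inf, 0.0, r.mean(), r.mean())
    for t in np.linspace(-1, 1, 21):
        left, right = r[X <= t], r[X > t]
        if len(left) == 0 or len(right) == 0:
            continue
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best[0]:
            best = (err, t, left.mean(), right.mean())
    _, t, a, b = best
    return lambda x, t=t, a=a, b=b: np.where(x <= t, a, b)

pred = np.zeros_like(y)
lr = 0.5                              # shrinkage on each weak learner's contribution
for _ in range(50):
    h = fit_stump(X, y - pred)        # fit the weak learner to the residual
    pred += lr * h(X)                 # add it to the ensemble

mse = np.mean((y - pred) ** 2)        # the aggregate is far stronger than any stump
```

No single stump can track sin(3x), but the boosted sum of fifty stumps drives the training error close to zero, which is the weak-to-strong phenomenon these papers formalize for the online and agnostic settings.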
arXiv Detail & Related papers (2020-03-02T19:21:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.