Boosting with Tempered Exponential Measures
- URL: http://arxiv.org/abs/2306.05487v1
- Date: Thu, 8 Jun 2023 18:17:48 GMT
- Title: Boosting with Tempered Exponential Measures
- Authors: Richard Nock, Ehsan Amid, Manfred K. Warmuth
- Abstract summary: A popular ML algorithm, AdaBoost, can be derived from the dual of a relative entropy minimization problem.
We generalize this setup to the recently introduced tempered exponential measures (TEMs), where normalization is enforced on a specific power of the measure and not the measure itself.
Our algorithm, $t$-AdaBoost, recovers AdaBoost as a special case ($t=1$)
- Score: 40.961820572346426
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the most popular ML algorithms, AdaBoost, can be derived from the dual
of a relative entropy minimization problem subject to the fact that the
positive weights on the examples sum to one. Essentially, harder examples
receive higher probabilities. We generalize this setup to the recently
introduced {\it tempered exponential measure}s (TEMs) where normalization is
enforced on a specific power of the measure and not the measure itself. TEMs
are indexed by a parameter $t$ and generalize exponential families ($t=1$). Our
algorithm, $t$-AdaBoost, recovers AdaBoost as a special case ($t=1$). We show
that $t$-AdaBoost retains AdaBoost's celebrated exponential convergence rate
when $t\in [0,1)$ while allowing a slight improvement of the rate's hidden
constant compared to $t=1$. $t$-AdaBoost partially computes on a generalization
of classical arithmetic over the reals and brings notable properties like
guaranteed bounded leveraging coefficients for $t\in [0,1)$. From the loss that
$t$-AdaBoost minimizes (a generalization of the exponential loss), we show how
to derive a new family of {\it tempered} losses for the induction of
domain-partitioning classifiers like decision trees. Crucially, strict
properness is ensured for all while their boosting rates span the full known
spectrum. Experiments using $t$-AdaBoost+trees display that significant
leverage can be achieved by tuning $t$.
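As a minimal illustration of the abstract's two ingredients, the sketch below implements a tempered exponential/logarithm pair, $\exp_t(x)=[1+(1-t)x]_+^{1/(1-t)}$ with inverse $\log_t$, together with a weight update whose normalization is applied to a power of the measure rather than to the measure itself. This is not the paper's $t$-AdaBoost pseudocode: the choice of the $(2-t)$-th power in `tem_normalize`, the leveraging coefficient `alpha`, and all function names are illustrative assumptions. The only claim is that at $t=1$ the update collapses to the usual AdaBoost-style multiplicative reweighting of a probability vector.

```python
import numpy as np

def exp_t(x, t):
    # Tempered exponential: [1 + (1 - t) * x]_+ ** (1 / (1 - t)); ordinary exp at t = 1.
    if t == 1.0:
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))

def log_t(x, t):
    # Tempered logarithm, the inverse of exp_t on its range; ordinary log at t = 1.
    if t == 1.0:
        return np.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def tem_normalize(w, t):
    # Assumption: TEM-style normalization enforced on the (2 - t)-th power of the
    # measure, i.e. sum_i q_i^(2 - t) = 1; plain probability normalization at t = 1.
    p = 2.0 - t
    return w / np.sum(w ** p) ** (1.0 / p)

# Toy tempered multiplicative weight update (illustrative only).
rng = np.random.default_rng(0)
n, t, alpha = 8, 0.6, 0.4                 # alpha plays the role of a leveraging coefficient
margins = rng.uniform(-1.0, 1.0, size=n)  # y_i * h(x_i) for one weak hypothesis
q = tem_normalize(np.ones(n), t)          # uniform, TEM-normalized starting measure
q = tem_normalize(exp_t(log_t(q, t) - alpha * margins, t), t)
print(q, np.sum(q ** (2.0 - t)))          # second value is 1.0 by construction
```

At $t=1$ the same code reproduces AdaBoost's familiar multiplicative update over a probability vector, with harder examples (negative margins) receiving larger weights.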
Related papers
- SGD with memory: fundamental properties and stochastic acceleration [15.25021403154845]
In the non-stochastic setting, the optimal exponent $\xi$ in the loss convergence $L_t \sim C_L t^{-\xi}$ is double that of plain GD.
We show that in memory-1 algorithms we can make $C_L$ arbitrarily small while maintaining stability.
arXiv Detail & Related papers (2024-10-05T16:57:40Z) - How to Boost Any Loss Function [63.573324901948716]
We show that loss functions can be efficiently optimized with boosting.
We show that boosting can achieve a feat not yet known to be possible in the classical $0^{th}$-order setting.
arXiv Detail & Related papers (2024-07-02T14:08:23Z) - Boosting as Frank-Wolfe [0.6875312133832078]
We propose a generic boosting scheme that combines the Frank-Wolfe algorithm and any secondary algorithm.
We show that the scheme retains the same convergence guarantee as ERLPBoost and C-ERLPBoost.
arXiv Detail & Related papers (2022-09-22T07:36:55Z) - Continuous Submodular Maximization: Boosting via Non-oblivious Function [12.755674710719616]
In this paper, we revisit constrained continuous submodular maximization in both offline and online settings.
We use the factor-revealing optimization equation to derive an optimal auxiliary function $F$ for the problem $\max_{\boldsymbol{x}\in\mathcal{C}} f(\boldsymbol{x})$.
In the online setting, we propose a boosting gradient algorithm that achieves a regret of $\sqrt{D}$ (where $D$ is the sum of delays of gradient feedback) against a $(\frac{\gamma}{2})$-approximation adversarial benchmark.
arXiv Detail & Related papers (2022-01-03T15:10:17Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private algorithm is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Fast Rates for the Regret of Offline Reinforcement Learning [69.23654172273085]
We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinite-horizon discounted Markov decision process (MDP).
We show that given any estimate for the optimal quality function $Q*$, the regret of the policy it defines converges at a rate given by the exponentiation of the $Q*$-estimate's pointwise convergence rate.
arXiv Detail & Related papers (2021-01-31T16:17:56Z) - Accelerated Zeroth-Order and First-Order Momentum Methods from Mini to
Minimax Optimization [133.53164856723782]
We propose a new accelerated zeroth-order momentum (Acc-ZOM) method for black-box mini-optimization where only function values can be obtained.
Meanwhile, we propose an accelerated zeroth-order momentum descent (Acc-MDA) method for black-box minimax optimization, where only function values can be obtained.
In particular, our Acc-MDA can obtain a lower gradient complexity of $\tilde{O}(\kappa_y^{2.5}\epsilon^{-3})$ with a batch size $O(\kappa_y^4)$.
arXiv Detail & Related papers (2020-08-18T22:19:29Z) - Fully-Corrective Gradient Boosting with Squared Hinge: Fast Learning
Rates and Early Stopping [29.485528641599018]
We propose an efficient boosting method with theoretical generalization guarantees for binary classification.
We derive a fast learning rate of the order $\mathcal{O}((m/\log m)^{-1/4})$ for the proposed boosting method.
Both derived learning rates are the best ones among the existing generalization results of boosting-type methods for classification.
arXiv Detail & Related papers (2020-04-01T00:39:24Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and
Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z) - On the Dual Formulation of Boosting Algorithms [92.74617630106559]
We show that the Lagrange dual problems of AdaBoost, LogitBoost, and soft-margin LPBoost with generalized hinge loss are all entropy maximization problems.
By looking at the dual problems of these boosting algorithms, we show that the success of boosting can be understood in terms of maintaining a better margin distribution.
arXiv Detail & Related papers (2009-01-23T02:14:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed papers) and is not responsible for any consequences of its use.