Fully-Corrective Gradient Boosting with Squared Hinge: Fast Learning
Rates and Early Stopping
- URL: http://arxiv.org/abs/2004.00179v1
- Date: Wed, 1 Apr 2020 00:39:24 GMT
- Title: Fully-Corrective Gradient Boosting with Squared Hinge: Fast Learning
Rates and Early Stopping
- Authors: Jinshan Zeng, Min Zhang and Shao-Bo Lin
- Abstract summary: We propose an efficient boosting method with theoretical generalization guarantees for binary classification.
We derive a fast learning rate of the order ${\cal O}((m/\log m)^{-1/4})$ for the proposed boosting method.
Both derived learning rates are the best ones among the existing generalization results of boosting-type methods for classification.
- Score: 29.485528641599018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Boosting is a well-known method for improving the accuracy of weak learners
in machine learning. However, its theoretical generalization guarantee is
missing in the literature. In this paper, we propose an efficient boosting method
with theoretical generalization guarantees for binary classification. Three key
ingredients of the proposed boosting method are: a) the
\textit{fully-corrective greedy} (FCG) update in the boosting procedure, b) a
differentiable \textit{squared hinge} (also called \textit{truncated
quadratic}) function as the loss function, and c) an efficient alternating
direction method of multipliers (ADMM) algorithm for the associated FCG
optimization. The squared hinge loss not only inherits the robustness of
the well-known hinge loss for classification with outliers, but also brings
some benefits for computational implementation and theoretical justification.
Under some sparseness assumption, we derive a fast learning rate of the order
${\cal O}((m/\log m)^{-1/4})$ for the proposed boosting method, which can be
further improved to ${\cal O}((m/\log m)^{-1/2})$ if a certain additional noise
assumption is imposed, where $m$ is the sample size. Both derived
learning rates are the best ones among the existing generalization results of
boosting-type methods for classification. Moreover, an efficient early stopping
scheme is provided for the proposed method. A series of toy simulations and
real data experiments are conducted to verify the developed theories and
demonstrate the effectiveness of the proposed method.
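As a rough illustration of the method's two key ingredients, the fully-corrective greedy update and the squared hinge loss, the sketch below uses single features as weak learners and plain gradient descent in place of the paper's ADMM inner solver; all names and implementation details are illustrative assumptions, not the authors' code, and the early stopping scheme is omitted.

```python
import numpy as np

def squared_hinge(margins):
    # squared (truncated quadratic) hinge: phi(t) = max(0, 1 - t)^2
    return np.maximum(0.0, 1.0 - margins) ** 2

def fcg_boost(X, y, n_rounds=20, inner_steps=200, lr=0.1):
    """Toy fully-corrective greedy boosting with the squared hinge loss.

    Weak learners are single linear coordinates for simplicity, and the
    inner FCG optimization is solved by gradient descent rather than
    ADMM. Labels y must be in {-1, +1}.
    """
    m, d = X.shape
    selected, coefs = [], []
    for _ in range(n_rounds):
        margins = y * (X[:, selected] @ np.array(coefs) if selected else np.zeros(m))
        # negative gradient of the empirical squared-hinge risk w.r.t. f
        resid = 2.0 * np.maximum(0.0, 1.0 - margins) * y
        # greedy step: pick the unused feature most correlated with the residual
        scores = np.abs(X.T @ resid)
        scores[selected] = -np.inf
        selected.append(int(np.argmax(scores)))
        coefs.append(0.0)
        # fully-corrective step: refit ALL coefficients of selected learners
        w = np.array(coefs)
        S = X[:, selected]
        for _ in range(inner_steps):
            t = y * (S @ w)
            grad = -S.T @ (2.0 * np.maximum(0.0, 1.0 - t) * y) / m
            w -= lr * grad
        coefs = list(w)
    return selected, np.array(coefs)
```

The fully-corrective step is what distinguishes FCG from ordinary stagewise boosting: every previously selected coefficient is re-optimized jointly after each greedy selection.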
Related papers
- How to Boost Any Loss Function [63.573324901948716]
We show that loss functions can be efficiently optimized with boosting.
We show that boosting can achieve a feat not yet known to be possible in the classical $0^{th}$ order setting.
arXiv Detail & Related papers (2024-07-02T14:08:23Z)
- A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation [121.0693322732454]
Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable zero-shot capacity.
Recent research has focused on developing efficient fine-tuning methods to enhance CLIP's performance in downstream tasks.
We revisit a classical algorithm, Gaussian Discriminant Analysis (GDA), and apply it to the downstream classification of CLIP.
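Gaussian Discriminant Analysis with a shared covariance is classical and closed-form; a minimal sketch on generic feature vectors (any CLIP-specific details, and the regularization constant, are assumptions) might look like:

```python
import numpy as np

def gda_fit(features, labels, n_classes):
    """Gaussian Discriminant Analysis with a shared covariance (LDA form),
    as one might apply on top of frozen image features."""
    d = features.shape[1]
    means = np.stack([features[labels == c].mean(axis=0) for c in range(n_classes)])
    cov = np.zeros((d, d))
    for c in range(n_classes):
        diff = features[labels == c] - means[c]
        cov += diff.T @ diff
    cov /= len(features)
    cov += 1e-3 * np.eye(d)  # shrinkage for numerical stability (assumed value)
    prec = np.linalg.inv(cov)
    priors = np.bincount(labels, minlength=n_classes) / len(labels)
    # linear discriminant: score_c(x) = x^T P mu_c - 0.5 mu_c^T P mu_c + log prior_c
    W = means @ prec
    b = -0.5 * np.einsum('cd,cd->c', W, means) + np.log(priors)
    return W, b

def gda_predict(features, W, b):
    return np.argmax(features @ W.T + b, axis=1)
```

Because the fit is closed-form (class means, pooled covariance, priors), no gradient-based fine-tuning is needed, which is what makes this a training-free baseline.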
arXiv Detail & Related papers (2024-02-06T15:45:27Z)
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- BiAdam: Fast Adaptive Bilevel Optimization Methods [104.96004056928474]
Bilevel optimization has attracted increased interest in machine learning due to its many applications.
We provide a useful analysis framework for both constrained and unconstrained bilevel optimization.
arXiv Detail & Related papers (2021-06-21T20:16:40Z)
- Boosting for Online Convex Optimization [64.15578413206715]
We consider the decision-making framework of online convex optimization with a large number of experts.
We define a weak learning algorithm as a mechanism that guarantees approximate regret against a base class of experts.
We give an efficient boosting algorithm that guarantees near-optimal regret against the convex hull of the base class.
arXiv Detail & Related papers (2021-02-18T12:30:49Z)
- MP-Boost: Minipatch Boosting via Adaptive Feature and Observation Sampling [0.0]
MP-Boost is an algorithm loosely based on AdaBoost that learns by adaptively selecting small subsets of instances and features.
We empirically demonstrate the interpretability, comparative accuracy, and computational time of our approach on a variety of binary classification tasks.
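A loose sketch of the minipatch idea, fitting an AdaBoost-style stump each round on a random subset of observations and features, under the simplifying assumption of uniform rather than adaptive sampling (MP-Boost itself adapts the sampling probabilities):

```python
import numpy as np

def fit_stump(X, y, w):
    """Best weighted threshold stump over the given (sub)feature matrix."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] <= t, s, -s)
                err = np.sum(w * (pred != y))
                if err < best[0]:
                    best = (err, j, t, s)
    return best  # (weighted error, feature index, threshold, sign)

def minipatch_boost(X, y, n_rounds=30, n_rows=50, n_cols=3, rng=None):
    """Each round, a stump is trained on a random minipatch of rows/columns;
    the patch error approximates the full weighted error when setting alpha."""
    rng = rng or np.random.default_rng(0)
    m, d = X.shape
    w = np.full(m, 1.0 / m)
    ensemble = []
    for _ in range(n_rounds):
        rows = rng.choice(m, size=min(n_rows, m), replace=False)
        cols = rng.choice(d, size=min(n_cols, d), replace=False)
        err, j, t, s = fit_stump(X[np.ix_(rows, cols)], y[rows],
                                 w[rows] / w[rows].sum())
        err = max(err, 1e-10)
        if err >= 0.5:
            continue  # patch stump no better than chance; skip this round
        alpha = 0.5 * np.log((1 - err) / err)
        feat = cols[j]
        pred = np.where(X[:, feat] <= t, s, -s)
        w *= np.exp(-alpha * y * pred)  # AdaBoost-style reweighting
        w /= w.sum()
        ensemble.append((alpha, feat, t, s))
    return ensemble

def mp_predict(X, ensemble):
    score = sum(a * np.where(X[:, f] <= t, s, -s) for a, f, t, s in ensemble)
    return np.sign(score)
```

Training on small minipatches keeps each round cheap, and the surviving stumps expose which features and observations drive the ensemble, which is the source of the claimed interpretability.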
arXiv Detail & Related papers (2020-11-14T04:26:13Z)
- Tune smarter not harder: A principled approach to tuning learning rates for shallow nets [13.203765985718201]
A principled approach to choosing the learning rate is proposed for shallow feedforward neural networks.
It is shown through simulations that the proposed search method significantly outperforms the existing tuning methods.
arXiv Detail & Related papers (2020-03-22T09:38:35Z)
- StochasticRank: Global Optimization of Scale-Free Discrete Functions [28.224889996383396]
In this paper, we introduce a powerful and efficient framework for direct optimization of ranking metrics.
We show that classic smoothing approaches may introduce bias and present a universal solution for a proper debiasing.
Our framework applies to any scale-free discrete loss function.
arXiv Detail & Related papers (2020-03-04T15:27:11Z)
- On the Dual Formulation of Boosting Algorithms [92.74617630106559]
We show that the Lagrange dual problems of AdaBoost, LogitBoost and soft-margin boosting are all entropy maximization problems with a generalized hinge loss.
By looking at the dual problems of these boosting algorithms, we show that the success of boosting can be understood in terms of maintaining a better margin distribution.
arXiv Detail & Related papers (2009-01-23T02:14:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.