Fully-Corrective Gradient Boosting with Squared Hinge: Fast Learning
Rates and Early Stopping
- URL: http://arxiv.org/abs/2004.00179v1
- Date: Wed, 1 Apr 2020 00:39:24 GMT
- Title: Fully-Corrective Gradient Boosting with Squared Hinge: Fast Learning
Rates and Early Stopping
- Authors: Jinshan Zeng, Min Zhang and Shao-Bo Lin
- Abstract summary: We propose an efficient boosting method with theoretical generalization guarantees for binary classification.
We derive a fast learning rate of the order $cal O((m/log m)-1/4)$ for the proposed boosting method.
Both derived learning rates are the best ones among the existing generalization results of boosting-type methods for classification.
- Score: 29.485528641599018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Boosting is a well-known method for improving the accuracy of weak learners
in machine learning. However, its theoretical generalization guarantee is
missing in literature. In this paper, we propose an efficient boosting method
with theoretical generalization guarantees for binary classification. Three key
ingredients of the proposed boosting method are: a) the
\textit{fully-corrective greedy} (FCG) update in the boosting procedure, b) a
differentiable \textit{squared hinge} (also called \textit{truncated
quadratic}) function as the loss function, and c) an efficient alternating
direction method of multipliers (ADMM) algorithm for the associated FCG
optimization. The used squared hinge loss not only inherits the robustness of
the well-known hinge loss for classification with outliers, but also brings
some benefits for computational implementation and theoretical justification.
Under some sparseness assumption, we derive a fast learning rate of the order
${\cal O}((m/\log m)^{-1/4})$ for the proposed boosting method, which can be
further improved to ${\cal O}((m/\log m)^{-1/2})$ if certain additional noise
assumption is imposed, where $m$ is the size of sample set. Both derived
learning rates are the best ones among the existing generalization results of
boosting-type methods for classification. Moreover, an efficient early stopping
scheme is provided for the proposed method. A series of toy simulations and
real data experiments are conducted to verify the developed theories and
demonstrate the effectiveness of the proposed method.
Related papers
- Sample-Efficient Agnostic Boosting [19.15484761265653]
Empirical Risk Minimization (ERM) outstrips the agnostic boosting methodology in being quadratically more sample efficient than all known boosting algorithms.
A key feature of our algorithm is that it leverages the ability to reuse samples across multiple rounds of boosting, while guaranteeing a generalization error strictly better than those obtained by blackbox applications of uniform convergence arguments.
arXiv Detail & Related papers (2024-10-31T04:50:29Z) - Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z) - Understanding Gradient Boosting Classifier: Training, Prediction, and the Role of $γ_j$ [2.44755919161855]
The Gradient Boosting (GBC) is a widely used machine learning algorithm for binary classification.
This document explains the training and prediction processes, focusing on the computation of terminal node values.
We provide a step-by-step example in the appendix to help readers understand.
arXiv Detail & Related papers (2024-10-08T02:11:35Z) - Active Learning for Level Set Estimation Using Randomized Straddle Algorithms [18.96269063427081]
We propose a novel method to identify the set of input points where a function takes value above (or below) a given threshold.
The confidence parameter in the proposed method has the advantages of not needing adjustment, not depending on the number of iterations and candidate points, and not being conservative.
arXiv Detail & Related papers (2024-08-06T12:39:12Z) - How to Boost Any Loss Function [63.573324901948716]
We show that any loss function can be optimized with boosting.
We also show that boosting can achieve a feat not yet known to be possible in the classical $0th$ order setting.
arXiv Detail & Related papers (2024-07-02T14:08:23Z) - A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation [121.0693322732454]
Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable zero-shot capacity.
Recent research has focused on developing efficient fine-tuning methods to enhance CLIP's performance in downstream tasks.
We revisit a classical algorithm, Gaussian Discriminant Analysis (GDA), and apply it to the downstream classification of CLIP.
arXiv Detail & Related papers (2024-02-06T15:45:27Z) - A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z) - BiAdam: Fast Adaptive Bilevel Optimization Methods [104.96004056928474]
Bilevel optimization has attracted increased interest in machine learning due to its many applications.
We provide a useful analysis framework for both the constrained and unconstrained optimization.
arXiv Detail & Related papers (2021-06-21T20:16:40Z) - MP-Boost: Minipatch Boosting via Adaptive Feature and Observation
Sampling [0.0]
MP-Boost is an algorithm loosely based on AdaBoost that learns by adaptively selecting small subsets of instances and features.
We empirically demonstrate the interpretability, comparative accuracy, and computational time of our approach on a variety of binary classification tasks.
arXiv Detail & Related papers (2020-11-14T04:26:13Z) - StochasticRank: Global Optimization of Scale-Free Discrete Functions [28.224889996383396]
In this paper, we introduce a powerful and efficient framework for direct optimization of ranking metrics.
We show that classic smoothing approaches may introduce bias and present a universal solution for a proper debiasing.
Our framework applies to any scale-free discrete loss function.
arXiv Detail & Related papers (2020-03-04T15:27:11Z) - On the Dual Formulation of Boosting Algorithms [92.74617630106559]
We show that the Lagrange problems of AdaBoost, LogitBoost and soft-marginBoost are all dual problems with generalized hinge loss entropy.
By looking at the dual problems of these boosting algorithms, we show that the success of boosting can be understood in terms of maintaining a better margin distribution.
arXiv Detail & Related papers (2009-01-23T02:14:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.