How to Boost Any Loss Function
- URL: http://arxiv.org/abs/2407.02279v2
- Date: Thu, 14 Nov 2024 10:15:35 GMT
- Title: How to Boost Any Loss Function
- Authors: Richard Nock, Yishay Mansour
- Abstract summary: We show that any loss function can be optimized with boosting.
We also show that boosting can achieve a feat not yet known to be possible in the classical $0^{th}$ order setting.
- Score: 63.573324901948716
- Abstract: Boosting is a highly successful ML-born optimization setting in which one is required to computationally efficiently learn arbitrarily good models based on the access to a weak learner oracle, providing classifiers performing at least slightly differently from random guessing. A key difference with gradient-based optimization is that boosting's original model does not require access to first order information about a loss, yet the decades-long history of boosting has quickly evolved it into a first order optimization setting -- sometimes even wrongfully defining it as such. Owing to recent progress extending gradient-based optimization to use only a loss' zeroth ($0^{th}$) order information to learn, this begs the question: what loss functions can be efficiently optimized with boosting and what is the information really needed for boosting to meet the original boosting blueprint's requirements? We provide a constructive formal answer essentially showing that any loss function can be optimized with boosting and thus boosting can achieve a feat not yet known to be possible in the classical $0^{th}$ order setting, since loss functions are not required to be convex, nor differentiable or Lipschitz -- and in fact not required to be continuous either. Some tools we use are rooted in quantum calculus, the mathematical field -- not to be confounded with quantum computation -- that studies calculus without passing to the limit, and thus without using first order information.
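A minimal sketch of the zeroth-order reading of boosting described above, assuming a generic weak learner oracle and using finite-difference (secant) slopes of the loss in place of gradients; this is an illustration of the setting, not the paper's algorithm:

```python
# Illustrative sketch only: a boosting loop that queries the loss purely through
# its values (zeroth order), using finite-difference "secant" slopes in the
# spirit of quantum calculus instead of gradients. The weak learner interface,
# step size and slope offset below are assumptions, not the paper's algorithm.
import numpy as np

def secant_slope(loss, z, delta):
    """Finite-difference slope of `loss` at margin z; no limit is taken, so
    `loss` need not be convex, differentiable, Lipschitz, or even continuous."""
    return (loss(z + delta) - loss(z)) / delta

def zeroth_order_boost(X, y, loss, weak_learner, T=50, delta=0.5, eta=0.1):
    """X: (n, d) features, y: (n,) labels in {-1, +1}; `loss` maps a real margin
    to a real value; `weak_learner(X, y, w)` returns a classifier h with
    h(X) in {-1, +1}, fit to labels y under example weights w (oracle call)."""
    n = X.shape[0]
    margins = np.zeros(n)              # current ensemble margins y_i * H(x_i)
    ensemble = []                      # list of (coefficient, weak classifier)
    for _ in range(T):
        # Example weights from loss *values* only: a steeper local decrease of
        # the loss at an example's margin gives that example more weight.
        w = np.array([-secant_slope(loss, m, delta) for m in margins])
        h = weak_learner(X, y, w)
        ensemble.append((eta, h))      # fixed coefficient for simplicity
        margins += eta * y * h(X)
    def H(Xq):
        return np.sign(sum(a * h(Xq) for a, h in ensemble))
    return H
```

For example, `loss = lambda z: float(z < 0)` (the 0/1 loss) is a legal input here even though it is discontinuous, which is exactly the kind of generality the abstract claims.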
Related papers
- Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms [80.37846867546517]
We show how to train neural networks with custom objectives defined through differentiable algorithms.
We exploit the objectives' second-order information via their empirical Fisher and Hessian matrices.
Applying Newton Losses to eight differentiable algorithms yields significant improvements for less well-optimized differentiable algorithms.
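A hedged sketch of the general recipe (not the paper's exact method): precondition per-sample loss gradients in the network's output space with a damped empirical Fisher matrix; the function name and damping constant are assumptions.

```python
# Hedged sketch: Newton-like preconditioning of loss gradients in the network's
# output space using a damped empirical Fisher matrix. Names and the damping
# constant are assumptions for illustration.
import numpy as np

def fisher_preconditioned_grads(per_sample_grads, damping=1e-3):
    """per_sample_grads: (n, k) gradients of the loss w.r.t. the k-dimensional
    network output for each of n samples; returns preconditioned (n, k) grads."""
    G = np.asarray(per_sample_grads, dtype=float)
    n, k = G.shape
    fisher = G.T @ G / n + damping * np.eye(k)   # empirical Fisher + damping
    return np.linalg.solve(fisher, G.T).T        # F^{-1} g for every sample
```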
arXiv Detail & Related papers (2024-10-24T18:02:11Z) - Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
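An illustrative blend of logistic and exponential losses on a preference margin, in the spirit of the blended loss reported above; the sigmoid gating rule and constant below are assumptions, not the exact DiscoPOP objective.

```python
# Illustrative only: a loss that smoothly blends logistic and exponential
# losses on a preference margin. The gating rule is an assumption for
# illustration, not the DiscoPOP objective reported in the paper.
import numpy as np

def blended_preference_loss(margin, beta=0.05):
    """margin: array of (chosen - rejected) log-probability differences."""
    m = np.asarray(margin, dtype=float)
    logistic = np.log1p(np.exp(-m))          # log-sigmoid (DPO-style) loss
    exponential = np.exp(-m)                 # exponential (AdaBoost-style) loss
    gate = 1.0 / (1.0 + np.exp(-beta * m))   # margin-dependent mixing weight
    return gate * logistic + (1.0 - gate) * exponential
```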
arXiv Detail & Related papers (2024-06-12T16:58:41Z) - LP++: A Surprisingly Strong Linear Probe for Few-Shot CLIP [20.86307407685542]
Linear Probe (LP) has often been reported as a weak baseline for few-shot CLIP adaptation.
In this work, we examine from convex-optimization perspectives a generalization of the standard LP baseline.
Our image-language objective function, along with these non-trivial optimization insights and ingredients, yields, surprisingly, highly competitive few-shot CLIP performances.
arXiv Detail & Related papers (2024-04-02T20:23:10Z) - ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyper-free) gradient based adaption.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly in $n$.
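A minimal sketch of exponential (multiplicative) learning-rate adaptation during gradient descent; the adaptation signal used here, the alignment of successive gradients, is an assumption for illustration rather than ELRA's exact rule.

```python
# Sketch of multiplicative learning-rate adaptation for gradient descent.
# The cosine-of-successive-gradients signal and the up/down factors are
# assumptions for illustration.
import numpy as np

def adaptive_gd(grad, x0, alpha=1e-3, up=1.3, down=0.5, steps=1000):
    x, g_prev = np.asarray(x0, dtype=float), None
    for _ in range(steps):
        g = grad(x)
        if g_prev is not None:
            cos = g @ g_prev / (np.linalg.norm(g) * np.linalg.norm(g_prev) + 1e-12)
            alpha *= up if cos > 0 else down   # grow/shrink the step exponentially
        x -= alpha * g
        g_prev = g
    return x
```

Each step adds only O(n) work on top of the gradient evaluation, consistent with the linear scaling claimed above.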
arXiv Detail & Related papers (2023-09-12T14:36:13Z) - Being Properly Improper [36.52509571098292]
We analyse class probability-based losses when they are stripped of the mandatory properness.
We show that a natural extension of a half-century-old loss introduced by S. Arimoto is twist-proper.
We then turn to a theory that has provided some of the best off-the-shelf algorithms for proper losses, boosting.
arXiv Detail & Related papers (2021-06-18T05:00:15Z) - MP-Boost: Minipatch Boosting via Adaptive Feature and Observation Sampling [0.0]
MP-Boost is an algorithm loosely based on AdaBoost that learns by adaptively selecting small subsets of instances and features.
We empirically demonstrate the interpretability, comparative accuracy, and computational time of our approach on a variety of binary classification tasks.
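A loose illustration of the minipatch idea, adaptively sampling small subsets of instances and features each round; the update rules, constants and the stump learner are assumptions, not MP-Boost's exact procedure.

```python
# Loose illustration: each round samples a small "minipatch" of instances and
# features, fits a weak learner on it, and adapts the sampling probabilities.
# The update rules and constants are assumptions, not MP-Boost's.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def minipatch_boost(X, y, T=100, n_sub=64, d_sub=8, rng=np.random.default_rng(0)):
    """X: (n, d) features, y: (n,) labels in {-1, +1}."""
    n, d = X.shape
    p_obs = np.full(n, 1.0 / n)          # adaptive instance-sampling probabilities
    p_feat = np.full(d, 1.0 / d)         # adaptive feature-sampling probabilities
    learners = []
    for _ in range(T):
        rows = rng.choice(n, size=min(n_sub, n), replace=False, p=p_obs)
        cols = rng.choice(d, size=min(d_sub, d), replace=False, p=p_feat)
        stump = DecisionTreeClassifier(max_depth=1).fit(X[np.ix_(rows, cols)], y[rows])
        learners.append((cols, stump))
        # Up-weight instances the current ensemble still gets wrong.
        votes = sum(s.predict(X[:, c]) for c, s in learners)
        miss = np.sign(votes) != y
        p_obs = np.where(miss, p_obs * 1.5, p_obs)
        p_obs /= p_obs.sum()
        # Up-weight features the latest stump found useful.
        p_feat[cols[stump.feature_importances_ > 0]] *= 1.2
        p_feat /= p_feat.sum()
    return learners
```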
arXiv Detail & Related papers (2020-11-14T04:26:13Z) - Improved Quantum Boosting [1.0660480034605238]
Arunachalam and Maity gave the first quantum improvement for boosting, by combining Freund and Schapire's AdaBoost algorithm with a quantum algorithm for approximate counting.
In this paper we give a substantially faster and simpler quantum boosting algorithm, based on Servedio's SmoothBoost algorithm.
arXiv Detail & Related papers (2020-09-17T15:16:42Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster Convergence [120.9336529957224]
In this paper, we study the nonconvex-strongly-concave setting of gradient-free minimax optimization.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
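A sketch of the basic two-point zeroth-order gradient estimator that such methods build on; the paper's variance-reduction and minimax machinery is omitted, and the names are illustrative.

```python
# Sketch of a two-point zeroth-order gradient estimator: the gradient of f at x
# is approximated from function values only, via random directions.
import numpy as np

def zo_gradient(f, x, mu=1e-4, n_dirs=20, rng=np.random.default_rng(0)):
    """Estimate grad f(x) using only evaluations of f."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / n_dirs
```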
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Fully-Corrective Gradient Boosting with Squared Hinge: Fast Learning Rates and Early Stopping [29.485528641599018]
We propose an efficient boosting method with theoretical generalization guarantees for binary classification.
We derive a fast learning rate of the order $\mathcal{O}((m/\log m)^{-1/4})$ for the proposed boosting method.
Both derived learning rates are the best ones among the existing generalization results of boosting-type methods for classification.
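A small sketch of the squared hinge loss and the pseudo-residuals a boosting round would fit; the fully-corrective re-optimization of past coefficients is omitted, and the names are illustrative.

```python
# Sketch: squared hinge loss and the pseudo-residuals (negative gradients)
# that a gradient-boosting round would fit with its next weak learner.
import numpy as np

def squared_hinge(y, f):
    """y in {-1, +1}, f: real-valued ensemble scores."""
    return np.maximum(0.0, 1.0 - y * f) ** 2

def pseudo_residuals(y, f):
    """Negative gradient of the squared hinge w.r.t. f (it is differentiable)."""
    return 2.0 * y * np.maximum(0.0, 1.0 - y * f)
```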
arXiv Detail & Related papers (2020-04-01T00:39:24Z) - On the Dual Formulation of Boosting Algorithms [92.74617630106559]
We show that the Lagrange dual problems of AdaBoost, LogitBoost and soft-margin LPBoost with generalized hinge loss are all entropy maximization problems.
By looking at the dual problems of these boosting algorithms, we show that the success of boosting can be understood in terms of maintaining a better margin distribution.
arXiv Detail & Related papers (2009-01-23T02:14:42Z)