When Analytic Calculus Cracks AdaBoost Code
- URL: http://arxiv.org/abs/2308.01070v2
- Date: Wed, 7 Feb 2024 17:13:43 GMT
- Title: When Analytic Calculus Cracks AdaBoost Code
- Authors: Jean-Marc Brossier, Olivier Lafitte, Lenny Réthoré
- Abstract summary: This study analyzes the (two classes) AdaBoost procedure implemented in scikit-learn.
AdaBoost is an algorithm in name only, as the resulting combination of weak classifiers can be explicitly calculated using a truth table.
We observe that this formula does not give the minimizer of the risk; we provide a system to compute the exact minimum point and verify that the AdaBoost procedure in scikit-learn does not implement the algorithm described by Freund and Schapire.
- Score: 0.30693357740321775
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The principle of boosting in supervised learning involves combining multiple
weak classifiers to obtain a stronger classifier. AdaBoost is widely regarded
as a textbook example of this approach.
This study analyzes the (two classes) AdaBoost procedure implemented in
scikit-learn.
This paper shows that AdaBoost is an algorithm in name only, as the resulting
combination of weak classifiers can be explicitly calculated using a truth
table.
Indeed, using a logical analysis of the training set with weak classifiers
constructing a truth table, we recover, through an analytical formula, the
weights of the combination of these weak classifiers obtained by the procedure.
We observe that this formula does not give the minimizer of the risk; we
provide a system to compute the exact minimum point, and we verify that the
AdaBoost procedure in scikit-learn does not implement the algorithm described
by Freund and Schapire.
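For context on the procedure the paper analyzes: in the original Freund-Schapire algorithm, each round's weight has the closed form $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$, where $\epsilon_t$ is the weighted training error of the weak classifier. The sketch below is a minimal illustration of that classic two-class procedure (labels in {-1, +1}) with a hypothetical brute-force 1-D stump learner; it is not the paper's truth-table formula, nor scikit-learn's implementation.

```python
import numpy as np

def fit_stump(X, y, w):
    """Hypothetical helper: brute-force 1-D threshold stump
    minimising the weighted error under sample weights w."""
    best = (0.0, 1, 2.0)  # (threshold, sign, weighted error)
    for t in X.ravel():
        for s in (1, -1):
            pred = s * np.sign(X.ravel() - t + 1e-12)
            err = np.sum(w[pred != y])
            if err < best[2]:
                best = (t, s, err)
    t, s, _ = best
    return lambda Xq: s * np.sign(Xq.ravel() - t + 1e-12)

def adaboost(X, y, fit_weak, rounds):
    """Minimal sketch of the classic two-class AdaBoost loop."""
    n = len(y)
    w = np.full(n, 1.0 / n)              # start from uniform weights
    alphas, clfs = [], []
    for _ in range(rounds):
        h = fit_weak(X, y, w)            # fit weak classifier on weights w
        pred = h(X)
        eps = np.sum(w[pred != y])       # weighted training error
        if eps == 0:                     # perfect weak learner: keep it, stop
            alphas.append(1.0); clfs.append(h); break
        if eps >= 0.5:                   # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - eps) / eps)  # closed-form round weight
        w *= np.exp(-alpha * y * pred)   # up-weight the mistakes
        w /= w.sum()                     # renormalise to a distribution
        alphas.append(alpha); clfs.append(h)
    # strong classifier: sign of the weighted vote
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in zip(alphas, clfs)))
```

The paper's point is that on the training set this iterative loop is unnecessary: the final combination weights can be recovered analytically from a truth table of the weak classifiers' outputs.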
Related papers
- Sample-Efficient Agnostic Boosting [19.15484761265653]
Empirical Risk Minimization (ERM) outstrips the agnostic boosting methodology, being quadratically more sample efficient than all known boosting algorithms.
A key feature of our algorithm is that it leverages the ability to reuse samples across multiple rounds of boosting, while guaranteeing a generalization error strictly better than those obtained by blackbox applications of uniform convergence arguments.
arXiv Detail & Related papers (2024-10-31T04:50:29Z) - The Many Faces of Optimal Weak-to-Strong Learning [10.985323882432086]
We present a new and surprisingly simple Boosting algorithm that obtains a provably optimal sample complexity.
Our pilot empirical study suggests that our new algorithm might outperform previous algorithms on large data sets.
arXiv Detail & Related papers (2024-08-30T09:38:51Z) - Achieving More with Less: A Tensor-Optimization-Powered Ensemble Method [53.170053108447455]
Ensemble learning is a method that leverages weak learners to produce a strong learner.
We design a smooth and convex objective function that leverages the concept of margin, making the strong learner more discriminative.
We then compare our algorithm with random forests of ten times the size and other classical methods across numerous datasets.
arXiv Detail & Related papers (2024-08-06T03:42:38Z) - Boosting, Voting Classifiers and Randomized Sample Compression Schemes [14.885182312708196]
In boosting, we aim to leverage multiple weak learners to produce a strong learner.
We propose a randomized boosting algorithm that outputs voting classifiers whose generalization error contains a single logarithmic dependency on the sample size.
arXiv Detail & Related papers (2024-02-05T12:58:03Z) - Multiclass Boosting: Simple and Intuitive Weak Learning Criteria [72.71096438538254]
We give a simple and efficient boosting algorithm, that does not require realizability assumptions.
We present a new result on boosting for list learners, as well as provide a novel proof for the characterization of multiclass PAC learning.
arXiv Detail & Related papers (2023-07-02T19:26:58Z) - AdaBoost is not an Optimal Weak to Strong Learner [11.003568749905359]
We show that the sample complexity of AdaBoost, and of other classic variations thereof, is sub-optimal by at least one logarithmic factor in the desired accuracy of the strong learner.
arXiv Detail & Related papers (2023-01-27T07:37:51Z) - ProBoost: a Boosting Method for Probabilistic Classifiers [55.970609838687864]
ProBoost is a new boosting algorithm for probabilistic classifiers.
It uses the uncertainty of each training sample to determine the most challenging/uncertain ones.
It produces a sequence that progressively focuses on the samples found to have the highest uncertainty.
arXiv Detail & Related papers (2022-09-04T12:49:20Z) - A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z) - Boosting for Online Convex Optimization [64.15578413206715]
We consider the decision-making framework of online convex optimization with a large number of experts.
We define a weak learning algorithm as a mechanism that guarantees approximate regret against a base class of experts.
We give an efficient boosting algorithm that guarantees near-optimal regret against the convex hull of the base class.
arXiv Detail & Related papers (2021-02-18T12:30:49Z) - Fully-Corrective Gradient Boosting with Squared Hinge: Fast Learning
Rates and Early Stopping [29.485528641599018]
We propose an efficient boosting method with theoretical generalization guarantees for binary classification.
We derive a fast learning rate of the order $\mathcal{O}((m/\log m)^{-1/4})$ for the proposed boosting method.
Both derived learning rates are the best ones among the existing generalization results of boosting-type methods for classification.
arXiv Detail & Related papers (2020-04-01T00:39:24Z) - Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.