Boosting, Voting Classifiers and Randomized Sample Compression Schemes
- URL: http://arxiv.org/abs/2402.02976v1
- Date: Mon, 5 Feb 2024 12:58:03 GMT
- Title: Boosting, Voting Classifiers and Randomized Sample Compression Schemes
- Authors: Arthur da Cunha, Kasper Green Larsen, Martin Ritzert
- Abstract summary: In boosting, we aim to leverage multiple weak learners to produce a strong learner.
We propose a randomized boosting algorithm that outputs voting classifiers whose generalization error contains a single logarithmic dependency on the sample size.
- Score: 14.885182312708196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In boosting, we aim to leverage multiple weak learners to produce a strong
learner. At the center of this paradigm lies the concept of building the strong
learner as a voting classifier, which outputs a weighted majority vote of the
weak learners. While many successful boosting algorithms, such as the iconic
AdaBoost, produce voting classifiers, their theoretical performance has long
remained sub-optimal: the best known bounds on the number of training examples
necessary for a voting classifier to obtain a given accuracy have so far always
contained at least two logarithmic factors above what is known to be achievable
by general weak-to-strong learners. In this work, we break this barrier by
proposing a randomized boosting algorithm that outputs voting classifiers whose
generalization error contains a single logarithmic dependency on the sample
size. We obtain this result by building a general framework that extends sample
compression methods to support randomized learning algorithms based on
sub-sampling.
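The weighted-majority-vote construction at the heart of this paradigm can be illustrated with a minimal AdaBoost-style sketch (this is the classic deterministic scheme, not the paper's randomized algorithm; the 1-D dataset and decision stumps are hypothetical, chosen only to show how the vote weights arise):

```python
import math

# Hypothetical toy 1-D dataset: points and their +/-1 labels.
X = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
y = [1, 1, 1, -1, -1, -1]

def stump(threshold, sign):
    """Weak learner: predicts `sign` left of `threshold`, `-sign` to the right."""
    return lambda x: sign if x < threshold else -sign

def best_stump(X, y, w):
    """Pick the stump with the lowest weighted training error."""
    candidates = [stump(t, s) for t in X for s in (1, -1)]
    def weighted_error(h):
        return sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
    return min(candidates, key=weighted_error)

def adaboost(X, y, rounds=5):
    """Return (weak learners, vote weights) after `rounds` of boosting."""
    n = len(X)
    w = [1.0 / n] * n
    hs, alphas = [], []
    for _ in range(rounds):
        h = best_stump(X, y, w)
        err = sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
        err = min(max(err, 1e-12), 1 - 1e-12)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        # Re-weight: misclassified points gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * h(xi)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        hs.append(h)
        alphas.append(alpha)
    return hs, alphas

def vote(hs, alphas, x):
    """The voting classifier: a weighted majority vote of the weak learners."""
    return 1 if sum(a * h(x) for a, h in zip(alphas, hs)) >= 0 else -1

hs, alphas = adaboost(X, y)
print([vote(hs, alphas, x) for x in X])
```

The generalization bounds discussed in the abstract concern exactly this kind of `vote` classifier: how many training examples are needed before its test error matches its training error up to the desired accuracy.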
Related papers
- When Analytic Calculus Cracks AdaBoost Code [0.30693357740321775]
This study analyzes the (two-class) AdaBoost procedure implemented in scikit-learn.
AdaBoost is an algorithm in name only, as the resulting combination of weak classifiers can be explicitly calculated using a truth table.
We observe that this formula does not yield the minimizer of the risk; we provide a system to compute the exact minimizer and verify that the AdaBoost procedure in scikit-learn does not implement the algorithm described by Freund and Schapire.
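The truth-table observation can be sketched as follows: the voting classifier's output depends on an input only through the tuple of weak-classifier predictions, so its whole behaviour is a finite table over those tuples. The vote weights below are hypothetical placeholders, not values from the paper:

```python
from itertools import product

# Hypothetical vote weights (the alpha_t produced by AdaBoost) for T=3
# weak classifiers.
alphas = [0.9, 0.6, 0.4]

# The final classifier sees an input only through the +/-1 outputs of the
# weak classifiers, so it is fully described by a table over {-1, +1}^T.
truth_table = {
    pattern: 1 if sum(a * p for a, p in zip(alphas, pattern)) >= 0 else -1
    for pattern in product((-1, 1), repeat=len(alphas))
}

for pattern, label in sorted(truth_table.items()):
    print(pattern, "->", label)
```

With only 2^T rows, the combined classifier can be written out explicitly rather than evaluated iteratively, which is the sense in which the study calls AdaBoost "an algorithm in name only".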
arXiv Detail & Related papers (2023-08-02T10:37:25Z)
- Multiclass Boosting: Simple and Intuitive Weak Learning Criteria [72.71096438538254]
We give a simple and efficient boosting algorithm, that does not require realizability assumptions.
We present a new result on boosting for list learners, as well as provide a novel proof for the characterization of multiclass PAC learning.
arXiv Detail & Related papers (2023-07-02T19:26:58Z)
- AdaBoost is not an Optimal Weak to Strong Learner [11.003568749905359]
We show that the sample complexity of AdaBoost, and other classic variations thereof, are sub-optimal by at least one logarithmic factor in the desired accuracy of the strong learner.
arXiv Detail & Related papers (2023-01-27T07:37:51Z)
- Confidence-aware Training of Smoothed Classifiers for Certified Robustness [75.95332266383417]
We use "accuracy under Gaussian noise" as an easy-to-compute proxy of adversarial robustness for an input.
Our experiments show that the proposed method consistently exhibits improved certified robustness upon state-of-the-art training methods.
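The "accuracy under Gaussian noise" proxy can be sketched as a smoothed classifier that takes a majority vote of a base classifier over Gaussian perturbations of its input; the base classifier and noise level below are hypothetical stand-ins, not the paper's setup:

```python
import random

random.seed(1)

def base_classifier(x):
    """Hypothetical base classifier on a 1-D input: sign of x - 0.5."""
    return 1 if x >= 0.5 else -1

def smoothed(x, sigma=0.3, n=1000):
    """Smoothed classifier: majority vote of the base classifier over
    n Gaussian perturbations of the input x."""
    votes = sum(base_classifier(x + random.gauss(0.0, sigma)) for _ in range(n))
    return 1 if votes >= 0 else -1

# Inputs far from the decision boundary keep their label under noise;
# the empirical vote margin is the easy-to-compute robustness proxy.
print(smoothed(0.9), smoothed(0.1))
```

The more one-sided the vote is for a given input, the larger the perturbation the smoothed classifier provably tolerates there, which is what makes the noisy accuracy a useful training signal for certified robustness.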
arXiv Detail & Related papers (2022-12-18T03:57:12Z)
- ProBoost: a Boosting Method for Probabilistic Classifiers [55.970609838687864]
ProBoost is a new boosting algorithm for probabilistic classifiers.
It uses the uncertainty of each training sample to determine the most challenging/uncertain ones.
It produces a sequence that progressively focuses on the samples found to have the highest uncertainty.
arXiv Detail & Related papers (2022-09-04T12:49:20Z)
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning whose complexity is independent of the number of states of the decision process.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound [15.557653926558638]
We investigate a counterpart of majority votes over finite ensembles of classifiers, and study its generalization properties.
We instantiate it with Dirichlet distributions: this allows for a closed-form and differentiable expression for the expected risk.
The resulting majority vote learning algorithm achieves state-of-the-art accuracy and benefits from (non-vacuous) tight bounds.
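A minimal Monte Carlo sketch of the idea follows (the paper derives the expected risk in closed form; here it is only approximated by sampling, and the classifier outputs, labels, and Dirichlet concentration parameters are hypothetical):

```python
import random

random.seed(0)

def dirichlet(alpha):
    """Sample a weight vector from Dirichlet(alpha) via normalized Gamma draws."""
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

# Hypothetical +/-1 outputs of 3 classifiers on 4 points, with true labels.
votes = [(1, 1, -1), (1, -1, -1), (-1, 1, 1), (-1, -1, 1)]
labels = [1, -1, 1, -1]
alpha = [2.0, 1.0, 1.0]  # concentration parameters over the ensemble

def risk(theta):
    """0-1 risk of the theta-weighted majority vote on the toy data."""
    errors = 0
    for v, y in zip(votes, labels):
        pred = 1 if sum(t * vi for t, vi in zip(theta, v)) >= 0 else -1
        errors += pred != y
    return errors / len(labels)

# Monte Carlo estimate of the expected risk over Dirichlet-distributed
# vote weights; the closed form makes this quantity differentiable in alpha.
est = sum(risk(dirichlet(alpha)) for _ in range(2000)) / 2000
print(round(est, 3))
```

Having a closed-form, differentiable expected risk is what lets the concentration parameters be trained directly against the PAC-Bayes bound.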
arXiv Detail & Related papers (2021-06-23T16:57:23Z)
- Breadcrumbs: Adversarial Class-Balanced Sampling for Long-tailed Recognition [95.93760490301395]
The problem of long-tailed recognition, where the number of examples per class is highly unbalanced, is considered.
It is hypothesized that this difficulty stems from the repeated sampling of the same examples and can be addressed by feature space augmentation.
A new feature augmentation strategy, EMANATE, based on back-tracking of features across epochs during training, is proposed.
A new sampling procedure, Breadcrumb, is then introduced to implement adversarial class-balanced sampling without extra computation.
arXiv Detail & Related papers (2021-05-01T00:21:26Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.