Related papers: Boosting, Voting Classifiers and Randomized Sample Compression Schemes

Boosting, Voting Classifiers and Randomized Sample Compression Schemes

URL: http://arxiv.org/abs/2402.02976v1
Date: Mon, 5 Feb 2024 12:58:03 GMT
Title: Boosting, Voting Classifiers and Randomized Sample Compression Schemes
Authors: Arthur da Cunha, Kasper Green Larsen, Martin Ritzert
Abstract summary: In boosting, we aim to leverage multiple weak learners to produce a strong learner. We propose a randomized boosting algorithm that outputs voting classifiers whose generalization error contains a single logarithmic dependency on the sample size.
Score: 14.885182312708196
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In boosting, we aim to leverage multiple weak learners to produce a strong learner. At the center of this paradigm lies the concept of building the strong learner as a voting classifier, which outputs a weighted majority vote of the weak learners. While many successful boosting algorithms, such as the iconic AdaBoost, produce voting classifiers, their theoretical performance has long remained sub-optimal: the best known bounds on the number of training examples necessary for a voting classifier to obtain a given accuracy has so far always contained at least two logarithmic factors above what is known to be achievable by general weak-to-strong learners. In this work, we break this barrier by proposing a randomized boosting algorithm that outputs voting classifiers whose generalization error contains a single logarithmic dependency on the sample size. We obtain this result by building a general framework that extends sample compression methods to support randomized learning algorithms based on sub-sampling.

Related papers

Sample-Efficient Agnostic Boosting [19.15484761265653]
Empirical Risk Minimization (ERM) outstrips the agnostic boosting methodology in being quadratically more sample efficient than all known boosting algorithms. A key feature of our algorithm is that it leverages the ability to reuse samples across multiple rounds of boosting, while guaranteeing a generalization error strictly better than those obtained by blackbox applications of uniform convergence arguments.
arXiv Detail & Related papers (2024-10-31T04:50:29Z)
The Many Faces of Optimal Weak-to-Strong Learning [10.985323882432086]
We present a new and surprisingly simple Boosting algorithm that obtains a provably optimal sample complexity. Our pilot empirical study suggests that our new algorithm might outperform previous algorithms on large data sets.
arXiv Detail & Related papers (2024-08-30T09:38:51Z)
Achieving More with Less: A Tensor-Optimization-Powered Ensemble Method [53.170053108447455]
Ensemble learning is a method that leverages weak learners to produce a strong learner. We design a smooth and convex objective function that leverages the concept of margin, making the strong learner more discriminative. We then compare our algorithm with random forests of ten times the size and other classical methods across numerous datasets.
arXiv Detail & Related papers (2024-08-06T03:42:38Z)
When Analytic Calculus Cracks AdaBoost Code [0.30693357740321775]
This study analyzes the (two classes) AdaBoost procedure implemented in scikit-learn. AdaBoost is an algorithm in name only, as the resulting combination of weak classifiers can be explicitly calculated using a truth table. We observe that this formula does not give the point of minimum of the risk, we provide a system to compute the exact point of minimum and we check that the AdaBoost procedure in scikit-learn does not implement the algorithm described by Freund and Schapire.
arXiv Detail & Related papers (2023-08-02T10:37:25Z)
Multiclass Boosting: Simple and Intuitive Weak Learning Criteria [72.71096438538254]
We give a simple and efficient boosting algorithm, that does not require realizability assumptions. We present a new result on boosting for list learners, as well as provide a novel proof for the characterization of multiclass PAC learning.
arXiv Detail & Related papers (2023-07-02T19:26:58Z)
AdaBoost is not an Optimal Weak to Strong Learner [11.003568749905359]
We show that the sample complexity of AdaBoost, and other classic variations thereof, are sub-optimal by at least one logarithmic factor in the desired accuracy of the strong learner.
arXiv Detail & Related papers (2023-01-27T07:37:51Z)
Confidence-aware Training of Smoothed Classifiers for Certified Robustness [75.95332266383417]
We use "accuracy under Gaussian noise" as an easy-to-compute proxy of adversarial robustness for an input. Our experiments show that the proposed method consistently exhibits improved certified robustness upon state-of-the-art training methods.
arXiv Detail & Related papers (2022-12-18T03:57:12Z)
ProBoost: a Boosting Method for Probabilistic Classifiers [55.970609838687864]
ProBoost is a new boosting algorithm for probabilistic classifiers. It uses the uncertainty of each training sample to determine the most challenging/uncertain ones. It produces a sequence that progressively focuses on the samples found to have the highest uncertainty.
arXiv Detail & Related papers (2022-09-04T12:49:20Z)
Breadcrumbs: Adversarial Class-Balanced Sampling for Long-tailed Recognition [95.93760490301395]
The problem of long-tailed recognition, where the number of examples per class is highly unbalanced, is considered. It is hypothesized that this is due to the repeated sampling of examples and can be addressed by feature space augmentation. A new feature augmentation strategy, EMANATE, based on back-tracking of features across epochs during training, is proposed. A new sampling procedure, Breadcrumb, is then introduced to implement adversarial class-balanced sampling without extra computation.
arXiv Detail & Related papers (2021-05-01T00:21:26Z)
Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks. We present a unifying view of randomized smoothing over arbitrary functions. We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.