When are ensembles really effective?
- URL: http://arxiv.org/abs/2305.12313v1
- Date: Sun, 21 May 2023 01:36:25 GMT
- Title: When are ensembles really effective?
- Authors: Ryan Theisen, Hyunsuk Kim, Yaoqing Yang, Liam Hodgkinson, Michael W. Mahoney
- Abstract summary: We study the question of when ensembling yields significant performance improvements in classification tasks.
We show that ensembling improves performance significantly whenever the disagreement rate is large relative to the average error rate.
We identify practical scenarios where ensembling does and does not result in large performance improvements.
- Score: 49.37269057899679
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ensembling has a long history in statistical data analysis, with many
impactful applications. However, in many modern machine learning settings, the
benefits of ensembling are less ubiquitous and less obvious. We study, both
theoretically and empirically, the fundamental question of when ensembling
yields significant performance improvements in classification tasks.
Theoretically, we prove new results relating the \emph{ensemble improvement
rate} (a measure of how much ensembling decreases the error rate versus a
single model, on a relative scale) to the \emph{disagreement-error ratio}. We
show that ensembling improves performance significantly whenever the
disagreement rate is large relative to the average error rate; and that,
conversely, one classifier is often enough whenever the disagreement rate is
low relative to the average error rate. On the way to proving these results, we
derive, under a mild condition called \emph{competence}, improved upper and
lower bounds on the average test error rate of the majority vote classifier. To
complement this theory, we study ensembling empirically in a variety of
settings, verifying the predictions made by our theory, and identifying
practical scenarios where ensembling does and does not result in large
performance improvements. Perhaps most notably, we demonstrate a distinct
difference in behavior between interpolating models (popular in current
practice) and non-interpolating models (such as tree-based methods, where
ensembling is popular), demonstrating that ensembling helps considerably more
in the latter case than in the former.
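To make these quantities concrete, here is a minimal sketch that computes the average test error, the average pairwise disagreement rate, the disagreement-error ratio, and the relative improvement of a majority-vote ensemble on synthetic predictions. The toy data, the pairwise-averaging convention for disagreement, and the exact normalization of the improvement rate are illustrative assumptions; the paper's formal definitions may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (an assumption, not the paper's experiments): M classifiers
# predicting binary labels for N test points, each independently correct
# with probability 0.8.
M, N = 7, 10_000
y = rng.integers(0, 2, size=N)           # true labels
wrong = rng.random((M, N)) < 0.2         # independent mistakes
preds = np.where(wrong, 1 - y, y)        # (M, N) predictions


def average_error(preds: np.ndarray, y: np.ndarray) -> float:
    """Average test error rate over the individual classifiers."""
    return float(np.mean(preds != y))


def disagreement_rate(preds: np.ndarray) -> float:
    """Average pairwise disagreement rate among the classifiers."""
    M = preds.shape[0]
    rates = [np.mean(preds[i] != preds[j])
             for i in range(M) for j in range(i + 1, M)]
    return float(np.mean(rates))


def majority_vote(preds: np.ndarray) -> np.ndarray:
    """Majority-vote prediction per test point (binary labels, odd M)."""
    return (preds.mean(axis=0) > 0.5).astype(preds.dtype)


avg_err = average_error(preds, y)
mv_err = float(np.mean(majority_vote(preds) != y))

# Disagreement-error ratio: disagreement relative to average error.
der = disagreement_rate(preds) / avg_err
# Ensemble improvement rate (one natural formulation): relative drop in
# error from the average single model to the majority vote.
eir = (avg_err - mv_err) / avg_err

print(f"average error:             {avg_err:.3f}")
print(f"majority-vote error:       {mv_err:.3f}")
print(f"disagreement-error ratio:  {der:.2f}")
print(f"ensemble improvement rate: {eir:.2f}")
```

In this toy regime the disagreement rate (about 0.32) exceeds the average error (about 0.2), so the disagreement-error ratio is well above 1 and the majority vote cuts the error by roughly 80%, consistent with the abstract's claim; correlating the classifiers' mistakes instead would shrink the ratio and the improvement together.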
Related papers
- Robust Distributed Learning: Tight Error Bounds and Breakdown Point under Data Heterogeneity [11.2120847961379]
In this paper, we consider a more realistic heterogeneity model, namely (G,B)-gradient dissimilarity, and show that it covers a larger class of learning problems than existing theory.
We also prove a new lower bound on the learning error of any distributed learning algorithm.
arXiv Detail & Related papers (2023-09-24T09:29:28Z)
- Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
- Ensembling over Classifiers: a Bias-Variance Perspective [13.006468721874372]
We build upon the extension to the bias-variance decomposition by Pfau (2013) in order to gain crucial insights into the behavior of ensembles of classifiers.
We show that conditional estimates necessarily incur an irreducible error.
Empirically, standard ensembling reduces the bias, leading us to hypothesize that ensembles of classifiers may perform well in part because of this unexpected reduction.
arXiv Detail & Related papers (2022-06-21T17:46:35Z)
- Beyond spectral gap: The role of the topology in decentralized learning [58.48291921602417]
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization when workers share the same data distribution.
Our theory matches empirical observations in deep learning, and accurately describes the relative merits of different graph topologies.
arXiv Detail & Related papers (2022-06-07T08:19:06Z)
- Throwing Away Data Improves Worst-Class Error in Imbalanced Classification [36.91428748713018]
Class imbalances pervade classification problems, yet their treatment differs in theory and practice.
We take on the challenge of developing learning theory able to describe the worst-class error of classifiers over linearly-separable data.
arXiv Detail & Related papers (2022-05-23T23:43:18Z)
- Solving Inefficiency of Self-supervised Representation Learning [87.30876679780532]
Existing contrastive learning methods suffer from very low learning efficiency.
Under-clustering and over-clustering problems are major obstacles to learning efficiency.
We propose a novel self-supervised learning framework using a median triplet loss.
arXiv Detail & Related papers (2021-04-18T07:47:10Z)
- On Counterfactual Explanations under Predictive Multiplicity [14.37676876556672]
Counterfactual explanations are usually obtained by identifying the smallest change made to an input to change a prediction made by a fixed model.
Recent work has revitalized an old insight: there often does not exist one superior solution to a prediction problem with respect to commonly used measures of interest.
arXiv Detail & Related papers (2020-06-23T16:25:47Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- An Investigation of Why Overparameterization Exacerbates Spurious Correlations [98.3066727301239]
We identify two key properties of the training data that drive this behavior.
We show how the inductive bias of models towards "memorizing" fewer examples can cause overparameterization to hurt.
arXiv Detail & Related papers (2020-05-09T01:59:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including the generated summaries) and is not responsible for any consequences of its use.