Prediction Instability in Machine Learning Ensembles
- URL: http://arxiv.org/abs/2407.03194v5
- Date: Mon, 26 Aug 2024 16:57:34 GMT
- Title: Prediction Instability in Machine Learning Ensembles
- Authors: Jeremy Kedziora
- Abstract summary: We prove a theorem that shows that any ensemble will exhibit at least one of the following forms of prediction instability.
It will either ignore agreement among all underlying models, change its mind when none of the underlying models have done so, or be manipulable through inclusion or exclusion of options it would never actually predict.
This analysis also sheds light on what specific forms of prediction instability to expect from particular ensemble algorithms.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In machine learning ensembles, predictions from multiple models are aggregated. Despite the widespread use and strong performance of ensembles in applied problems, little is known about the mathematical properties of aggregating models and the associated consequences for safe, explainable use of such models. In this paper we prove a theorem showing that any ensemble will exhibit at least one of the following forms of prediction instability. It will either ignore agreement among all underlying models, change its mind when none of the underlying models have done so, or be manipulable through inclusion or exclusion of options it would never actually predict. As a consequence, ensemble aggregation procedures will always need to balance the benefits of information use against the risk of these prediction instabilities. This analysis also sheds light on what specific forms of prediction instability to expect from particular ensemble algorithms; for example, popular tree ensembles like random forest or xgboost will violate basic, intuitive fairness properties. Finally, we show that this can be ameliorated by using consistent models in asymptotic conditions.
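The third form of instability can be illustrated with a small, hypothetical sketch (this is not a construction from the paper): an ensemble that aggregates its base models' class rankings by Borda count can have its prediction flipped by removing a class that no base model ranks first and that the ensemble itself never predicts. The class names and model rankings below are invented for illustration.

```python
# Hypothetical illustration only: a Borda-style rank-aggregation ensemble
# whose prediction changes when a never-predicted class is removed.

def borda_ensemble(rankings):
    """Aggregate model rankings by Borda count; return (winner, scores).

    rankings: list of rankings, each ordering the classes from most to
    least preferred (all rankings cover the same classes).
    """
    scores = {c: 0 for c in rankings[0]}
    for ranking in rankings:
        n = len(ranking)
        for position, c in enumerate(ranking):
            scores[c] += n - 1 - position  # top class gets n-1 points, last gets 0
    winner = max(scores, key=scores.get)
    return winner, scores

# Five hypothetical base models: three rank A > B > C, two rank B > C > A.
rankings = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2

print(borda_ensemble(rankings))
# ('B', {'A': 6, 'B': 7, 'C': 2})  -> ensemble predicts B; C is never predicted

# Drop class C, which no model ranks first and the ensemble never outputs.
rankings_without_c = [[c for c in r if c != "C"] for r in rankings]
print(borda_ensemble(rankings_without_c))
# ('A', {'A': 3, 'B': 2})  -> the ensemble now predicts A instead of B
```

Removing C changes nothing about any model's relative preference between A and B, yet the aggregate prediction flips from B to A, matching the "manipulable through inclusion or exclusion of options it would never actually predict" case of the theorem.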
Related papers
- Revisiting Optimism and Model Complexity in the Wake of Overparameterized Machine Learning [6.278498348219108]
We revisit model complexity from first principles, by first reinterpreting and then extending the classical statistical concept of (effective) degrees of freedom.
We demonstrate the utility of our proposed complexity measures through a mix of conceptual arguments, theory, and experiments.
arXiv Detail & Related papers (2024-10-02T06:09:57Z) - Conformal online model aggregation [29.43493007296859]
This paper proposes a new approach towards conformal model aggregation in online settings.
It is based on combining the prediction sets from several algorithms by voting, where weights on the models are adapted over time based on past performance.
arXiv Detail & Related papers (2024-03-22T15:40:06Z) - Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z) - Are Ensembles Getting Better all the Time? [24.442955229765957]
We show that ensembles are getting better all the time if, and only if, the considered loss function is convex.
We illustrate our results on a medical prediction task (diagnosing melanomas using neural nets) and a "wisdom of crowds" experiment (guessing the ratings of upcoming movies).
arXiv Detail & Related papers (2023-11-29T18:32:37Z) - Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models fine-tuned on just a few examples exhibit strong prediction bias across labels.
Although few-shot fine-tuning can mitigate the prediction bias, our analysis shows models gain performance improvement by capturing non-task-related features.
These observations warn that pursuing model performance with fewer examples may incur pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z) - Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Each subfunction has its own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
arXiv Detail & Related papers (2021-06-15T18:34:41Z) - Set Prediction without Imposing Structure as Conditional Density Estimation [40.86881969839325]
We propose an alternative to training via set losses by viewing learning as conditional density estimation.
Our framework fits deep energy-based models and approximates the intractable likelihood with gradient-guided sampling.
Our approach is competitive with previous set prediction models on standard benchmarks.
arXiv Detail & Related papers (2020-10-08T16:49:16Z) - A Note on High-Probability versus In-Expectation Guarantees of Generalization Bounds in Machine Learning [95.48744259567837]
Statistical machine learning theory often tries to give generalization guarantees of machine learning models.
Statements made about the performance of machine learning models have to take the sampling process into account.
We show how one may transform one type of statement into the other.
arXiv Detail & Related papers (2020-10-06T09:41:35Z) - Universal time-series forecasting with mixture predictors [10.812772606528172]
This book is devoted to the problem of sequential probability forecasting, that is, predicting the probabilities of the next outcome of a growing sequence of observations given the past.
The main subject is mixture predictors, which are formed as combinations of a finite or infinite set of other predictors.
Results demonstrate the universality of this method in a very general probabilistic setting, but also show some of its limitations.
arXiv Detail & Related papers (2020-10-01T10:56:23Z) - Deducing neighborhoods of classes from a fitted model [68.8204255655161]
In this article a new kind of interpretable machine learning method is presented.
It can help to understand the partitioning of the feature space into predicted classes in a classification model using quantile shifts.
Basically, real data points (or specific points of interest) are used, and the changes in the prediction after slightly raising or lowering specific feature values are observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z) - Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)