Combining Predictions under Uncertainty: The Case of Random Decision Trees
- URL: http://arxiv.org/abs/2208.07403v1
- Date: Mon, 15 Aug 2022 18:36:57 GMT
- Title: Combining Predictions under Uncertainty: The Case of Random Decision Trees
- Authors: Florian Busch, Moritz Kulessa, Eneldo Loza Mencía and Hendrik Blockeel
- Abstract summary: A common approach to aggregating classification estimates in an ensemble of decision trees is to either use voting or to average the probabilities for each class.
In this paper, we investigate a number of alternative prediction methods.
Our methods are inspired by the theories of probability, belief functions and reliable classification.
- Score: 2.322689362836168
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A common approach to aggregating classification estimates in an ensemble of decision trees is to either use voting or to average the probabilities for each class. The latter takes uncertainty into account, but not the reliability of the uncertainty estimates (so to speak, the "uncertainty about the uncertainty"). More generally, much remains unknown about how best to combine probabilistic estimates from multiple sources. In this paper, we investigate a number of alternative prediction methods. Our methods are inspired by the theories of probability, belief functions and reliable classification, as well as a principle that we call evidence accumulation. Our experiments on a variety of data sets are based on random decision trees, which guarantee a high diversity in the predictions to be combined. Somewhat unexpectedly, we found that taking the average over the probabilities is actually hard to beat. However, evidence accumulation showed consistently better results on all but very small leaves.
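To make the contrast between these combination rules concrete, here is a minimal sketch in Python. The leaf counts, the five-tree ensemble, and the reading of evidence accumulation as pooling raw leaf counts before normalizing are illustrative assumptions, not the paper's exact experimental setup.

```python
import numpy as np

# Toy example: class-count vectors at the leaf reached by one test instance
# in each of five random decision trees (three classes). All numbers are
# made up for illustration.
leaf_counts = np.array([
    [8, 1, 1],
    [3, 3, 0],
    [0, 1, 0],   # a very small leaf: its probability estimate is unreliable
    [5, 2, 3],
    [6, 0, 2],
])

def vote(counts):
    """Majority voting: each tree casts one vote for its leaf's majority class."""
    votes = np.bincount(counts.argmax(axis=1), minlength=counts.shape[1])
    return votes / votes.sum()

def average_probabilities(counts):
    """Average the per-tree class distributions (normalize per leaf, then mean)."""
    per_tree = counts / counts.sum(axis=1, keepdims=True)
    return per_tree.mean(axis=0)

def accumulate_evidence(counts):
    """Pool raw class counts across trees and normalize once at the end, so
    large (more reliable) leaves contribute proportionally more evidence."""
    pooled = counts.sum(axis=0)
    return pooled / pooled.sum()

for name, combine in [("voting", vote),
                      ("averaging", average_probabilities),
                      ("evidence accumulation", accumulate_evidence)]:
    print(f"{name:>22}: {np.round(combine(leaf_counts), 3)}")
```

Note how the one-example leaf [0, 1, 0] contributes a full, confident distribution under averaging but almost nothing under count pooling, which is one intuition behind the abstract's remark about very small leaves.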
Related papers
- Treeffuser: Probabilistic Predictions via Conditional Diffusions with Gradient-Boosted Trees [39.9546129327526]
Treeffuser is an easy-to-use method for probabilistic prediction on tabular data.
Treeffuser learns well-calibrated predictive distributions and can handle a wide range of regression tasks.
We demonstrate its versatility with an application to inventory allocation under uncertainty using sales data from Walmart.
arXiv Detail & Related papers (2024-06-11T18:59:24Z)
- Uncertainty Estimates of Predictions via a General Bias-Variance Decomposition [7.811916700683125]
We introduce a bias-variance decomposition for proper scores, giving rise to the Bregman Information as the variance term.
We showcase the practical relevance of this decomposition on several downstream tasks, including model ensembles and confidence regions.
arXiv Detail & Related papers (2022-10-21T21:24:37Z)
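As a point of reference for this entry, the squared error (itself a proper score) admits the classical decomposition below; in the paper's general result the variance term is replaced by the Bregman Information. The notation is a standard textbook rendering, not taken from the paper.

```latex
% Bias-variance decomposition for squared error; \bar{f}(x) = E[f(x)]
% averages the learned predictor over the training randomness.
E\!\left[(Y - f(X))^2\right]
  = \underbrace{E\!\left[(Y - E[Y \mid X])^2\right]}_{\text{noise}}
  + \underbrace{E\!\left[(E[Y \mid X] - \bar{f}(X))^2\right]}_{\text{bias}}
  + \underbrace{E\!\left[(f(X) - \bar{f}(X))^2\right]}_{\text{variance}}
```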
- Reconciling Individual Probability Forecasts [78.0074061846588]
We show that two parties who agree on the data cannot disagree on how to model individual probabilities.
We conclude that although individual probabilities are unknowable, they are contestable via a computationally and data efficient process.
arXiv Detail & Related papers (2022-09-04T20:20:35Z)
- Dense Uncertainty Estimation via an Ensemble-based Conditional Latent Variable Model [68.34559610536614]
We argue that the aleatoric uncertainty is an inherent attribute of the data and can only be correctly estimated with an unbiased oracle model.
We propose a new sampling and selection strategy at train time to approximate the oracle model for aleatoric uncertainty estimation.
Our results show that our solution achieves both accurate deterministic results and reliable uncertainty estimation.
arXiv Detail & Related papers (2021-11-22T08:54:10Z)
- Dense Uncertainty Estimation [62.23555922631451]
In this paper, we investigate neural networks and uncertainty estimation techniques to achieve both accurate deterministic prediction and reliable uncertainty estimation.
We work on two types of uncertainty estimation solutions, namely ensemble-based methods and generative-model-based methods, and explain their pros and cons when used in fully-, semi-, and weakly-supervised frameworks.
arXiv Detail & Related papers (2021-10-13T01:23:48Z)
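For the ensemble-based route described in the two entries above, a common way to separate aleatoric from epistemic uncertainty in classification is the entropy decomposition sketched below; this is a generic illustration, not the construction from either paper.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats along the given axis."""
    return -np.sum(p * np.log(p + eps), axis=axis)

# Hypothetical ensemble output: four members, three classes, one input.
member_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
])

mean_probs = member_probs.mean(axis=0)
total = entropy(mean_probs)               # predictive (total) uncertainty
aleatoric = entropy(member_probs).mean()  # expected per-member entropy
epistemic = total - aleatoric             # disagreement between members

print(f"total={total:.3f}  aleatoric={aleatoric:.3f}  epistemic={epistemic:.3f}")
```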
- Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z)
- How to Evaluate Uncertainty Estimates in Machine Learning for Regression? [1.4610038284393165]
We show that both approaches to evaluating the quality of uncertainty estimates have serious flaws.
Firstly, neither approach can disentangle the separate components that jointly create the predictive uncertainty.
Thirdly, the current approach to test prediction intervals directly has additional flaws.
arXiv Detail & Related papers (2021-06-07T07:47:46Z)
- Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression [51.770998056563094]
Probabilistic Gradient Boosting Machines (PGBM) is a method to create probabilistic predictions with a single ensemble of decision trees.
We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-03T08:32:13Z)
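A lightweight way to get distributional output from boosted trees, shown here as a baseline rather than as PGBM's actual mechanism, is quantile regression with scikit-learn; the toy data and quantile levels are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)  # noisy toy target

# One boosted ensemble per quantile; together they trace out a predictive
# distribution (here the median plus a central 80% interval).
models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q).fit(X, y)
    for q in (0.1, 0.5, 0.9)
}

X_test = np.array([[2.5]])
lo, med, hi = (models[q].predict(X_test)[0] for q in (0.1, 0.5, 0.9))
print(f"median={med:.2f}, 80% interval=[{lo:.2f}, {hi:.2f}]")
```

Unlike PGBM, which the entry above says needs only a single ensemble, this baseline fits one ensemble per quantile.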
- Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression [91.3373131262391]
Uncertainty is the only certainty there is.
Traditionally, the direct regression formulation is considered and the uncertainty is modeled by modifying the output space to a certain family of probabilistic distributions.
How to model uncertainty with present-day regression techniques remains an open issue.
arXiv Detail & Related papers (2021-03-25T06:56:09Z)
- Knowing what you know: valid and validated confidence sets in multiclass and multilabel prediction [0.8594140167290097]
We develop conformal prediction methods for constructing valid confidence sets in multiclass and multilabel problems.
By leveraging ideas from quantile regression, we build methods that always guarantee correct coverage but additionally provide conditional coverage for both multiclass and multilabel prediction problems.
arXiv Detail & Related papers (2020-04-21T17:45:38Z)
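The coverage guarantee described in this last entry can be illustrated with plain split conformal prediction for multiclass sets; the simple conformity score and synthetic data below are standard simplifications, not the quantile-regression construction the paper builds on.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction: multiclass confidence sets with marginal
    coverage >= 1 - alpha, using 1 - p(true class) as the conformity score."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    qhat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)
    return [np.where(1.0 - p <= qhat)[0] for p in test_probs]

# Hypothetical predicted class probabilities for a three-class problem.
rng = np.random.default_rng(1)
cal_probs = rng.dirichlet(np.ones(3) * 2, size=200)
cal_labels = np.array([rng.choice(3, p=p) for p in cal_probs])
test_probs = rng.dirichlet(np.ones(3) * 2, size=5)

for s in conformal_sets(cal_probs, cal_labels, test_probs):
    print("confidence set:", s)
```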
This list is automatically generated from the titles and abstracts of the papers on this site.