Combining Predictions under Uncertainty: The Case of Random Decision Trees
- URL: http://arxiv.org/abs/2208.07403v1
- Date: Mon, 15 Aug 2022 18:36:57 GMT
- Title: Combining Predictions under Uncertainty: The Case of Random Decision Trees
- Authors: Florian Busch, Moritz Kulessa, Eneldo Loza Mencía and Hendrik Blockeel
- Abstract summary: A common approach to aggregating classification estimates in an ensemble of decision trees is to either use voting or to average the probabilities for each class.
In this paper, we investigate a number of alternative prediction methods.
Our methods are inspired by the theories of probability, belief functions and reliable classification.
- Score: 2.322689362836168
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A common approach to aggregating classification estimates in an ensemble of decision trees is to either use voting or to average the probabilities for each class. The latter takes uncertainty into account, but not the reliability of the uncertainty estimates (so to speak, the "uncertainty about the uncertainty"). More generally, much remains unknown about how best to combine probabilistic estimates from multiple sources. In this paper, we investigate a number of alternative prediction methods. Our methods are inspired by the theories of probability, belief functions and reliable classification, as well as a principle that we call evidence accumulation. Our experiments on a variety of data sets are based on random decision trees, which guarantee a high diversity in the predictions to be combined. Somewhat unexpectedly, we found that taking the average over the probabilities is actually hard to beat. However, evidence accumulation showed consistently better results on all but very small leaves.
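To make the contrast between these combination rules concrete, here is a minimal sketch in Python. The leaf counts, the five-tree ensemble, and the reading of evidence accumulation as pooling raw leaf counts before normalizing are illustrative assumptions, not the paper's exact experimental setup.

```python
import numpy as np

# Toy example: class-count vectors at the leaf reached by one test instance
# in each of five random decision trees (three classes). All numbers are
# made up for illustration.
leaf_counts = np.array([
    [8, 1, 1],
    [3, 3, 0],
    [0, 1, 0],   # a very small leaf: its probability estimate is unreliable
    [5, 2, 3],
    [6, 0, 2],
])

def vote(counts):
    """Majority voting: each tree casts one vote for its leaf's majority class."""
    votes = np.bincount(counts.argmax(axis=1), minlength=counts.shape[1])
    return votes / votes.sum()

def average_probabilities(counts):
    """Average the per-tree class distributions (normalize per leaf, then mean)."""
    per_tree = counts / counts.sum(axis=1, keepdims=True)
    return per_tree.mean(axis=0)

def accumulate_evidence(counts):
    """Pool raw class counts across trees and normalize once at the end, so
    large (more reliable) leaves contribute proportionally more evidence."""
    pooled = counts.sum(axis=0)
    return pooled / pooled.sum()

for name, combine in [("voting", vote),
                      ("averaging", average_probabilities),
                      ("evidence accumulation", accumulate_evidence)]:
    print(f"{name:>22}: {np.round(combine(leaf_counts), 3)}")
```

Note how the one-example leaf [0, 1, 0] contributes a full, confident distribution under averaging but almost nothing under count pooling, which is one intuition behind the abstract's remark about very small leaves.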
Related papers
- Treeffuser: Probabilistic Predictions via Conditional Diffusions with Gradient-Boosted Trees [39.9546129327526]
Treeffuser is an easy-to-use method for probabilistic prediction on tabular data.
Treeffuser learns well-calibrated predictive distributions and can handle a wide range of regression tasks.
We demonstrate its versatility with an application to inventory allocation under uncertainty using sales data from Walmart.
arXiv Detail & Related papers (2024-06-11T18:59:24Z)
- Uncertainty Estimates of Predictions via a General Bias-Variance Decomposition [7.811916700683125]
We introduce a bias-variance decomposition for proper scores, giving rise to the Bregman Information as the variance term.
We showcase the practical relevance of this decomposition on several downstream tasks, including model ensembles and confidence regions.
arXiv Detail & Related papers (2022-10-21T21:24:37Z)
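As a point of reference for this entry, the squared error (itself a proper score) admits the classical decomposition below; in the paper's general result the variance term is replaced by the Bregman Information. The notation is a standard textbook rendering, not taken from the paper.

```latex
% Bias-variance decomposition for squared error; \bar{f}(x) = E[f(x)]
% averages the learned predictor over the training randomness.
E\!\left[(Y - f(X))^2\right]
  = \underbrace{E\!\left[(Y - E[Y \mid X])^2\right]}_{\text{noise}}
  + \underbrace{E\!\left[(E[Y \mid X] - \bar{f}(X))^2\right]}_{\text{bias}}
  + \underbrace{E\!\left[(f(X) - \bar{f}(X))^2\right]}_{\text{variance}}
```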
- Reconciling Individual Probability Forecasts [78.0074061846588]
We show that two parties who agree on the data cannot disagree on how to model individual probabilities.
We conclude that although individual probabilities are unknowable, they are contestable via a computationally and data efficient process.
arXiv Detail & Related papers (2022-09-04T20:20:35Z)
- Dense Uncertainty Estimation via an Ensemble-based Conditional Latent Variable Model [68.34559610536614]
We argue that the aleatoric uncertainty is an inherent attribute of the data and can only be correctly estimated with an unbiased oracle model.
We propose a new sampling and selection strategy at train time to approximate the oracle model for aleatoric uncertainty estimation.
Our results show that our solution achieves both accurate deterministic results and reliable uncertainty estimation.
arXiv Detail & Related papers (2021-11-22T08:54:10Z)
- Dense Uncertainty Estimation [62.23555922631451]
In this paper, we investigate neural networks and uncertainty estimation techniques to achieve both accurate deterministic prediction and reliable uncertainty estimation.
We work on two types of uncertainty estimation solutions, namely ensemble-based methods and generative-model-based methods, and explain their pros and cons when used in fully-, semi-, and weakly-supervised frameworks.
arXiv Detail & Related papers (2021-10-13T01:23:48Z)
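For the ensemble-based route described in the two entries above, a common way to separate aleatoric from epistemic uncertainty in classification is the entropy decomposition sketched below; this is a generic illustration, not the construction from either paper.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats along the given axis."""
    return -np.sum(p * np.log(p + eps), axis=axis)

# Hypothetical ensemble output: four members, three classes, one input.
member_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
])

mean_probs = member_probs.mean(axis=0)
total = entropy(mean_probs)               # predictive (total) uncertainty
aleatoric = entropy(member_probs).mean()  # expected per-member entropy
epistemic = total - aleatoric             # disagreement between members

print(f"total={total:.3f}  aleatoric={aleatoric:.3f}  epistemic={epistemic:.3f}")
```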
- Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z)
- How to Evaluate Uncertainty Estimates in Machine Learning for Regression? [1.4610038284393165]
We show that both approaches to evaluating the quality of uncertainty estimates have serious flaws.
Firstly, neither approach can disentangle the separate components that jointly create the predictive uncertainty.
Thirdly, the current approach to test prediction intervals directly has additional flaws.
arXiv Detail & Related papers (2021-06-07T07:47:46Z)
- Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression [51.770998056563094]
Probabilistic Gradient Boosting Machines (PGBM) is a method to create probabilistic predictions with a single ensemble of decision trees.
We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-03T08:32:13Z)
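A lightweight way to get distributional output from boosted trees, shown here as a baseline rather than as PGBM's actual mechanism, is quantile regression with scikit-learn; the toy data and quantile levels are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)  # noisy toy target

# One boosted ensemble per quantile; together they trace out a predictive
# distribution (here the median plus a central 80% interval).
models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q).fit(X, y)
    for q in (0.1, 0.5, 0.9)
}

X_test = np.array([[2.5]])
lo, med, hi = (models[q].predict(X_test)[0] for q in (0.1, 0.5, 0.9))
print(f"median={med:.2f}, 80% interval=[{lo:.2f}, {hi:.2f}]")
```

Unlike PGBM, which the entry above says needs only a single ensemble, this baseline fits one ensemble per quantile.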
- Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression [91.3373131262391]
Uncertainty is the only certainty there is.
Traditionally, the direct regression formulation is considered and the uncertainty is modeled by modifying the output space to a certain family of probabilistic distributions.
How to model uncertainty with present-day regression techniques remains an open issue.
arXiv Detail & Related papers (2021-03-25T06:56:09Z)
- Knowing what you know: valid and validated confidence sets in multiclass and multilabel prediction [0.8594140167290097]
We develop conformal prediction methods for constructing valid confidence sets in multiclass and multilabel problems.
By leveraging ideas from quantile regression, we build methods that always guarantee correct coverage but additionally provide conditional coverage for both multiclass and multilabel prediction problems.
arXiv Detail & Related papers (2020-04-21T17:45:38Z)
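The coverage guarantee described in this last entry can be illustrated with plain split conformal prediction for multiclass sets; the simple conformity score and synthetic data below are standard simplifications, not the quantile-regression construction the paper builds on.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction: multiclass confidence sets with marginal
    coverage >= 1 - alpha, using 1 - p(true class) as the conformity score."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    qhat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)
    return [np.where(1.0 - p <= qhat)[0] for p in test_probs]

# Hypothetical predicted class probabilities for a three-class problem.
rng = np.random.default_rng(1)
cal_probs = rng.dirichlet(np.ones(3) * 2, size=200)
cal_labels = np.array([rng.choice(3, p=p) for p in cal_probs])
test_probs = rng.dirichlet(np.ones(3) * 2, size=5)

for s in conformal_sets(cal_probs, cal_labels, test_probs):
    print("confidence set:", s)
```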
This list is automatically generated from the titles and abstracts of the papers on this site.