Simplifying Random Forests' Probabilistic Forecasts
- URL: http://arxiv.org/abs/2408.12332v3
- Date: Sat, 12 Oct 2024 14:25:56 GMT
- Title: Simplifying Random Forests' Probabilistic Forecasts
- Authors: Nils Koster, Fabian Krüger
- Abstract summary: Random Forests (RFs) have proven to be useful for both classification and regression tasks.
In this paper, we consider simplifying RF-based forecast distributions by sparsifying them.
This sparsification can be applied to any forecasting task without re-training existing RF models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since their introduction by Breiman, Random Forests (RFs) have proven to be useful for both classification and regression tasks. The RF prediction of a previously unseen observation can be represented as a weighted sum of all training sample observations. This nearest-neighbor-type representation is useful, among other things, for constructing forecast distributions (Meinshausen, 2006). In this paper, we consider simplifying RF-based forecast distributions by sparsifying them. That is, we focus on a small subset of nearest neighbors while setting the remaining weights to zero. This sparsification step greatly improves the interpretability of RF predictions. It can be applied to any forecasting task without re-training existing RF models. In empirical experiments, we document that the simplified predictions can be similar to or exceed the original ones in terms of forecasting performance. We explore the statistical sources of this finding via a stylized analytical model of RFs. The model suggests that simplification is particularly promising if the unknown true forecast distribution contains many small weights that are estimated imprecisely.
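The weighted-sum representation and the sparsification step described in the abstract can be sketched as follows. This is a minimal scikit-learn illustration, not the authors' code: each training observation's weight is computed Meinshausen-style (averaging, over trees, the indicator that it shares a terminal leaf with the test point, divided by the leaf size), and then only the k largest weights are kept and renormalized. The synthetic data and the choice k = 10 are arbitrary assumptions for the sketch.

```python
# Sketch (not the authors' code): Meinshausen-style RF weights for a test
# point, followed by the paper's top-k sparsification idea.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] + rng.normal(scale=0.5, size=200)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

x_new = rng.normal(size=(1, 3))
train_leaves = rf.apply(X)        # (n_train, n_trees) terminal-leaf indices
new_leaves = rf.apply(x_new)[0]   # (n_trees,) leaf indices of the test point

# Weight of training obs i: average over trees of
# 1{same leaf as x_new} / (# training obs in that leaf).
same_leaf = train_leaves == new_leaves           # (n_train, n_trees)
leaf_sizes = same_leaf.sum(axis=0)               # (n_trees,)
weights = (same_leaf / leaf_sizes).mean(axis=1)  # sums to 1

# Sparsify: keep the k largest weights, zero the rest, renormalize.
k = 10
top = np.argsort(weights)[-k:]
sparse = np.zeros_like(weights)
sparse[top] = weights[top]
sparse /= sparse.sum()

# The sparsified forecast distribution puts mass sparse[i] on y[i];
# its mean approximates the usual RF point prediction.
print(float(sparse @ y), float(rf.predict(x_new)[0]))
```

The sparse weight vector defines the simplified forecast distribution directly: its nonzero entries point at the k training observations that drive the prediction, which is what makes the forecast interpretable.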
Related papers
- Improving probabilistic forecasts of extreme wind speeds by training statistical post-processing models with weighted scoring rules
Training using the threshold-weighted continuous ranked probability score (twCRPS) leads to improved extreme event performance of post-processing models.
We find a distribution body-tail trade-off where improved performance for probabilistic predictions of extreme events comes with worse performance for predictions of the distribution body.
arXiv Detail & Related papers (2024-07-22T11:07:52Z)
- Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise
Quantile regression is a leading approach for obtaining such intervals via the empirical estimation of quantiles in the distribution of outputs.
We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile regression based interval construction that removes this arbitrary constraint.
We demonstrate that this added flexibility results in intervals with an improvement in desirable qualities.
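For context, the standard quantile-regression construction that RQR relaxes can be sketched with scikit-learn's pinball (quantile) loss. This is only the fixed-quantile baseline on assumed synthetic data, not an implementation of RQR itself:

```python
# Baseline sketch (not RQR): a prediction interval from two separately
# fitted quantile-regression models at fixed levels 0.05 and 0.95.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(500, 1))
# Asymmetric noise: exponential errors make the upper tail heavier.
y = X[:, 0] ** 2 + rng.exponential(scale=0.5, size=500)

# Fit the 5% and 95% conditional quantiles with the pinball (quantile) loss.
lo = GradientBoostingRegressor(loss="quantile", alpha=0.05,
                               random_state=0).fit(X, y)
hi = GradientBoostingRegressor(loss="quantile", alpha=0.95,
                               random_state=0).fit(X, y)

X_test = np.array([[0.0], [1.5]])
lower, upper = lo.predict(X_test), hi.predict(X_test)
for x, l, u in zip(X_test[:, 0], lower, upper):
    print(f"x={x:+.1f}: 90% interval [{l:.2f}, {u:.2f}]")
```

RQR's point is that the two quantile levels need not be fixed a priori; the baseline above pins them at 0.05 and 0.95 regardless of the noise shape.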
arXiv Detail & Related papers (2024-06-05T13:36:38Z)
- Generating Synthetic Ground Truth Distributions for Multi-step Trajectory Prediction using Probabilistic Composite Bézier Curves
This paper proposes a novel approach to synthetic dataset generation based on composite probabilistic Bézier curves.
The paper showcases an exemplary trajectory prediction model evaluation using generated ground truth distribution data.
arXiv Detail & Related papers (2024-04-05T20:50:06Z)
- Nearest Neighbour Score Estimators for Diffusion Generative Models
We introduce a novel nearest neighbour score function estimator which utilizes multiple samples from the training set to dramatically decrease estimator variance.
In diffusion models, we show that our estimator can replace a learned network for probability-flow ODE integration, opening promising new avenues of future research.
arXiv Detail & Related papers (2024-02-12T19:27:30Z)
- Enhanced Local Explainability and Trust Scores with Random Forest Proximities
We exploit the fact that any random forest (RF) regression and classification model can be mathematically formulated as an adaptive weighted K nearest-neighbors model.
We show that this linearity facilitates a local notion of explainability of RF predictions that generates attributions for any model prediction across observations in the training set.
We show how this proximity-based approach to explainability can be used in conjunction with SHAP to explain not just the model predictions, but also out-of-sample performance.
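The adaptive weighted nearest-neighbors view underlying this entry can be sketched as follows. This is an illustrative computation on assumed synthetic data, not the paper's code: the proximity of two observations is the fraction of trees in which they share a terminal leaf, and large proximities identify a test point's "neighbors" in the forest's learned metric.

```python
# Sketch (not the paper's code): RF proximities as leaf co-occurrence
# frequencies, the basis of the weighted nearest-neighbors formulation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=100)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

leaves = rf.apply(X)  # (n_obs, n_trees) terminal-leaf indices
# proximity[i, j] = fraction of trees where i and j fall in the same leaf
prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

# Proximities from observation 0 to the rest (self excluded): the training
# points most responsible for its prediction, usable as local attributions.
p0 = prox[0].copy()
p0[0] = -1.0
neighbors = np.argsort(p0)[::-1][:5]
print(neighbors, prox[0, neighbors])
```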
arXiv Detail & Related papers (2023-10-19T02:42:20Z)
- When Rigidity Hurts: Soft Consistency Regularization for Probabilistic Hierarchical Time Series Forecasting
Probabilistic hierarchical time-series forecasting is an important variant of time-series forecasting.
Most methods focus on point predictions and do not provide well-calibrated probabilistic forecast distributions.
We propose PROFHiT, a fully probabilistic hierarchical forecasting model that jointly models the forecast distributions of the entire hierarchy.
arXiv Detail & Related papers (2023-10-17T20:30:16Z)
- Uncertainty estimation of pedestrian future trajectory using Bayesian approximation
In dynamic traffic scenarios, planning based on deterministic predictions is not trustworthy.
The authors propose to quantify forecast uncertainty via Bayesian approximation, capturing uncertainty that deterministic approaches miss.
The effects of dropout weights and long-term prediction on future-state uncertainty are studied.
arXiv Detail & Related papers (2022-05-04T04:23:38Z)
- Distributionally Robust Models with Parametric Likelihood Ratios
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Distributional Gradient Boosting Machines
Our framework is based on XGBoost and LightGBM.
We show that our framework achieves state-of-the-art forecast accuracy.
arXiv Detail & Related papers (2022-04-02T06:32:19Z)
- Multiple imputation using chained random forests: a preliminary study based on the empirical distribution of out-of-bag prediction errors
A novel RF-based multiple imputation method was proposed that constructs conditional distributions from the empirical distribution of out-of-bag prediction errors.
The proposed non-parametric method can deliver valid multiple imputation results.
arXiv Detail & Related papers (2020-04-30T14:29:56Z)
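The core idea of the last entry can be sketched as follows. This is a simplified single-variable illustration on assumed synthetic data, not the chained procedure from the paper: a missing value is imputed as the RF point prediction plus an error resampled from the forest's empirical out-of-bag error distribution, repeated m times to obtain multiple imputations.

```python
# Sketch (not the paper's implementation): impute a missing value as an RF
# prediction plus a draw from the empirical out-of-bag (OOB) error
# distribution, repeated m times for multiple imputation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 3))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.4, size=300)
missing = rng.random(300) < 0.1  # rows where y is "missing"

rf = RandomForestRegressor(n_estimators=200, oob_score=True,
                           random_state=0).fit(X[~missing], y[~missing])

# Empirical OOB prediction errors on the observed rows.
oob_errors = y[~missing] - rf.oob_prediction_

# One imputation: point prediction + a resampled OOB error. Repeating this
# m times yields m completed datasets for multiple imputation.
m = 5
preds = rf.predict(X[missing])
imputations = [preds + rng.choice(oob_errors, size=preds.shape)
               for _ in range(m)]
print(imputations[0][:3])
```

Resampling OOB errors rather than adding parametric noise is what makes the method non-parametric: the imputation distribution inherits whatever shape the forest's actual prediction errors have.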
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.