Multiple imputation using chained random forests: a preliminary study based on the empirical distribution of out-of-bag prediction errors
- URL: http://arxiv.org/abs/2004.14823v1
- Date: Thu, 30 Apr 2020 14:29:56 GMT
- Authors: Shangzhi Hong, Yuqi Sun, Hanying Li, Henry S. Lynn
- Abstract summary: A novel RF-based multiple imputation method was proposed that constructs conditional distributions based on the empirical distribution of out-of-bag prediction errors.
The proposed non-parametric method can deliver valid multiple imputation results.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Missing data are common in biomedical data analyses, and imputation
methods based on random forests (RF) have become widely accepted because the RF
algorithm can achieve high accuracy without requiring the data distributions or
relationships to be specified. However, RF predictions carry no information about
prediction uncertainty, which is unacceptable for multiple imputation. Existing
RF-based multiple imputation methods attempt proper multiple imputation either by
sampling directly from the observations in the predicting (terminal) nodes, without
accounting for the prediction error, or by assuming that the prediction errors are
normally distributed. In this study, a novel RF-based multiple imputation method
was proposed that constructs conditional distributions based on the empirical
distribution of out-of-bag prediction errors. Using simulation studies on data
containing an interaction term, the proposed method was compared with a previous
method that makes parametric assumptions about RF's prediction errors and with
predictive mean matching. The proposed non-parametric method can deliver valid
multiple imputation results. The accompanying R package for this study is
publicly available.
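The core idea of the abstract, turning RF point predictions into random draws by adding errors sampled from the empirical out-of-bag (OOB) error distribution, can be illustrated with a minimal, dependency-free Python sketch. It assumes a single incomplete continuous variable with fully observed predictors and a small hand-rolled random forest (bootstrap samples, random feature subsets of size `mtry`, depth-limited regression trees). Each missing value is imputed as the forest prediction plus a residual drawn from the OOB residuals; the `parametric=True` branch instead adds normal noise with the OOB residual standard deviation, as a rough stand-in for the normality-based approach the abstract contrasts against. All names and defaults (`fit_forest`, `impute`, `depth=6`, `min_leaf=5`) are hypothetical. This is not the authors' R package, and the chained cycling over several incomplete variables is omitted.

```python
import random
import statistics


def _best_split(X, y, features, min_leaf):
    """Return (sse, feature, threshold) minimising the summed within-child squared error."""
    best = None
    n = len(y)
    for j in features:
        order = sorted(range(n), key=lambda i: X[i][j])
        ys = [y[i] for i in order]
        total, total_sq = sum(ys), sum(v * v for v in ys)
        left = left_sq = 0.0
        for k in range(1, n):  # candidate split between sorted positions k-1 and k
            left += ys[k - 1]
            left_sq += ys[k - 1] ** 2
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if k < min_leaf or n - k < min_leaf or lo == hi:
                continue
            right, right_sq = total - left, total_sq - left_sq
            sse = (left_sq - left ** 2 / k) + (right_sq - right ** 2 / (n - k))
            if best is None or sse < best[0]:
                best = (sse, j, lo)  # rows with x[j] <= lo go to the left child
    return best


def _grow(X, y, depth, mtry, rng, min_leaf=5):
    """Grow a regression tree: ('leaf', mean) or (feature, threshold, left, right)."""
    split = None
    if depth > 0 and len(y) >= 2 * min_leaf:
        split = _best_split(X, y, rng.sample(range(len(X[0])), mtry), min_leaf)
    if split is None:
        return ("leaf", statistics.fmean(y))
    _, j, t = split
    li = [i for i in range(len(y)) if X[i][j] <= t]
    ri = [i for i in range(len(y)) if X[i][j] > t]
    left = _grow([X[i] for i in li], [y[i] for i in li], depth - 1, mtry, rng, min_leaf)
    right = _grow([X[i] for i in ri], [y[i] for i in ri], depth - 1, mtry, rng, min_leaf)
    return (j, t, left, right)


def _predict(node, x):
    """Walk from the root to a leaf and return the leaf mean."""
    while node[0] != "leaf":
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node[1]


def fit_forest(X, y, n_trees=100, mtry=1, depth=6, seed=0):
    """Fit bagged trees; return them with OOB residuals y_i - mean(OOB predictions of y_i)."""
    rng = random.Random(seed)
    n = len(y)
    trees, oob_sum, oob_cnt = [], [0.0] * n, [0] * n
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        tree = _grow([X[i] for i in boot], [y[i] for i in boot], depth, mtry, rng)
        trees.append(tree)
        for i in set(range(n)) - set(boot):  # rows this tree never saw
            oob_sum[i] += _predict(tree, X[i])
            oob_cnt[i] += 1
    residuals = [y[i] - oob_sum[i] / oob_cnt[i] for i in range(n) if oob_cnt[i]]
    return trees, residuals


def impute(trees, residuals, X_missing, m=5, seed=1, parametric=False):
    """Return m imputed versions of the missing values (prediction + sampled error)."""
    rng = random.Random(seed)
    sd = statistics.stdev(residuals)
    preds = [statistics.fmean(_predict(t, x) for t in trees) for x in X_missing]
    draws = []
    for _ in range(m):
        if parametric:  # assumed normal prediction error with the OOB residual SD
            noise = [rng.gauss(0.0, sd) for _ in preds]
        else:           # empirical OOB error distribution (non-parametric)
            noise = [rng.choice(residuals) for _ in preds]
        draws.append([p + e for p, e in zip(preds, noise)])
    return draws


if __name__ == "__main__":
    rng = random.Random(42)
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(300)]
    y = [a + b + 2 * a * b + rng.gauss(0, 0.3) for a, b in X]  # includes an interaction term
    observed, missing = range(0, 300, 2), range(1, 300, 2)      # pretend odd rows lack y
    trees, res = fit_forest([X[i] for i in observed], [y[i] for i in observed], n_trees=50)
    imps = impute(trees, res, [X[i] for i in missing], m=5)
    print(len(imps), len(imps[0]))  # 5 150
```

The design choice worth noting is that OOB residuals come from trees that never saw the row in question, so their spread reflects genuine out-of-sample prediction error rather than the much smaller in-sample fit error; a production analysis would use an established random forest implementation instead of these toy trees.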
Related papers
- Selective Nonparametric Regression via Testing (2023-09-28)
  We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
  Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
- Quantification of Predictive Uncertainty via Inference-Time Sampling (2023-08-03)
  We propose a post-hoc sampling strategy for estimating predictive uncertainty that accounts for data ambiguity.
  The method can generate different plausible outputs for a given input and does not assume parametric forms of predictive distributions.
- Confidence and Uncertainty Assessment for Distributional Random Forests (2023-02-11)
  The Distributional Random Forest (DRF) is a recently introduced Random Forest for estimating conditional distributions.
  It can be employed to estimate a wide range of targets, such as conditional average treatment effects, conditional quantiles, and conditional correlations.
  We characterize the algorithm of DRF and develop a bootstrap approximation of it.
- A general framework for multi-step ahead adaptive conformal heteroscedastic time series forecasting (2022-07-28)
  This paper introduces a novel model-agnostic algorithm called adaptive ensemble batch multi-input multi-output conformalized quantile regression (AEnbMIMOCQR).
  It enables forecasters to generate multi-step ahead prediction intervals for a fixed pre-specified miscoverage rate in a distribution-free manner.
  Our method is grounded in conformal prediction principles; however, it does not require data splitting and provides close to exact coverage even when the data are not exchangeable.
- NUQ: Nonparametric Uncertainty Quantification for Deterministic Neural Networks (2022-02-07)
  We show a principled way to measure the uncertainty of a classifier's predictions based on Nadaraya-Watson's nonparametric estimate of the conditional label distribution.
  We demonstrate the strong performance of the method in uncertainty estimation tasks on a variety of real-world image datasets.
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift (2020-06-26)
  We develop an approximate Bayesian inference scheme based on posterior regularisation.
  We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
- Posterior Network: Uncertainty Estimation without OOD Samples via Density-Based Pseudo-Counts (2020-06-16)
  Posterior Network (PostNet) predicts an individual closed-form posterior distribution over predicted probabilities for any input sample.
  PostNet achieves state-of-the-art results in OOD detection and in uncertainty calibration under dataset shifts.
- Balance-Subsampled Stable Prediction (2020-06-08)
  We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
  A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
  Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
- Learning to Predict Error for MRI Reconstruction (2020-02-13)
  We demonstrate that predictive uncertainty estimated by current methods does not correlate strongly with prediction error.
  We propose a novel method that estimates the target labels and the magnitude of the prediction error in two steps.