On the Relation between Prediction and Imputation Accuracy under Missing Covariates
- URL: http://arxiv.org/abs/2112.05248v1
- Date: Thu, 9 Dec 2021 23:30:44 GMT
- Title: On the Relation between Prediction and Imputation Accuracy under Missing Covariates
- Authors: Burim Ramosaj, Justus Tulowietzki, Markus Pauly
- Abstract summary: Recent research has shown an increasing trend toward using modern Machine Learning algorithms for imputation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Missing covariates in regression or classification problems can prohibit the
direct use of advanced tools for further analysis. Recent research has shown
an increasing trend toward using modern Machine Learning algorithms for
imputation, owing to their favourable prediction accuracy in different
learning problems. In this work, we analyze through simulation the interaction
between imputation accuracy and prediction accuracy in regression learning
problems with missing covariates when Machine Learning based methods are used
for both imputation and prediction. In addition, we explore how imputation
affects statistical inference procedures in prediction settings, such as the
coverage rates of (valid) prediction intervals. Our analysis is based on
empirical datasets provided by the UCI Machine Learning repository and an
extensive simulation study.
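To make the studied interaction concrete, here is a minimal, standard-library-only Python sketch of this kind of simulation. It is not the authors' design: the linear data-generating model, the 30% MCAR rate on `x1`, and the two imputers (mean imputation versus a least-squares regression of `x1` on `x2`) are illustrative assumptions, and simple linear models stand in for the Machine Learning methods the paper studies. Imputation accuracy is measured as RMSE on the masked training entries, and prediction accuracy as test-set RMSE of an OLS fit on the imputed data.

```python
import math
import random


def simulate(n, rho, rng):
    """Rows (x1, x2, y) with corr(x1, x2) = rho and y = 1 + 2*x1 - x2 + N(0, 0.5^2)."""
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x2 = rho * z1 + math.sqrt(1 - rho ** 2) * z2
        rows.append((z1, x2, 1 + 2 * z1 - x2 + rng.gauss(0, 0.5)))
    return rows


def mcar_mask(n, p, rng):
    """True marks an x1 value that is missing completely at random (probability p)."""
    return [rng.random() < p for _ in range(n)]


def fit_line(xs, ys):
    """Simple least squares y ~ a + b*x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def make_imputer(rows, mask, method):
    """Fit an imputer for x1 on the observed training rows only."""
    obs = [r for r, m in zip(rows, mask) if not m]
    if method == "mean":
        mu = sum(r[0] for r in obs) / len(obs)
        return lambda x2: mu
    a, b = fit_line([r[1] for r in obs], [r[0] for r in obs])  # regress x1 on x2
    return lambda x2: a + b * x2


def impute(rows, mask, imputer):
    """Return (x1, x2) pairs with missing x1 values replaced by the imputer's guess."""
    return [(imputer(x2) if m else x1, x2) for (x1, x2, _), m in zip(rows, mask)]


def fit_ols2(X, y):
    """OLS y ~ b0 + b1*u + b2*v via the centred 2x2 normal equations."""
    n = len(y)
    mu = sum(r[0] for r in X) / n
    mv = sum(r[1] for r in X) / n
    my = sum(y) / n
    suu = sum((r[0] - mu) ** 2 for r in X)
    svv = sum((r[1] - mv) ** 2 for r in X)
    suv = sum((r[0] - mu) * (r[1] - mv) for r in X)
    suy = sum((r[0] - mu) * (t - my) for r, t in zip(X, y))
    svy = sum((r[1] - mv) * (t - my) for r, t in zip(X, y))
    det = suu * svv - suv ** 2
    b1 = (svv * suy - suv * svy) / det
    b2 = (suu * svy - suv * suy) / det
    return my - b1 * mu - b2 * mv, b1, b2


def rmse(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


def experiment(method, n=2000, rho=0.8, p=0.3, seed=1):
    """Return (imputation RMSE, prediction RMSE); same seed gives identical data per method."""
    rng = random.Random(seed)
    train, test = simulate(n, rho, rng), simulate(n, rho, rng)
    m_tr, m_te = mcar_mask(n, p, rng), mcar_mask(n, p, rng)
    imputer = make_imputer(train, m_tr, method)
    X_tr, X_te = impute(train, m_tr, imputer), impute(test, m_te, imputer)
    # Imputation accuracy on masked training entries (x1 has unit variance, so RMSE ~ NRMSE).
    miss = [i for i, m in enumerate(m_tr) if m]
    imp_err = rmse([X_tr[i][0] for i in miss], [train[i][0] for i in miss])
    # Prediction accuracy: OLS fitted on imputed training data, scored on imputed test data.
    b0, b1, b2 = fit_ols2(X_tr, [r[2] for r in train])
    preds = [b0 + b1 * u + b2 * v for u, v in X_te]
    return imp_err, rmse(preds, [r[2] for r in test])


for method in ("mean", "regression"):
    imp, pred = experiment(method)
    print(f"{method:>10}: imputation RMSE={imp:.3f}, prediction RMSE={pred:.3f}")
```

With strongly correlated covariates (here `rho=0.8`), the regression imputer should show both a lower imputation error and a lower prediction error than mean imputation; varying `rho` or `p` in `experiment` is a quick way to probe when better imputation does and does not translate into better prediction.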
Related papers
- Imputation for prediction: beware of diminishing returns [12.424671213282256]
Missing values are prevalent across various fields, posing challenges for training and deploying predictive models.
Recent theoretical and empirical studies indicate that simple constant imputation can be consistent and competitive.
This study aims at clarifying if and when investing in advanced imputation methods yields significantly better predictions.
Date: 2024-07-29T09:01:06Z
- Distribution-free risk assessment of regression-based machine learning algorithms [6.507711025292814]
We focus on regression algorithms and the risk-assessment task of computing the probability of the true label lying inside an interval defined around the model's prediction.
We solve the risk-assessment problem using the conformal prediction approach, which provides prediction intervals that are guaranteed to contain the true label with a given probability.
Date: 2023-10-05T13:57:24Z
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important for forecasting nonstationary processes or processes with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
Date: 2023-09-02T01:27:53Z
- Online machine-learning forecast uncertainty estimation for sequential data assimilation [0.0]
Quantifying forecast uncertainty is a key aspect of state-of-the-art numerical weather prediction and data assimilation systems.
In this work, a machine learning method based on convolutional neural networks is presented that estimates the state-dependent forecast uncertainty.
The hybrid data assimilation method performs similarly to the ensemble Kalman filter and outperforms it when the ensembles are relatively small.
Date: 2023-05-12T19:23:21Z
- Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
Date: 2023-02-23T18:57:14Z
- Prediction-Powered Inference [68.97619568620709]
Prediction-powered inference is a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system.
The framework yields simple algorithms for computing provably valid confidence intervals for quantities such as means, quantiles, and linear and logistic regression coefficients.
Prediction-powered inference could enable researchers to draw valid and more data-efficient conclusions using machine learning.
Date: 2023-01-23T18:59:28Z
- Scalable computation of prediction intervals for neural networks via matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
Date: 2022-05-06T13:18:31Z
- Automated Learning of Interpretable Models with Quantified Uncertainty [0.0]
We introduce a new framework for genetic-programming-based symbolic regression (GPSR) that uses model evidence to formulate replacement probability during the selection phase of evolution.
It is shown to increase interpretability, improve robustness to noise, and reduce overfitting when compared to a conventional GPSR implementation.
Date: 2022-04-12T19:56:42Z
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
Date: 2022-02-08T02:59:12Z
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
Date: 2020-04-21T23:09:55Z
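The coverage-rate question raised in the abstract, and the conformal approach summarized in the related-papers list (intervals around a regression model's prediction that contain the true label with a given probability), can be illustrated with split conformal prediction. The Python sketch below is a toy under stated assumptions, not the method of either paper: it fits a least-squares line on one split, uses absolute residuals on a separate calibration split as nonconformity scores, and reports empirical coverage and interval width on fresh test data. The heteroscedastic data generator, the split sizes, and `alpha=0.1` are illustrative choices.

```python
import math
import random


def simulate(n, rng):
    """Toy data (an assumption): y = 2x + noise whose sd grows with |x|."""
    xs = [rng.uniform(-2, 2) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 0.5 + 0.5 * abs(x)) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Simple least squares y ~ a + b*x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def split_conformal(x_fit, y_fit, x_cal, y_cal, alpha):
    """Return x -> (lo, hi) with marginal coverage >= 1 - alpha under exchangeability."""
    a, b = fit_line(x_fit, y_fit)
    # Nonconformity score: absolute residual on the held-out calibration split.
    scores = sorted(abs(y - (a + b * x)) for x, y in zip(x_cal, y_cal))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    q = math.inf if k > n else scores[k - 1]  # too few calibration points -> trivial interval
    return lambda x: (a + b * x - q, a + b * x + q)


def evaluate(seed=0, alpha=0.1):
    """Empirical coverage and (constant) width of the intervals on fresh test data."""
    rng = random.Random(seed)
    x_fit, y_fit = simulate(1000, rng)
    x_cal, y_cal = simulate(1000, rng)
    x_test, y_test = simulate(5000, rng)
    interval = split_conformal(x_fit, y_fit, x_cal, y_cal, alpha)
    bounds = [interval(x) for x in x_test]
    covered = sum(lo <= y <= hi for (lo, hi), y in zip(bounds, y_test))
    return covered / len(y_test), bounds[0][1] - bounds[0][0]


coverage, width = evaluate()
print(f"empirical coverage={coverage:.3f} (target >= 0.90), interval width={width:.3f}")
```

Because the score is the absolute residual, every interval has the same width. The guarantee is marginal and relies on exchangeability; coverage conditional on `x` is not guaranteed, which the heteroscedastic noise here makes visible if coverage is computed separately for small and large `|x|`.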
This list is automatically generated from the titles and abstracts of the papers in this site.