What's a good imputation to predict with missing values?
- URL: http://arxiv.org/abs/2106.00311v1
- Date: Tue, 1 Jun 2021 08:40:30 GMT
- Title: What's a good imputation to predict with missing values?
- Authors: Marine Le Morvan (PARIETAL, IJCLab), Julie Josse (CRISAM), Erwan Scornet (CMAP), Gaël Varoquaux (PARIETAL)
- Abstract summary: We show that for almost all imputation functions, an impute-then-regress procedure with a powerful learner is Bayes optimal.
We propose such a procedure, adapting NeuMiss, a neural network capturing the conditional links across observed and unobserved variables.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to learn a good predictor on data with missing values? Most efforts focus
on first imputing as well as possible and second learning on the completed data
to predict the outcome. Yet, this widespread practice has no theoretical
grounding. Here we show that for almost all imputation functions, an
impute-then-regress procedure with a powerful learner is Bayes optimal. This
result holds for all missing-values mechanisms, in contrast with the classic
statistical results that require missing-at-random settings to use imputation
in probabilistic modeling. Moreover, it implies that perfect conditional
imputation may not be needed for good prediction asymptotically. In fact, we
show that on perfectly imputed data the best regression function will generally
be discontinuous, which makes it hard to learn. Crafting instead the imputation
so as to leave the regression function unchanged simply shifts the problem to
learning discontinuous imputations. Rather, we suggest that it is easier to
learn imputation and regression jointly. We propose such a procedure, adapting
NeuMiss, a neural network capturing the conditional links across observed and
unobserved variables whatever the missing-value pattern. Experiments confirm that joint
imputation and regression through NeuMiss outperforms various two-step procedures in our
finite-sample experiments.
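As a concrete illustration of the two-step baselines the abstract argues against improving on, the sketch below builds an impute-then-regress pipeline with scikit-learn: impute first (mean or iterative imputation), then fit a flexible regressor on the completed data. The synthetic data, the MCAR missingness mask, and the choice of scikit-learn estimators are assumptions for illustration only; this is not the paper's NeuMiss architecture or its joint imputation-and-regression procedure.

```python
# Minimal impute-then-regress sketch (illustrative only, not the paper's NeuMiss code).
# Assumes scikit-learn is installed; the data and missingness mechanism are synthetic placeholders.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import SimpleImputer, IterativeImputer
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, d = 2000, 10
X_full = rng.normal(size=(n, d))
y = X_full @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Remove 30% of the entries completely at random (MCAR) to create missing values.
X = X_full.copy()
X[rng.random(X.shape) < 0.3] = np.nan

# Two-step baselines: impute first, then learn a powerful regressor on the completed data.
for name, imputer in [("mean", SimpleImputer(strategy="mean")),
                      ("iterative", IterativeImputer(max_iter=10, random_state=0))]:
    pipe = make_pipeline(imputer, HistGradientBoostingRegressor(random_state=0))
    r2 = cross_val_score(pipe, X, y, cv=3, scoring="r2").mean()
    print(f"impute-then-regress ({name}): mean CV R^2 = {r2:.3f}")
```

The paper's claim is that, asymptotically, almost any such imputation followed by a powerful learner can be Bayes optimal, while the proposed NeuMiss-based joint procedure is reported to work better with a finite number of samples; the sketch only reproduces the two-step side of that comparison.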
Related papers
- Generalized Regression with Conditional GANs [2.4171019220503402]
We propose to learn a prediction function whose outputs, when paired with the corresponding inputs, are indistinguishable from feature-label pairs in the training dataset.
We show that this approach to regression makes fewer assumptions on the distribution of the data we are fitting to and, therefore, has better representation capabilities.
arXiv Detail & Related papers (2024-04-21T01:27:47Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods proposes to replay data from previously learned tasks when learning new ones.
However, storing and replaying past data is often impractical due to memory constraints or data-privacy concerns.
As a replacement, data-free replay methods have been proposed that invert samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- Deep Regression Unlearning [6.884272840652062]
We introduce deep regression unlearning methods that generalize well and are robust to privacy attacks.
We conduct regression unlearning experiments for computer vision, natural language processing and forecasting applications.
arXiv Detail & Related papers (2022-10-15T05:00:20Z)
- PROMISSING: Pruning Missing Values in Neural Networks [0.0]
We propose a simple and intuitive yet effective method for pruning missing values (PROMISSING) during learning and inference steps in neural networks.
Our experiments show that PROMISSING results in similar prediction performance compared to various imputation techniques.
arXiv Detail & Related papers (2022-06-03T15:37:27Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models that perform inference on inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing and analyzing regression errors in the NLP model updates.
We formulate regression-free model updates as a constrained optimization problem.
We empirically analyze how model ensemble reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z)
- Variational Bayesian Unlearning [54.26984662139516]
We study the problem of approximately unlearning a Bayesian model from a small subset of the training data to be erased.
We show that it is equivalent to minimizing an evidence upper bound which trades off between fully unlearning from erased data vs. not entirely forgetting the posterior belief.
In model training with VI, only an approximate (instead of exact) posterior belief given the full data can be obtained, which makes unlearning even more challenging.
arXiv Detail & Related papers (2020-10-24T11:53:00Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
- Deep transformation models: Tackling complex regression problems with neural network based transformation models [0.0]
We present a deep transformation model for probabilistic regression.
It estimates the whole conditional probability distribution, which is the most thorough way to capture uncertainty about the outcome.
Our method works for complex input data, which we demonstrate by employing a CNN architecture on image data.
arXiv Detail & Related papers (2020-04-01T14:23:12Z)
- On the consistency of supervised learning with missing values [15.666860186278782]
In many application settings, the data have missing entries which make analysis challenging.
Here, we consider supervised-learning settings: predicting a target when missing values appear in both training and testing data.
We show that the widely used method of imputing with a constant, such as the mean, prior to learning is consistent when missing values are not informative.
arXiv Detail & Related papers (2019-02-19T07:27:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.