On the consistency of supervised learning with missing values
- URL: http://arxiv.org/abs/1902.06931v5
- Date: Thu, 21 Mar 2024 09:01:19 GMT
- Title: On the consistency of supervised learning with missing values
- Authors: Julie Josse, Jacob M. Chen, Nicolas Prost, Erwan Scornet, Gaël Varoquaux
- Abstract summary: In many application settings, the data have missing entries which make analysis challenging.
Here, we consider supervised-learning settings: predicting a target when missing values appear in both training and testing data.
We show that the widely used method of imputing with a constant, such as the mean, prior to learning is consistent when missing values are not informative.
- Score: 15.666860186278782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many application settings, the data have missing entries which make analysis challenging. An abundant literature addresses missing values in an inferential framework: estimating parameters and their variance from incomplete tables. Here, we consider supervised-learning settings: predicting a target when missing values appear in both training and testing data. We show the consistency of two approaches in prediction. A striking result is that the widely used method of imputing with a constant, such as the mean, prior to learning is consistent when missing values are not informative. This contrasts with inferential settings, where mean imputation is criticized for distorting the distribution of the data. That such a simple approach can be consistent is important in practice. We also show that a predictor suited for complete observations can predict optimally on incomplete data through multiple imputation. Finally, to compare imputation with learning directly with a model that accounts for missing values, we further analyze decision trees. These can naturally tackle empirical risk minimization with missing values, due to their ability to handle the half-discrete nature of incomplete variables. After comparing different missing-value strategies in trees theoretically and empirically, we recommend using the "missing incorporated in attribute" method, as it can handle both non-informative and informative missing values.
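The "missing incorporated in attribute" (MIA) split recommended in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a single feature, squared-error loss, and missing entries encoded as NaN, and the helper names `sse` and `mia_best_split` are hypothetical. Following the usual description of MIA, each observed threshold `s` is tried twice (missing values sent left, then right), and a split on missingness alone is also considered.

```python
import math
from typing import List, Optional, Tuple


def sse(ys: List[float]) -> float:
    """Sum of squared errors around the group mean (0 for an empty group)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def mia_best_split(x: List[float], y: List[float]) -> Tuple[float, Optional[float], str]:
    """Best split of one feature under MIA (hypothetical helper).

    Missing entries of x are NaN. Returns (loss, threshold, rule), where rule is
      "missing_left":        {x <= s or missing} vs {x > s}
      "missing_right":       {x <= s} vs {x > s or missing}
      "missing_vs_observed": {missing} vs {observed}, with threshold None
    """
    observed = [(xi, yi) for xi, yi in zip(x, y) if not math.isnan(xi)]
    y_missing = [yi for xi, yi in zip(x, y) if math.isnan(xi)]

    # Candidate: split on the missingness pattern alone.
    best = (sse([yi for _, yi in observed]) + sse(y_missing), None, "missing_vs_observed")

    # Candidates: every observed threshold, with the missing rows sent
    # to either side; keep whichever option gives the lowest loss.
    for s in sorted({xi for xi, _ in observed}):
        left = [yi for xi, yi in observed if xi <= s]
        right = [yi for xi, yi in observed if xi > s]
        for loss, rule in (
            (sse(left + y_missing) + sse(right), "missing_left"),
            (sse(left) + sse(right + y_missing), "missing_right"),
        ):
            if loss < best[0]:
                best = (loss, s, rule)
    return best


nan = float("nan")
x = [1.0, 2.0, nan, nan, 3.0, 4.0]  # toy data: missing rows have a large target
y = [0.0, 0.0, 10.0, 10.0, 10.0, 10.0]
print(mia_best_split(x, y))  # (0.0, 2.0, 'missing_right')
```

On this made-up data, where missingness is informative, the chosen split groups the missing rows with the large observed values. A full tree would repeat this search over every feature and recurse on each child.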
Related papers
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- Benchmarking missing-values approaches for predictive models on health databases [47.187609203210705]
We conduct a benchmark of missing-values strategies in predictive models with a focus on large health databases.
We find that native support for missing values in supervised machine learning predicts better than state-of-the-art imputation with much less computational cost.
arXiv Detail & Related papers (2022-02-17T09:40:04Z)
- Minimax rate of consistency for linear models with missing values [0.0]
Missing values arise in most real-world data sets due to the aggregation of multiple sources and intrinsically missing information (sensor failure, unanswered questions in surveys...).
In this paper, we focus on the extensively studied linear models, but in the presence of missing values, which turns out to be quite a challenging task.
This eventually requires solving a number of learning tasks that is exponential in the number of input features, which makes predictions impossible for current real-world datasets.
arXiv Detail & Related papers (2022-02-03T08:45:34Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which the model's confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values [4.973456986972679]
We investigate the fairness concerns of training a machine learning model using data with missing values.
We propose an integrated approach based on decision trees that does not require a separate process of imputation and learning.
We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset.
arXiv Detail & Related papers (2021-09-21T20:46:22Z)
- Greedy structure learning from data that contains systematic missing values [13.088541054366527]
Data that contain missing values are common in many domains.
Relatively few Bayesian Network structure learning algorithms account for missing data.
This paper describes three variants of greedy search structure learning that utilise pairwise deletion and inverse probability weighting.
arXiv Detail & Related papers (2021-07-09T02:56:44Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models that perform inference directly from inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- What's a good imputation to predict with missing values? [0.0]
We show that for almost all imputation functions, an impute-then-regress procedure with a powerful learner is Bayes optimal.
We propose such a procedure, adapting NeuMiss, a neural network capturing the conditional links across observed and unobserved variables.
arXiv Detail & Related papers (2021-06-01T08:40:30Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We study a method, which we call prediction-time batch normalization, that significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
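The transductive prototype update described in the Meta-Learned Confidence summary above can be sketched as a soft k-means-style refinement. This is a hedged sketch of the baseline such methods start from, not the authors' meta-learned version: the per-query confidence here is a fixed softmax over negative squared distances with a hand-set `temperature` (an assumption standing in for the learned confidence), embeddings are plain Python lists, and the names `sq_dist` and `refine_prototypes` are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def refine_prototypes(
    support: Dict[str, List[Vector]],
    queries: List[Vector],
    temperature: float = 1.0,
) -> Dict[str, Vector]:
    """One confidence-weighted prototype update (hypothetical helper)."""
    classes = sorted(support)
    # Initial prototype: mean of each class's labeled support embeddings.
    protos = {c: [sum(col) / len(support[c]) for col in zip(*support[c])] for c in classes}

    # Confidence of each unlabeled query for each class: softmax over negative
    # squared distances (a fixed stand-in for a meta-learned confidence).
    weights = []
    for q in queries:
        logits = [-sq_dist(q, protos[c]) / temperature for c in classes]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - top) for z in logits]
        norm = sum(exps)
        weights.append({c: e / norm for c, e in zip(classes, exps)})

    # New prototype: support embeddings plus confidence-weighted queries,
    # divided by the total weight (support count + summed confidences).
    refined = {}
    for c in classes:
        dim = len(protos[c])
        acc = [sum(x[d] for x in support[c]) for d in range(dim)]
        for q, w in zip(queries, weights):
            for d in range(dim):
                acc[d] += w[c] * q[d]
        total = len(support[c]) + sum(w[c] for w in weights)
        refined[c] = [a / total for a in acc]
    return refined


print(refine_prototypes({"a": [[0.0, 0.0]], "b": [[10.0, 0.0]]}, [[1.0, 0.0]]))
```

In this toy call, the query sits close to class `a`, so it pulls that prototype halfway toward itself while its negligible weight leaves class `b` essentially unchanged. A meta-learned approach would replace the fixed softmax with learned, per-query confidences.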
This list is automatically generated from the titles and abstracts of the papers in this site.