NeuMiss networks: differentiable programming for supervised learning with missing values
- URL: http://arxiv.org/abs/2007.01627v4
- Date: Wed, 4 Nov 2020 15:39:04 GMT
- Title: NeuMiss networks: differentiable programming for supervised learning with missing values
- Authors: Marine Le Morvan (PARIETAL, IJCLab), Julie Josse (CMAP, XPOP), Thomas Moreau (PARIETAL), Erwan Scornet (CMAP), Gaël Varoquaux (PARIETAL, MILA)
- Abstract summary: We derive the analytical form of the optimal predictor under a linearity assumption.
We propose a new principled architecture, named NeuMiss networks.
They have good predictive accuracy with both a number of parameters and a computational complexity independent of the number of missing data patterns.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The presence of missing values makes supervised learning much more
challenging. Indeed, previous work has shown that even when the response is a
linear function of the complete data, the optimal predictor is a complex
function of the observed entries and the missingness indicator. As a result,
the computational or sample complexities of consistent approaches depend on the
number of missing patterns, which can be exponential in the number of
dimensions. In this work, we derive the analytical form of the optimal
predictor under a linearity assumption and various missing data mechanisms
including Missing at Random (MAR) and self-masking (Missing Not At Random).
Based on a Neumann-series approximation of the optimal predictor, we propose a
new principled architecture, named NeuMiss networks. Their originality and
strength come from the use of a new type of non-linearity: the multiplication
by the missingness indicator. We provide an upper bound on the Bayes risk of
NeuMiss networks, and show that they have good predictive accuracy with both a
number of parameters and a computational complexity independent of the number
of missing data patterns. As a result they scale well to problems with many
features, and remain statistically efficient for medium-sized samples.
Moreover, we show that, contrary to procedures using EM or imputation, they are
robust to the missing data mechanism, including difficult MNAR settings such as
self-masking.
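To make the architecture concrete: a truncated Neumann series for $\Sigma_{obs,obs}^{-1} v$ can be unrolled as $h_{k+1} = (W h_k) \odot \bar{m} + h_0$, where $\bar{m}$ is the observed-entry indicator and $W$ plays the role of $I - \Sigma$. The sketch below is a minimal NumPy illustration of this idea, not the authors' implementation; the function name and the placeholders `W`, `mu`, and `depth` are chosen for the example.

```python
import numpy as np

def neumiss_forward(x, obs, W, mu, depth=5):
    """Minimal sketch of a NeuMiss-style forward pass (illustrative only).

    x     : (d,) input vector with missing entries filled by 0
    obs   : (d,) indicator of observed entries (1.0 = observed, 0.0 = missing)
    W     : (d, d) learned weights; the Neumann view suggests W ~ I - Sigma
    mu    : (d,) feature means
    depth : number of unrolled Neumann iterations (network depth)
    """
    h0 = (x - mu) * obs                 # centre the observed coordinates only
    h = h0
    for _ in range(depth):
        # Multiplying by the mask is the paper's non-linearity: it restricts
        # each matrix product to the observed coordinates, emulating
        # Sigma_{obs,obs} without any per-pattern parameters.
        h = (W @ h) * obs + h0          # Neumann step: h <- (I - A) h + b
    return h                            # a final linear layer would map h to y_hat

# Toy usage with made-up values: feature 2 of 3 is missing.
d = 3
x = np.array([0.5, 0.0, -1.2])
obs = np.array([1.0, 0.0, 1.0])
W = np.eye(d) - 0.1 * np.ones((d, d))   # stand-in for I - Sigma
h = neumiss_forward(x, obs, W, mu=np.zeros(d))
```

Because the same $W$ is reused at every depth and for every mask, the parameter count stays $O(d^2)$ no matter how many of the $2^d$ missingness patterns occur, which is the scaling property claimed above.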
Related papers
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate the induced tessellation and approximate the target distribution of the multiple hypotheses.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- MISNN: Multiple Imputation via Semi-parametric Neural Networks [9.594714330925703]
Multiple imputation (MI) has been widely applied to missing value problems in biomedical, social and econometric research.
We propose MISNN, a novel and efficient algorithm that incorporates feature selection for MI.
arXiv Detail & Related papers (2023-05-02T21:45:36Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the unidentifiability region of the model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
- Minimax rate of consistency for linear models with missing values [0.0]
Missing values arise in most real-world data sets due to the aggregation of multiple sources and intrinsically missing information (sensor failure, unanswered questions in surveys...).
In this paper, we focus on the extensively studied linear models, but in the presence of missing values, which turns out to be quite a challenging task. This eventually requires solving a number of learning tasks that is exponential in the number of input features, which makes predictions impossible for current real-world datasets.
arXiv Detail & Related papers (2022-02-03T08:45:34Z)
- MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines a baseline's imputations by simultaneously modeling the missingness-generating mechanism.
We conduct extensive experiments on synthetic and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
arXiv Detail & Related papers (2021-11-04T22:38:18Z)
- Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness [68.97830259849086]
Most datasets only capture a simpler subproblem and likely suffer from spurious features.
We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features.
Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound.
Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
arXiv Detail & Related papers (2021-10-21T07:28:11Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models for inference on inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- Bayesian neural networks and dimensionality reduction [4.039245878626346]
A class of model-based approaches for such problems includes latent variables in an unknown non-linear regression function.
Variational autoencoders (VAEs) are artificial neural networks (ANNs) that employ approximations to make computation tractable.
We deploy Markov chain Monte Carlo sampling algorithms for Bayesian inference in ANN models with latent variables.
arXiv Detail & Related papers (2020-08-18T17:11:07Z)
- Bayesian System ID: Optimal management of parameter, model, and measurement uncertainty [0.0]
We evaluate the robustness of a probabilistic formulation of system identification (ID) to sparse, noisy, and indirect data.
We show that the log posterior has improved geometric properties compared with the objective function surfaces of traditional methods.
arXiv Detail & Related papers (2020-03-04T22:48:30Z)
- Linear predictor on linearly-generated data with missing values: non consistency and solutions [0.0]
We study the seemingly simple case where the target to predict is a linear function of the fully observed data. We show that, in the presence of missing values, the optimal predictor may not be linear (a toy simulation after the list below makes this concrete).
arXiv Detail & Related papers (2020-02-03T11:49:35Z)
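To see why the last entry above (and the abstract) insist that the optimal predictor need not be linear, here is a toy simulation; the bivariate Gaussian design, the correlation `rho`, and the 50% MCAR rate are assumptions made up for illustration, not taken from any of the papers.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 100_000, 0.8

# Correlated Gaussian features, so a missing x2 is partly predictable from x1.
x1 = rng.normal(size=n)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
y = x1 + x2                      # linear response in the COMPLETE data
miss = rng.random(n) < 0.5       # MCAR mask on x2

# The Bayes-optimal prediction differs per missingness pattern:
#   x2 observed: E[y | x1, x2] = x1 + x2        -> slope 1 on x1
#   x2 missing : E[y | x1]     = (1 + rho) * x1 -> slope 1.8 on x1
slope_obs = np.polyfit(x1[~miss], y[~miss] - x2[~miss], 1)[0]
slope_mis = np.polyfit(x1[miss], y[miss], 1)[0]
print(f"slope when x2 observed: {slope_obs:.2f}")   # ~ 1.00
print(f"slope when x2 missing : {slope_mis:.2f}")   # ~ 1.80
```

The optimal coefficient on x1 switches with the mask, so no single linear model on zero-imputed features is consistent; with d features there can be up to 2^d pattern-specific models, which is exactly the blow-up that the shared-weight NeuMiss parameterisation avoids.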
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.