Linear predictor on linearly-generated data with missing values: non consistency and solutions
- URL: http://arxiv.org/abs/2002.00658v2
- Date: Tue, 12 May 2020 16:48:12 GMT
- Title: Linear predictor on linearly-generated data with missing values: non consistency and solutions
- Authors: Marine Le Morvan (PARIETAL, IJCLab), Nicolas Prost (CMAP, XPOP), Julie Josse (CMAP, XPOP), Erwan Scornet (CMAP), Gaël Varoquaux (PARIETAL, MILA)
- Abstract summary: We study the seemingly-simple case where the target to predict is a linear function of the fully-observed data.
We show that, in the presence of missing values, the optimal predictor may not be linear.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider building predictors when the data have missing values. We study
the seemingly-simple case where the target to predict is a linear function of
the fully-observed data and we show that, in the presence of missing values,
the optimal predictor may not be linear. In the particular Gaussian case, it
can be written as a linear function of multiway interactions between the
observed data and the various missing-value indicators. Due to its intrinsic
complexity, we study a simple approximation and prove generalization bounds
with finite samples, highlighting regimes for which each method performs best.
We then show that multilayer perceptrons with ReLU activation functions can be
consistent, and can explore good trade-offs between the true model and
approximations. Our study highlights the family of models that are beneficial to fit with missing values, depending on the amount of data available.
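As a concrete illustration of the abstract's main point, here is a toy experiment (our own sketch, not the authors' code; all names are ours): on linearly-generated data with values missing completely at random, a linear model on zero-imputed inputs is compared with linear models that also see the missingness mask and value-by-indicator interaction features, a crude stand-in for the multiway interactions appearing in the Gaussian-case optimal predictor.

```python
# Illustrative sketch (not the paper's code): linearly-generated targets,
# MCAR missing values, and three linear fits of increasing expressiveness.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, d = 20_000, 4
X = rng.normal(size=(n, d)) @ np.linalg.cholesky(np.eye(d) + 0.5).T  # correlated Gaussian
beta = rng.normal(size=d)
y = X @ beta + 0.1 * rng.normal(size=n)          # target is linear in the complete data

M = rng.random((n, d)) < 0.3                     # True where a value is missing (MCAR)
X_imp = np.where(M, 0.0, X)                      # zero imputation

def features(Xi, Mi, interactions):
    cols = [Xi, Mi.astype(float)]                # imputed values + missingness mask
    if interactions:                             # value x indicator products: a crude
        cols += [Xi * Mi[:, [j]] for j in range(d)]  # stand-in for multiway interactions
    return np.hstack(cols)

tr, te = slice(0, n // 2), slice(n // 2, n)
for name, F in [("imputed only", lambda Xi, Mi: Xi),
                ("+ mask", lambda Xi, Mi: features(Xi, Mi, False)),
                ("+ interactions", lambda Xi, Mi: features(Xi, Mi, True))]:
    model = LinearRegression().fit(F(X_imp[tr], M[tr]), y[tr])
    mse = np.mean((model.predict(F(X_imp[te], M[te])) - y[te]) ** 2)
    print(f"{name:15s} test MSE: {mse:.4f}")
```

On such data the mask and interaction features typically reduce the test error of a purely linear fit, consistent with the claim that the optimal predictor is not linear in the imputed inputs alone.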
Related papers
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Learning to Bound Counterfactual Inference in Structural Causal Models
from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z) - Functional Nonlinear Learning [0.0]
We propose a functional nonlinear learning (FunNoL) method to represent multivariate functional data in a lower-dimensional feature space.
We show that FunNoL provides satisfactory curve classification and reconstruction regardless of data sparsity.
arXiv Detail & Related papers (2022-06-22T23:47:45Z) - Optimal regularizations for data generation with probabilistic graphical
models [0.0]
Empirically, well-chosen regularization schemes dramatically improve the quality of the inferred models.
We consider the particular case of $L_2$ and $L_1$ regularizations in the Maximum A Posteriori (MAP) inference of generative pairwise graphical models (a Gaussian special case is sketched below).
arXiv Detail & Related papers (2021-12-02T14:45:16Z) - Harmless interpolation in regression and classification with structured
- Harmless interpolation in regression and classification with structured features [21.064512161584872]
Overparametrized neural networks tend to perfectly fit noisy training data yet generalize well on test data.
We present a general and flexible framework for upper bounding regression and classification risk in a reproducing kernel Hilbert space.
arXiv Detail & Related papers (2021-11-09T15:12:26Z) - On Optimal Interpolation In Linear Regression [22.310861786709538]
- On Optimal Interpolation In Linear Regression [22.310861786709538]
We show that the optimal way to interpolate in linear regression is to use functions that are linear in the response variable.
We identify a regime where the minimum-norm interpolator provably generalizes arbitrarily worse than the optimal response-linear achievable interpolator.
We extend the notion of optimal response-linear interpolation to random features regression under a linear data-generating model (see the sketch below).
arXiv Detail & Related papers (2021-10-21T16:37:10Z) - OR-Net: Pointwise Relational Inference for Data Completion under Partial
- OR-Net: Pointwise Relational Inference for Data Completion under Partial Observation [51.083573770706636]
This work uses relational inference to fill in the incomplete data.
We propose Omni-Relational Network (OR-Net) to model the pointwise relativity in two aspects.
arXiv Detail & Related papers (2021-05-02T06:05:54Z) - NeuMiss networks: differentiable programming for supervised learning
with missing values [0.0]
We derive the analytical form of the optimal predictor under a linearity assumption.
We propose a new principled architecture, named NeuMiss networks.
They have good predictive accuracy with both a number of parameters and a computational complexity independent of the number of missing data patterns.
arXiv Detail & Related papers (2020-07-03T11:42:25Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Bayesian Sparse Factor Analysis with Kernelized Observations [67.60224656603823]
- Bayesian Sparse Factor Analysis with Kernelized Observations [67.60224656603823]
Multi-view problems can be addressed with latent variable models.
High-dimensionality and non-linearity are traditionally handled by kernel methods.
We propose merging both approaches into a single model.
arXiv Detail & Related papers (2020-06-01T14:25:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.