Sharing pattern submodels for prediction with missing values
- URL: http://arxiv.org/abs/2206.11161v3
- Date: Fri, 24 Nov 2023 13:50:56 GMT
- Title: Sharing pattern submodels for prediction with missing values
- Authors: Lena Stempfle, Ashkan Panahi, Fredrik D. Johansson
- Abstract summary: Missing values are unavoidable in many applications of machine learning and present challenges both during training and at test time.
We propose an alternative approach, called sharing pattern submodels, which i) makes predictions robust to missing values at test time, ii) maintains or improves the predictive power of pattern submodels, and iii) has a short description, enabling improved interpretability.
- Score: 12.981974894538668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Missing values are unavoidable in many applications of machine learning and
present challenges both during training and at test time. When variables are
missing in recurring patterns, fitting separate pattern submodels has been
proposed as a solution. However, fitting models independently does not make
efficient use of all available data. Conversely, fitting a single shared model
to the full data set relies on imputation which often leads to biased results
when missingness depends on unobserved factors. We propose an alternative
approach, called sharing pattern submodels, which i) makes predictions that are
robust to missing values at test time, ii) maintains or improves the predictive
power of pattern submodels, and iii) has a short description, enabling improved
interpretability. Parameter sharing is enforced through sparsity-inducing
regularization which we prove leads to consistent estimation. Finally, we give
conditions for when a sharing model is optimal, even when both missingness and
the target outcome depend on unobserved variables. Classification and
regression experiments on synthetic and real-world data sets demonstrate that
our models achieve a favorable tradeoff between pattern specialization and
information sharing.
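The parameter-sharing idea can be illustrated with a small sketch (not the paper's implementation): one linear submodel per missingness pattern, with each submodel's coefficients pulled toward a shared vector by a penalty. For simplicity the sketch uses a squared penalty; the paper's sparsity-inducing (L1-style) regularization would instead drive submodel coefficients to coincide exactly with the shared ones. All function and variable names here are illustrative assumptions.

```python
import numpy as np

def fit_pattern_submodels(X, y, lam=0.1, lr=0.01, steps=2000):
    """Fit one linear submodel per missingness pattern, with parameters
    pulled toward a shared vector by an L2 penalty (illustrative only;
    the paper uses a sparsity-inducing penalty to enforce exact sharing).
    NaN entries in X mark missing values."""
    mask = ~np.isnan(X)
    patterns = {tuple(row) for row in mask}
    d = X.shape[1]
    betas = {p: np.zeros(d) for p in patterns}  # one vector per pattern
    shared = np.zeros(d)                        # shared parameter vector
    for _ in range(steps):
        grad_shared = np.zeros(d)
        for p in patterns:
            idx = np.all(mask == np.array(p), axis=1)
            Xp = np.nan_to_num(X[idx])  # missing features contribute 0
            yp = y[idx]
            resid = Xp @ betas[p] - yp
            grad = Xp.T @ resid / len(yp) + lam * (betas[p] - shared)
            betas[p] -= lr * grad
            grad_shared += lam * (shared - betas[p])
        shared -= lr * grad_shared
    return betas, shared

def predict(X, betas, shared):
    """Route each test row to its pattern's submodel; fall back to the
    shared model for patterns never seen during training."""
    mask = ~np.isnan(X)
    out = np.empty(len(X))
    for i, row in enumerate(mask):
        beta = betas.get(tuple(row), shared)
        out[i] = np.nan_to_num(X[i]) @ beta
    return out
```

The fallback to the shared model is what makes predictions robust to unseen missingness patterns at test time, one of the properties the abstract highlights.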
Related papers
- MINTY: Rule-based Models that Minimize the Need for Imputing Features with Missing Values [10.591844776850857]
MINTY is a method that learns rules in the form of disjunctions between variables that act as replacements for each other when one or more is missing.
We demonstrate the value of MINTY in experiments using synthetic and real-world data sets and find its predictive performance comparable or favorable to baselines.
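A toy sketch of such a disjunctive rule (hypothetical encoding, not MINTY's actual API): the rule fires if any observed member is 1, so an observed variable can stand in for a missing one without imputation.

```python
def eval_disjunction(row, rule):
    """Evaluate a disjunction of binary features, skipping missing ones.

    `row` maps feature name -> 0, 1, or None (missing); `rule` is a list
    of feature names. An observed 1 anywhere makes the rule fire, so the
    features act as replacements for each other. If every member is
    missing, return None and let the caller fall back to a default.
    (Hypothetical encoding for illustration only.)"""
    observed = [row[f] for f in rule if row[f] is not None]
    if not observed:
        return None
    return int(any(observed))
```

Note the simplification: an observed 0 alongside a missing member is treated as 0, which is optimistic about the missing value; the actual method learns which variables may substitute for one another.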
arXiv Detail & Related papers (2023-11-23T17:09:12Z)
- Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
- Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate there is no single model that works best for all the cases.
By choosing an appropriate bias model, we can obtain a better robustness result than baselines with a more sophisticated model design.
arXiv Detail & Related papers (2022-10-28T17:52:10Z)
- On the Impact of Temporal Concept Drift on Model Explanations [31.390397997989712]
Explanation faithfulness of model predictions in natural language processing is evaluated on held-out data from the same temporal distribution as the training data.
We examine the impact of temporal variation on model explanations extracted by eight feature attribution methods and three select-then-predict models across six text classification tasks.
arXiv Detail & Related papers (2022-10-17T15:53:09Z)
- Uncertainty Estimation for Language Reward Models [5.33024001730262]
Language models can learn a range of capabilities from unsupervised training on text corpora.
It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons.
We seek to address these problems via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning.
arXiv Detail & Related papers (2022-03-14T20:13:21Z)
- Missing Value Knockoffs [0.0]
A recently introduced framework, model-x knockoffs, provides controlled variable selection for a wide range of models but lacks support for datasets with missing values.
We show that posterior sampled imputation allows reusing existing knockoff samplers in the presence of missing values.
We also demonstrate how jointly imputing and sampling knockoffs can reduce the computational complexity.
arXiv Detail & Related papers (2022-02-26T04:05:31Z)
- On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study [65.17429512679695]
In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions.
Despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models.
arXiv Detail & Related papers (2021-06-02T00:48:33Z)
- Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series.
Our model parameterizes mean and variance for each time-stamp with flexible neural networks.
We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
arXiv Detail & Related papers (2021-02-02T06:15:15Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
- Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
- not-MIWAE: Deep Generative Modelling with Missing not at Random Data [21.977065542645082]
We present an approach for building and fitting deep latent variable models (DLVMs) in cases where the missing process is dependent on the missing data.
Specifically, a deep neural network enables us to flexibly model the conditional distribution of the missingness pattern given the data.
We show on various kinds of data sets and missingness patterns that explicitly modelling the missing process can be invaluable.
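The "missing process depends on the data" idea can be illustrated with a simplified self-masking sketch: the probability that a value is missing is a function of the value itself. In not-MIWAE this conditional is a flexible neural network inside a deep latent variable model; here it is a single logistic layer on fully known data, purely for illustration.

```python
import numpy as np

def missingness_loglik(x, mask, w, b):
    """Log-likelihood of a self-masking missingness model:
    p(missing_j | x_j) = sigmoid(w_j * x_j + b_j).
    `mask` is 1 where a value is missing, 0 where observed.
    (Simplified illustration, not the not-MIWAE architecture.)"""
    logits = w * x + b
    p_miss = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-12  # numerical guard for log(0)
    return np.sum(mask * np.log(p_miss + eps)
                  + (1 - mask) * np.log(1 - p_miss + eps))
```

Maximizing such a term jointly with the data likelihood is what lets a model remain consistent when data are missing not at random.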
arXiv Detail & Related papers (2020-06-23T10:06:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.