Prediction with Incomplete Data under Agnostic Mask Distribution Shift
- URL: http://arxiv.org/abs/2305.11197v1
- Date: Thu, 18 May 2023 14:06:06 GMT
- Title: Prediction with Incomplete Data under Agnostic Mask Distribution Shift
- Authors: Yichen Zhu, Jian Yuan, Bo Jiang, Tao Lin, Haiming Jin, Xinbing Wang,
Chenghu Zhou
- Abstract summary: We consider prediction with incomplete data in the presence of distribution shift.
We leverage the observation that for each mask, there is an invariant optimal predictor.
We propose a novel prediction method called StableMiss.
- Score: 35.86200694774949
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data with missing values is ubiquitous in many applications. Recent years
have witnessed increasing attention on prediction with only incomplete data
consisting of observed features and a mask that indicates the missing pattern.
Existing methods assume that the training and testing distributions are the
same, which may be violated in real-world scenarios. In this paper, we consider
prediction with incomplete data in the presence of distribution shift. We focus
on the case where the underlying joint distribution of complete features and
label is invariant, but the missing pattern, i.e., the mask distribution, may shift
agnostically between training and testing. To achieve generalization, we
leverage the observation that for each mask, there is an invariant optimal
predictor. To avoid the exponential explosion when learning them separately, we
approximate the optimal predictors jointly using a double parameterization
technique. This has the undesirable side effect of allowing the learned
predictors to rely on the intra-mask correlation and that between features and
mask. We perform decorrelation to minimize this effect. Combining the
techniques above, we propose a novel prediction method called StableMiss.
Extensive experiments on both synthetic and real-world datasets show that
StableMiss is robust and outperforms state-of-the-art methods under agnostic
mask distribution shift.
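The abstract only names the double parameterization and decorrelation steps, so the following is a rough, hypothetical Python sketch of one plausible reading rather than the authors' StableMiss implementation: a small hypernetwork maps each mask to the weights of a per-mask linear predictor (so the exponentially many per-mask predictors are parameterized jointly), and a simple cross-covariance penalty stands in for the decorrelation step. The names MaskConditionedPredictor and feature_mask_correlation_penalty are illustrative assumptions.
```python
# Hypothetical sketch (not the authors' code), assuming a hypernetwork-style
# "double parameterization": a network maps the mask m to the weights of a
# per-mask linear predictor over the zero-imputed observed features.
import torch
import torch.nn as nn


class MaskConditionedPredictor(nn.Module):
    def __init__(self, d_features: int, d_hidden: int = 64):
        super().__init__()
        # mask (float, 1 = observed, 0 = missing) -> (weights, bias)
        self.hyper = nn.Sequential(
            nn.Linear(d_features, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, d_features + 1),
        )

    def forward(self, x_obs: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # x_obs: features with missing entries zero-imputed, shape (B, d)
        # mask:  float mask, shape (B, d)
        params = self.hyper(mask)                 # (B, d + 1)
        w, b = params[:, :-1], params[:, -1]      # per-sample weights and bias
        return (w * x_obs * mask).sum(dim=1) + b  # prediction uses observed entries only


def feature_mask_correlation_penalty(x_obs: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Illustrative stand-in for the decorrelation step: penalize the squared
    # cross-covariance between centred features and centred mask entries.
    xc = x_obs - x_obs.mean(dim=0, keepdim=True)
    mc = mask - mask.mean(dim=0, keepdim=True)
    cross_cov = xc.T @ mc / x_obs.shape[0]
    return (cross_cov ** 2).mean()


# One training step (squared loss and the weight lam are illustrative choices):
# pred = model(x_obs, mask)
# loss = ((pred - y) ** 2).mean() + lam * feature_mask_correlation_penalty(x_obs, mask)
```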
Related papers
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z) - Boosted Control Functions [10.503777692702952]
This work aims to bridge the gap between causal effect estimation and prediction tasks.
We establish a novel connection between the field of distribution generalization from machine learning and simultaneous equation models and control functions from econometrics.
Within this framework, we propose a strong notion of invariance for a predictive model and compare it with existing (weaker) versions.
arXiv Detail & Related papers (2023-10-09T15:43:46Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold (a minimal sketch appears after this list).
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Coherent False Seizure Prediction in Epilepsy, Coincidence or
Providence? [0.2770822269241973]
Seizure forecasting using machine learning is possible, but the performance is far from ideal.
Here, we examine false and missing alarms of two algorithms on long-term datasets.
arXiv Detail & Related papers (2021-10-26T10:25:14Z) - Training on Test Data with Bayesian Adaptation for Covariate Shift [96.3250517412545]
Deep neural networks often make inaccurate predictions with unreliable uncertainty estimates.
We derive a Bayesian model that provides for a well-defined relationship between unlabeled inputs under distributional shift and model parameters.
We show that our method improves both accuracy and uncertainty estimation.
arXiv Detail & Related papers (2021-09-27T01:09:08Z) - Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts (see the short sketch after this list).
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that despite its simplicity DoC consistently outperforms other quantifications of distributional difference.
arXiv Detail & Related papers (2021-07-07T15:50:18Z) - Variance-reduced Language Pretraining via a Mask Proposal Network [5.819397109258169]
Self-supervised learning, a.k.a. pretraining, is important in natural language processing.
In this paper, we tackle the problem from the view of gradient variance reduction.
To improve efficiency, we introduce a MAsk Proposal Network (MAPNet), which approximates the optimal mask proposal distribution.
arXiv Detail & Related papers (2020-08-12T14:12:32Z) - Evaluating Prediction-Time Batch Normalization for Robustness under
Covariate Shift [81.74795324629712]
We evaluate a simple method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift (a minimal sketch appears after this list).
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z) - Balance-Subsampled Stable Prediction [55.13512328954456]
We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
arXiv Detail & Related papers (2020-06-08T07:01:38Z) - Masking schemes for universal marginalisers [1.0412114420493723]
We consider the effect of structure-agnostic and structure-dependent masking schemes when training a universal marginaliser.
We compare networks trained with different masking schemes in terms of their predictive performance and generalisation properties.
arXiv Detail & Related papers (2020-01-16T15:35:06Z)
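The Average Thresholded Confidence (ATC) entry above is concrete enough for a short sketch. Assuming the usual formulation, a threshold on a confidence score (e.g., maximum softmax probability or negative entropy) is chosen on labeled source data so that the fraction of source examples above it matches source accuracy, and target accuracy is then predicted as the fraction of unlabeled target examples above that threshold. The function below is an illustrative approximation, not the paper's reference code.
```python
import numpy as np


def average_thresholded_confidence(source_scores, source_correct, target_scores):
    """Illustrative ATC sketch: pick the threshold so that the fraction of
    source examples scoring above it equals source accuracy, then predict
    target accuracy as the fraction of target examples above the threshold."""
    source_scores = np.asarray(source_scores, dtype=float)
    target_scores = np.asarray(target_scores, dtype=float)
    source_acc = float(np.mean(source_correct))
    # The (1 - accuracy) quantile leaves a `source_acc` fraction of source
    # scores at or above the threshold.
    threshold = np.quantile(source_scores, 1.0 - source_acc)
    return float(np.mean(target_scores >= threshold))
```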
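The difference-of-confidences (DoC) entry can likewise be sketched in a few lines. Only the simplest variant is shown here, where the drop in average confidence is used directly as the predicted drop in accuracy; the paper also fits regressors on DoC, which is not reproduced in this sketch.
```python
import numpy as np


def doc_accuracy_estimate(source_acc, source_confidences, target_confidences):
    """Illustrative sketch of the plain DoC estimate: the drop in average
    confidence is taken as the predicted drop in accuracy."""
    doc = float(np.mean(source_confidences)) - float(np.mean(target_confidences))
    return source_acc - doc
```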
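Prediction-time batch normalization, as described above, replaces the running batch-norm statistics collected during training with the statistics of the batch seen at prediction time. The helper below is a minimal PyTorch sketch of one way to switch a model into that mode; it is an assumption about the mechanics, not the paper's exact recipe.
```python
import torch.nn as nn


def enable_prediction_time_batchnorm(model: nn.Module) -> nn.Module:
    """Make every BatchNorm layer normalize with the statistics of the batch
    it sees at prediction time instead of the running training statistics."""
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            module.track_running_stats = False  # stop consulting stored statistics
            module.running_mean = None          # fall back to current-batch statistics
            module.running_var = None
    return model


# Usage (the test batch should contain more than one example):
# model.eval()
# enable_prediction_time_batchnorm(model)
# logits = model(test_batch)
```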