MAIN: Multihead-Attention Imputation Networks
- URL: http://arxiv.org/abs/2102.05428v1
- Date: Wed, 10 Feb 2021 13:50:02 GMT
- Title: MAIN: Multihead-Attention Imputation Networks
- Authors: Spyridon Mouselinos, Kyriakos Polymenakos, Antonis Nikitakis,
Konstantinos Kyriakopoulos
- Abstract summary: We propose a novel mechanism based on multi-head attention which can be applied effortlessly in any model.
Our method inductively models patterns of missingness in the input data in order to increase the performance of the downstream task.
- Score: 4.427447378048202
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The problem of missing data, usually absent in curated and
competition-standard datasets, is an unfortunate reality for most machine
learning models used in industry applications. Recent work has focused on
understanding the nature and the negative effects of such phenomena, while
devising solutions for optimal imputation of the missing data, using both
discriminative and generative approaches. We propose a novel mechanism based on
multi-head attention which can be applied effortlessly in any model and
achieves better downstream performance without the introduction of the full
dataset in any part of the modeling pipeline. Our method inductively models
patterns of missingness in the input data in order to increase the performance
of the downstream task. Finally, after evaluating our method against baselines
for a number of datasets, we found performance gains that tend to be larger in
scenarios of high missingness.
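The listing contains no code, so below is a minimal PyTorch sketch of the general idea the abstract describes: self-attention across per-feature tokens that encode both a (zero-filled) value and a binary missingness indicator, so the model can learn patterns of missingness inductively. All class and variable names are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch only; not the authors' code.
    import torch
    import torch.nn as nn

    class AttentionImputationBlock(nn.Module):
        def __init__(self, d_model: int = 32, n_heads: int = 4):
            super().__init__()
            # Each feature becomes a token built from (value, is_missing).
            self.token_proj = nn.Linear(2, d_model)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.out_proj = nn.Linear(d_model, 1)

        def forward(self, x, missing_mask):
            # x: (batch, n_features), missing entries zero-filled.
            # missing_mask: same shape, 1.0 where a value is missing.
            tokens = self.token_proj(torch.stack([x, missing_mask], dim=-1))
            attended, _ = self.attn(tokens, tokens, tokens)  # attend across features
            refined = self.out_proj(attended).squeeze(-1)
            # Replace only the missing entries; observed values pass through.
            return torch.where(missing_mask.bool(), refined, x)

    # Usage: 8 samples, 10 features, roughly 30% missing at random.
    x = torch.randn(8, 10)
    mask = (torch.rand(8, 10) < 0.3).float()
    imputed = AttentionImputationBlock()(x * (1 - mask), mask)

Because the block consumes only a sample's own features and mask, it can sit in front of any downstream model without exposing the full dataset, which matches the constraint stated in the abstract.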
Related papers
- The Data Addition Dilemma [4.869513274920574]
In many machine learning for healthcare tasks, standard datasets are constructed by amassing data across many, often fundamentally dissimilar, sources.
But when does adding more data help, and when does it hinder progress on desired model outcomes in real-world settings?
We identify this situation as the "Data Addition Dilemma", demonstrating that adding training data in this multi-source scaling context can at times result in reduced overall accuracy, uncertain fairness outcomes, and reduced worst-subgroup performance.
arXiv Detail & Related papers (2024-08-08T01:42:31Z)
- Debiasing Multimodal Models via Causal Information Minimization [65.23982806840182]
We study bias arising from confounders in a causal graph for multimodal data.
Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data.
We use these features as confounder representations and use them via methods motivated by causal theory to remove bias from models.
arXiv Detail & Related papers (2023-11-28T16:46:14Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
A common workaround is to train models on far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
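The DGE summary above is brief, so here is a hypothetical sketch of the ensemble idea under stated assumptions: fit several generative models on bootstrapped data with different seeds, train one downstream model per synthetic dataset, and average predictions to reflect uncertainty in the generative process. Gaussian mixtures and logistic regression are illustrative stand-ins, not the paper's setup.

    # Hypothetical sketch of a deep-generative-ensemble-style workflow;
    # Gaussian mixtures stand in for the deep generative models.
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    ensemble = []
    for seed in range(5):  # K = 5 ensemble members
        boot = rng.choice(len(X), size=len(X), replace=True)  # bootstrap for diversity
        X_syn, y_syn = [], []
        for cls in (0, 1):
            rows = X[boot][y[boot] == cls]
            gen = GaussianMixture(n_components=2, random_state=seed).fit(rows)
            samples, _ = gen.sample(250)
            X_syn.append(samples)
            y_syn.append(np.full(250, cls))
        ensemble.append(LogisticRegression().fit(np.vstack(X_syn), np.concatenate(y_syn)))

    # Aggregate: the mean predicted probability across members approximates
    # averaging over uncertainty in the generative process.
    X_test = rng.normal(size=(10, 5))
    p = np.mean([m.predict_proba(X_test)[:, 1] for m in ensemble], axis=0)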
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
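HyperImpute ships as an actual library with out-of-the-box learners; the sketch below only illustrates the general recipe of iterative, column-wise imputation with automatic per-column model selection, and none of the names reflect the package's real API.

    # Illustrative sketch of iterative column-wise imputation with
    # per-column model selection; not the HyperImpute package's API.
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    def iterative_impute(X, mask, n_iter=3):
        # X: array with missing cells pre-filled (e.g. column means).
        # mask: boolean array, True where a value was originally missing.
        X = X.copy()
        candidates = [Ridge(), RandomForestRegressor(n_estimators=50)]
        for _ in range(n_iter):
            for j in range(X.shape[1]):
                obs = ~mask[:, j]
                if obs.all() or not obs.any():
                    continue  # nothing to impute, or nothing to learn from
                Xo = np.delete(X, j, axis=1)  # predict column j from the rest
                # Select the candidate with the best cross-validated score.
                best = max(candidates, key=lambda m: cross_val_score(
                    m, Xo[obs], X[obs, j], cv=3).mean())
                best.fit(Xo[obs], X[obs, j])
                X[mask[:, j], j] = best.predict(Xo[mask[:, j]])
        return X

    # Usage: mean-fill first, then refine iteratively.
    rng = np.random.default_rng(0)
    X_true = rng.normal(size=(200, 4))
    mask = rng.random(X_true.shape) < 0.2
    X = X_true.copy()
    X[mask] = np.nan
    X_hat = iterative_impute(np.where(mask, np.nanmean(X, axis=0), X), mask)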
- Unsupervised Disentanglement without Autoencoding: Pitfalls and Future Directions [21.035001142156464]
Disentangled visual representations have largely been studied with generative models such as Variational AutoEncoders (VAEs).
We explore regularization methods with contrastive learning, which could result in disentangled representations powerful enough for large scale datasets and downstream applications.
We evaluate disentanglement with downstream tasks, analyze the benefits and disadvantages of each regularization used, and discuss future directions.
arXiv Detail & Related papers (2021-08-14T21:06:42Z)
- End-to-End Weak Supervision [15.125993628007972]
We propose an end-to-end approach for directly learning the downstream model.
We show improved performance over prior work in terms of end model performance on downstream test sets.
arXiv Detail & Related papers (2021-07-05T19:10:11Z)
- Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis [17.811597734603144]
We propose an approach to automatically generating counterfactual data for data augmentation and explanation.
A comprehensive evaluation on several datasets, using a variety of state-of-the-art benchmarks, demonstrates that our approach achieves significant improvements in model performance.
arXiv Detail & Related papers (2021-06-29T10:27:01Z)
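The counterfactual-generation entry above lends itself to a toy illustration. The sketch below uses a tiny hand-written antonym table, which is purely an assumption for illustration; the paper describes a far more comprehensive automatic generator.

    # Toy sketch of counterfactual augmentation for sentiment data; the
    # antonym table is an illustrative assumption, not the paper's method.
    ANTONYMS = {
        "good": "bad", "bad": "good",
        "great": "awful", "awful": "great",
        "love": "hate", "hate": "love",
        "boring": "gripping", "gripping": "boring",
    }

    def make_counterfactual(text: str, label: int):
        # Swap sentiment-bearing words and flip the label.
        words = [ANTONYMS.get(w, w) for w in text.lower().split()]
        return " ".join(words), 1 - label

    # Usage: augment training data with (original, counterfactual) pairs.
    print(make_counterfactual("this great film that i love", 1))
    # -> ('this awful film that i hate', 0)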
- Training Deep Normalizing Flow Models in Highly Incomplete Data Scenarios with Prior Regularization [13.985534521589257]
We propose a novel framework to facilitate the learning of data distributions in scenarios of severe data paucity.
The proposed framework naturally stems from posing the process of learning from incomplete data as a joint optimization task.
arXiv Detail & Related papers (2021-04-03T20:57:57Z)
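For the prior-regularized flow entry above, here is a compressed PyTorch sketch of "learning from incomplete data as a joint optimization task": imputed values are free parameters trained jointly with a one-layer flow, with a simple Gaussian penalty standing in for the paper's prior regularization. The architecture, penalty, and weighting are assumptions for illustration only.

    # Compressed sketch: jointly optimize imputed values and a tiny
    # RealNVP-style flow, regularizing imputations with a Gaussian prior.
    import math
    import torch
    import torch.nn as nn

    class AffineCoupling(nn.Module):
        # One affine coupling layer (a real model would stack several).
        def __init__(self, dim, hidden=32):
            super().__init__()
            self.half = dim // 2
            self.net = nn.Sequential(
                nn.Linear(self.half, hidden), nn.ReLU(),
                nn.Linear(hidden, 2 * (dim - self.half)))

        def forward(self, x):
            a, b = x[:, :self.half], x[:, self.half:]
            shift, log_scale = self.net(a).chunk(2, dim=-1)
            log_scale = torch.tanh(log_scale)  # keep scales stable
            z = torch.cat([a, b * torch.exp(log_scale) + shift], dim=-1)
            return z, log_scale.sum(dim=-1)    # latent and log|det J|

    torch.manual_seed(0)
    x = torch.randn(64, 4)
    mask = torch.rand(64, 4) < 0.5              # True where missing
    imputed = nn.Parameter(torch.zeros(64, 4))  # learned fill-in values
    flow = AffineCoupling(4)
    opt = torch.optim.Adam(list(flow.parameters()) + [imputed], lr=1e-2)

    for step in range(200):
        x_hat = torch.where(mask, imputed, x)
        z, log_det = flow(x_hat)
        log_pz = -0.5 * (z ** 2 + math.log(2 * math.pi)).sum(dim=-1)
        prior_reg = (imputed[mask] ** 2).mean()  # Gaussian prior on imputations
        loss = (-(log_pz + log_det)).mean() + 0.1 * prior_reg
        opt.zero_grad()
        loss.backward()
        opt.step()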
- Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
- Adversarial Filters of Dataset Biases [96.090959788952]
Large neural models have demonstrated human-level performance on language and vision benchmarks.
Their performance degrades considerably on adversarial or out-of-distribution samples.
We propose AFLite, which adversarially filters such dataset biases.
arXiv Detail & Related papers (2020-02-10T21:59:21Z)
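To make the AFLite entry above concrete, here is a simplified sketch of adversarial filtering; the partition count, drop fraction, and linear probe are illustrative assumptions rather than the paper's exact procedure. Each instance is scored by how often simple probes classify it correctly when held out, and the most predictable instances are filtered away.

    # Simplified sketch in the spirit of AFLite; hyperparameters are assumptions.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def aflite_filter(X, y, target_size, n_partitions=20, drop_frac=0.1, seed=0):
        rng = np.random.default_rng(seed)
        idx = np.arange(len(X))
        while len(idx) > target_size:
            correct = np.zeros(len(idx))
            counts = np.zeros(len(idx))
            for _ in range(n_partitions):
                train = rng.random(len(idx)) < 0.8
                if len(np.unique(y[idx[train]])) < 2:
                    continue  # probe needs both classes to fit
                clf = LogisticRegression().fit(X[idx[train]], y[idx[train]])
                held = ~train
                correct[held] += clf.predict(X[idx[held]]) == y[idx[held]]
                counts[held] += 1
            score = correct / np.maximum(counts, 1)  # per-instance predictability
            n_drop = max(1, int(drop_frac * len(idx)))
            idx = idx[np.argsort(score)[:-n_drop]]   # drop the most predictable
        return idx

    # Usage on synthetic data with an easy-to-exploit feature.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 6))
    y = (X[:, 0] > 0).astype(int)
    kept = aflite_filter(X, y, target_size=200)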
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.