DeepIFSAC: Deep Imputation of Missing Values Using Feature and Sample Attention within Contrastive Framework
- URL: http://arxiv.org/abs/2501.10910v2
- Date: Wed, 05 Feb 2025 18:29:09 GMT
- Title: DeepIFSAC: Deep Imputation of Missing Values Using Feature and Sample Attention within Contrastive Framework
- Authors: Ibna Kowsar, Shourav B. Rabbani, Yina Hou, Manar D. Samad
- Abstract summary: Existing missing value imputation methods are ineffective when the missing rate is high and values are not missing at random.
We develop a novel framework to reconstruct missing values using row and column attention as between-feature and between-sample attention.
We evaluate the quality of imputed data using real-world electronic health records with missing values.
- Abstract: Missing values of varying patterns and rates in real-world tabular data pose a significant challenge in developing reliable data-driven models. Existing missing value imputation methods rely on statistical and traditional machine learning approaches and are ineffective when the missing rate is high and values are not missing at random. This paper explores row and column attention in tabular data as between-feature and between-sample attention in a novel framework to reconstruct missing values. The proposed method uses CutMix data augmentation within a contrastive learning framework to reduce the uncertainty of missing value estimation. The performance and generalizability of the trained imputation models are evaluated on set-aside test data folds with missing values. The proposed framework outperforms nine state-of-the-art imputation methods across several missing value types and rates (10%-50%) on a diverse selection of twelve tabular data sets. We also evaluate the quality of imputed data using real-world electronic health records with missing values, demonstrating the proposed framework's superiority over state-of-the-art statistical, machine learning, and deep imputation methods. Finally, the paper highlights the heterogeneity of tabular data sets and recommends imputation methods based on missing value types and data characteristics.
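The following is a minimal, hypothetical PyTorch sketch of the ingredients named in the abstract: attention across features (columns) and across samples (rows), CutMix-style corruption of a tabular batch, and a joint reconstruction plus contrastive objective. All module names, dimensions, and loss weights are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only; hypothetical names and hyperparameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAttentionImputer(nn.Module):
    def __init__(self, n_features, d_model=32, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(1, d_model)                  # per-feature scalar -> embedding
        self.feature_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.sample_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.decode = nn.Linear(d_model, 1)                 # back to one value per feature
        self.project = nn.Linear(n_features * d_model, 64)  # contrastive projection head

    def forward(self, x):                                   # x: (batch, n_features)
        h = self.embed(x.unsqueeze(-1))                     # (batch, features, d_model)
        h, _ = self.feature_attn(h, h, h)                   # attention across features (columns)
        h_t = h.transpose(0, 1)                             # (features, batch, d_model)
        h_t, _ = self.sample_attn(h_t, h_t, h_t)            # attention across samples (rows)
        h = h_t.transpose(0, 1)
        recon = self.decode(h).squeeze(-1)                  # reconstructed feature values
        z = F.normalize(self.project(h.flatten(1)), dim=1)  # embedding for contrastive loss
        return recon, z

def cutmix_corrupt(x, rate=0.3):
    """CutMix-style corruption: replace a random subset of entries in each row
    with the corresponding entries of another (shuffled) row."""
    mask = (torch.rand_like(x) < rate).float()
    shuffled = x[torch.randperm(x.size(0))]
    return (1 - mask) * x + mask * shuffled, mask

def training_step(model, x, optimizer, temperature=0.5):
    """One hypothetical step: two corrupted views, reconstruction on the
    corrupted entries, and a contrastive term pulling the views together."""
    view_a, mask_a = cutmix_corrupt(x)
    view_b, mask_b = cutmix_corrupt(x)
    recon_a, z_a = model(view_a)
    recon_b, z_b = model(view_b)
    recon_loss = F.mse_loss(recon_a * mask_a, x * mask_a) + \
                 F.mse_loss(recon_b * mask_b, x * mask_b)
    logits = z_a @ z_b.t() / temperature                    # cross-view similarity matrix
    labels = torch.arange(x.size(0))                        # matching rows are positives
    loss = recon_loss + F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = DualAttentionImputer(n_features=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
print(training_step(model, torch.randn(64, 10), opt))
```

In this sketch the corrupted entries stand in for missing values during training, and the contrastive term encourages the two corrupted views of the same row to map to nearby embeddings.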
Related papers
- Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks have been shown to be vulnerable to data poisoning attacks.
Detecting poisoned samples in a mixed dataset is both beneficial and challenging.
We propose an Iterative Filtering approach for identifying unlearnable examples (UEs).
arXiv Detail & Related papers (2024-08-15T13:26:13Z) - ITI-IQA: a Toolbox for Heterogeneous Univariate and Multivariate Missing Data Imputation Quality Assessment [0.0]
ITI-IQA is a set of utilities designed to assess the reliability of various imputation methods.
The toolbox also includes a suite of diagnostic methods and graphical tools for checking measurements.
arXiv Detail & Related papers (2024-07-16T14:26:46Z) - Probabilistic Imputation for Time-series Classification with Missing Data [17.956329906475084]
We propose a novel framework for classifying time series data with missing values.
Our deep generative model component is trained to impute the missing values in multiple plausible ways.
The classifier component takes the time series together with the imputed values and classifies the signals.
arXiv Detail & Related papers (2023-08-13T10:04:13Z) - Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
arXiv Detail & Related papers (2023-06-08T07:05:36Z) - Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning [69.81438976273866]
Open-set semi-supervised learning (Open-set SSL) considers a more practical scenario, where unlabeled data and test data contain new categories (outliers) not observed in labeled data (inliers).
We introduce evidential deep learning (EDL) as an outlier detector to quantify different types of uncertainty, and design different uncertainty metrics for self-training and inference.
We propose a novel adaptive negative optimization strategy, making EDL more tailored to the unlabeled dataset containing both inliers and outliers.
arXiv Detail & Related papers (2023-03-21T09:07:15Z) - Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - Deep Imputation of Missing Values in Time Series Health Data: A Review with Benchmarking [0.0]
This survey performs six data-centric experiments to benchmark state-of-the-art deep imputation methods on five time series health data sets.
Deep learning methods that jointly perform cross-sectional (across variables) and longitudinal (across time) imputations of missing values in time series data yield statistically better data quality than traditional imputation methods.
arXiv Detail & Related papers (2023-02-10T16:03:36Z) - Diffusion models for missing value imputation in tabular data [10.599563005836066]
Missing value imputation in machine learning is the task of estimating the missing values in the dataset accurately using available information.
We propose a diffusion model approach called "Conditional Score-based Diffusion Models for Tabular data" (CSDI_T).
To effectively handle categorical variables and numerical variables simultaneously, we investigate three techniques: one-hot encoding, analog bits encoding, and feature tokenization.
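As a rough illustration of two of these encodings (not the CSDI_T code itself), a one-hot encoder expands each category into an indicator vector, while analog bits write the category id in binary and map each bit to a real value in {-1, +1} so a diffusion model can treat it as numeric. The helper names below are hypothetical:

```python
# Hypothetical illustration of one-hot and analog-bits encodings for categories.
import numpy as np

def one_hot(cats, n_classes):
    """Each category id becomes an n_classes-dimensional indicator vector."""
    out = np.zeros((len(cats), n_classes))
    out[np.arange(len(cats)), cats] = 1.0
    return out

def analog_bits(cats, n_bits):
    """Each category id is written in binary (least-significant bit first),
    and each bit is mapped to a real value in {-1, +1}."""
    bits = ((cats[:, None] >> np.arange(n_bits)) & 1).astype(float)
    return 2.0 * bits - 1.0

cats = np.array([0, 3, 5])
print(one_hot(cats, n_classes=6))      # shape (3, 6)
print(analog_bits(cats, n_bits=3))     # e.g. 5 -> bits 1,0,1 -> [+1, -1, +1]
```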
arXiv Detail & Related papers (2022-10-31T08:13:26Z) - Benchmarking missing-values approaches for predictive models on health databases [47.187609203210705]
We conduct a benchmark of missing-values strategies in predictive models with a focus on large health databases.
We find that native support for missing values in supervised machine learning predicts better than state-of-the-art imputation with much less computational cost.
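For context, the following is a small scikit-learn example of what such native missing-value support can look like in practice; the synthetic data and model choice are illustrative assumptions, not the benchmark's actual setup. HistGradientBoostingClassifier accepts NaN entries directly and learns which branch missing values should follow, so no separate imputation step is required:

```python
# Illustrative only: a learner with native NaN handling (scikit-learn >= 1.0).
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic labels
X[rng.random(X.shape) < 0.3] = np.nan     # punch 30% missing values into the features

clf = HistGradientBoostingClassifier()    # no imputer in the pipeline
print(cross_val_score(clf, X, y, cv=5).mean())
```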
arXiv Detail & Related papers (2022-02-17T09:40:04Z) - RIFLE: Imputation and Robust Inference from Low Order Marginals [10.082738539201804]
We develop a statistical inference framework for regression and classification in the presence of missing data without imputation.
Our framework, RIFLE, estimates low-order moments of the underlying data distribution with corresponding confidence intervals to learn a distributionally robust model.
Our experiments demonstrate that RIFLE outperforms other benchmark algorithms when the percentage of missing values is high and/or when the number of data points is relatively small.
arXiv Detail & Related papers (2021-09-01T23:17:30Z) - Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms a state-of-the-art one-class classification method by 6.3 AUC and 12.5 average precision points.
arXiv Detail & Related papers (2021-06-11T01:36:08Z)