Related papers: No Imputation of Missing Values In Tabular Data Classification Using Incremental Learning

No Imputation of Missing Values In Tabular Data Classification Using Incremental Learning

URL: http://arxiv.org/abs/2504.14610v1
Date: Sun, 20 Apr 2025 13:31:49 GMT
Title: No Imputation of Missing Values In Tabular Data Classification Using Incremental Learning
Authors: Manar D. Samad, Kazi Fuad B. Akhter, Shourav B. Rabbani, Ibna Kowsar,
Abstract summary: This paper proposes no imputation incremental learning (NIIL) of tabular data with varying missing value rates and types.<n>The proposed method incrementally learns partitions of overlapping feature sets while using attention masks to exclude missing values from attention scoring.<n>Experiments substantiate the robustness of NIIL against varying missing value types and rates compared to methods that involve the imputation of missing values.
Score: 0.0
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Tabular data sets with varying missing values are prepared for machine learning using an arbitrary imputation strategy. Synthetic values generated by imputation models often concern data stakeholders about computational complexity, data quality, and data-driven outcomes. This paper eliminates these concerns by proposing no imputation incremental learning (NIIL) of tabular data with varying missing value rates and types. The proposed method incrementally learns partitions of overlapping feature sets while using attention masks to exclude missing values from attention scoring. The average classification performance rank order across 15 diverse tabular data sets highlights the superiority of NIIL over 11 state-of-the-art learning methods with or without missing value imputations. Further experiments substantiate the robustness of NIIL against varying missing value types and rates compared to methods that involve the imputation of missing values. Our empirical analysis reveals that a feature partition size of half of the original feature space is, computation-wise and accuracy-wise, the best choice for the proposed incremental learning. The proposed method is one of the first deep learning solutions that can effectively learn tabular data without requiring the imputation of missing values.

Related papers

DeepIFSAC: Deep Imputation of Missing Values Using Feature and Sample Attention within Contrastive Framework [0.0]
Most commonly used statistical and machine learning methods for missing value imputation may be ineffective when the missing rate is high and not random.<n>This paper explores row and column attention in tabular data as between-feature and between-sample attention in a novel framework to reconstruct missing values.<n>The proposed method uses CutMix data augmentation within a contrastive learning framework to improve the uncertainty of missing value estimation.
arXiv Detail & Related papers (2025-01-19T01:10:18Z)
Capturing the Temporal Dependence of Training Data Influence [100.91355498124527]
We formalize the concept of trajectory-specific leave-one-out influence, which quantifies the impact of removing a data point during training.<n>We propose data value embedding, a novel technique enabling efficient approximation of trajectory-specific LOO.<n>As data value embedding captures training data ordering, it offers valuable insights into model training dynamics.
arXiv Detail & Related papers (2024-12-12T18:28:55Z)
An End-to-End Model for Time Series Classification In the Presence of Missing Values [25.129396459385873]
Time series classification with missing data is a prevalent issue in time series analysis. This study proposes an end-to-end neural network that unifies data imputation and representation learning within a single framework.
arXiv Detail & Related papers (2024-08-11T19:39:12Z)
Incremental Self-training for Semi-supervised Learning [56.57057576885672]
IST is simple yet effective and fits existing self-training-based semi-supervised learning methods. We verify the proposed IST on five datasets and two types of backbone, effectively improving the recognition accuracy and learning speed.
arXiv Detail & Related papers (2024-04-14T05:02:00Z)
Iterative missing value imputation based on feature importance [6.300806721275004]
We have designed an imputation method that considers feature importance. This algorithm iteratively performs matrix completion and feature importance learning, and specifically, matrix completion is based on a filling loss that incorporates feature importance. The results on these datasets consistently show that the proposed method outperforms the existing five imputation algorithms.
arXiv Detail & Related papers (2023-11-14T09:03:33Z)
XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification. XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations. Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z)
LAVA: Data Valuation without Pre-Specified Learning Algorithms [20.578106028270607]
We introduce a new framework that can value training data in a way that is oblivious to the downstream learning algorithm. We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between training and validation sets. We show that the distance characterizes the upper bound of the validation performance for any given model under certain Lipschitz conditions.
arXiv Detail & Related papers (2023-04-28T19:05:16Z)
Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning [69.81438976273866]
Open-set semi-supervised learning (Open-set SSL) considers a more practical scenario, where unlabeled data and test data contain new categories (outliers) not observed in labeled data (inliers) We introduce evidential deep learning (EDL) as an outlier detector to quantify different types of uncertainty, and design different uncertainty metrics for self-training and inference. We propose a novel adaptive negative optimization strategy, making EDL more tailored to the unlabeled dataset containing both inliers and outliers.
arXiv Detail & Related papers (2023-03-21T09:07:15Z)
Imputation of missing values in multi-view data [0.24739484546803336]
We introduce a new imputation method based on the existing stacked penalized logistic regression algorithm for multi-view learning. We compare the performance of the new imputation method with several existing imputation algorithms in simulated data sets and a real data application.
arXiv Detail & Related papers (2022-10-26T05:19:30Z)
Leachable Component Clustering [10.377914682543903]
In this work, a novel approach to clustering of incomplete data, termed leachable component clustering, is proposed. The proposed method handles data imputation with Bayes alignment, and collects the lost patterns in theory. Experiments on several artificial incomplete data sets demonstrate that, the proposed method is able to present superior performance compared with other state-of-the-art algorithms.
arXiv Detail & Related papers (2022-08-28T13:13:17Z)
Continual Learning For On-Device Environmental Sound Classification [63.81276321857279]
We propose a simple and efficient continual learning method for on-device environmental sound classification. Our method selects the historical data for the training by measuring the per-sample classification uncertainty.
arXiv Detail & Related papers (2022-07-15T12:13:04Z)
Evaluating representations by the complexity of learning low-loss predictors [55.94170724668857]
We consider the problem of evaluating representations of data for use in solving a downstream task. We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest.
arXiv Detail & Related papers (2020-09-15T22:06:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.