No Imputation of Missing Values In Tabular Data Classification Using Incremental Learning
- URL: http://arxiv.org/abs/2504.14610v1
- Date: Sun, 20 Apr 2025 13:31:49 GMT
- Title: No Imputation of Missing Values In Tabular Data Classification Using Incremental Learning
- Authors: Manar D. Samad, Kazi Fuad B. Akhter, Shourav B. Rabbani, Ibna Kowsar
- Abstract summary: This paper proposes no imputation incremental learning (NIIL) of tabular data with varying missing value rates and types. The proposed method incrementally learns partitions of overlapping feature sets while using attention masks to exclude missing values from attention scoring. Experiments substantiate the robustness of NIIL against varying missing value types and rates compared to methods that involve the imputation of missing values.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Tabular data sets with varying missing values are prepared for machine learning using an arbitrary imputation strategy. Synthetic values generated by imputation models often concern data stakeholders about computational complexity, data quality, and data-driven outcomes. This paper eliminates these concerns by proposing no imputation incremental learning (NIIL) of tabular data with varying missing value rates and types. The proposed method incrementally learns partitions of overlapping feature sets while using attention masks to exclude missing values from attention scoring. The average classification performance rank order across 15 diverse tabular data sets highlights the superiority of NIIL over 11 state-of-the-art learning methods with or without missing value imputations. Further experiments substantiate the robustness of NIIL against varying missing value types and rates compared to methods that involve the imputation of missing values. Our empirical analysis reveals that a feature partition size of half of the original feature space is, computation-wise and accuracy-wise, the best choice for the proposed incremental learning. The proposed method is one of the first deep learning solutions that can effectively learn tabular data without requiring the imputation of missing values.
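The two mechanisms named in the abstract, overlapping feature partitions and attention masks that exclude missing values, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sliding-window partitioning, the `stride` parameter, and the constant placeholder scores are all assumptions made for illustration. Only the partition size of half the feature space comes from the abstract.

```python
import math


def overlapping_partitions(n_features, size, stride):
    """Hypothetical helper: sliding windows of feature indices.

    Consecutive windows overlap whenever stride < size.
    """
    size = min(size, n_features)
    starts = list(range(0, n_features - size + 1, stride))
    # Add a final window if the last one stops short of the last feature.
    if starts[-1] + size < n_features:
        starts.append(n_features - size)
    return [list(range(s, s + size)) for s in starts]


def masked_attention_weights(scores, observed):
    """Row-wise softmax over keys, with missing keys excluded.

    scores:   list of rows (queries x keys) of raw attention scores.
    observed: one bool per key; False marks a missing value.
    """
    weights = []
    for row in scores:
        # A score of -inf becomes a weight of exactly 0 after the softmax.
        masked = [s if ok else -math.inf for s, ok in zip(row, observed)]
        peak = max(masked)
        if peak == -math.inf:
            # Every key is missing: return zero weights instead of NaN.
            weights.append([0.0] * len(row))
            continue
        exps = [math.exp(s - peak) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    row = [0.7, None, 1.3, None, 0.2, 2.1]  # None marks a missing value
    size = len(row) // 2  # half of the feature space, as in the abstract
    for idx in overlapping_partitions(len(row), size, stride=2):
        observed = [row[i] is not None for i in idx]
        # Placeholder scores; a real model would compute them from embeddings.
        scores = [[0.0] * len(idx) for _ in idx]
        print(idx, masked_attention_weights(scores, observed))
```

Because masked keys receive zero weight, no synthetic value ever has to be substituted for a missing entry. A model trained incrementally would visit the partitions one after another and update its parameters on each.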
Related papers
- Transductive Model Selection under Prior Probability Shift [49.56191463229252]
Transductive learning is a supervised machine learning task in which the unlabelled data that require labelling are a finite set and are available at training time.
We propose a method, tailored to transductive classification contexts, for performing model selection when the data exhibit prior probability shift.
arXiv Detail & Related papers (2025-07-30T13:03:24Z)
- DeepIFSAC: Deep Imputation of Missing Values Using Feature and Sample Attention within Contrastive Framework [0.0]
Most commonly used statistical and machine learning methods for missing value imputation may be ineffective when the missing rate is high and not random.
This paper explores row and column attention in tabular data as between-feature and between-sample attention in a novel framework to reconstruct missing values.
The proposed method uses CutMix data augmentation within a contrastive learning framework to improve the uncertainty of missing value estimation.
arXiv Detail & Related papers (2025-01-19T01:10:18Z)
- Capturing the Temporal Dependence of Training Data Influence [100.91355498124527]
We formalize the concept of trajectory-specific leave-one-out influence, which quantifies the impact of removing a data point during training.
We propose data value embedding, a novel technique enabling efficient approximation of trajectory-specific LOO.
As data value embedding captures training data ordering, it offers valuable insights into model training dynamics.
arXiv Detail & Related papers (2024-12-12T18:28:55Z)
- An End-to-End Model for Time Series Classification In the Presence of Missing Values [25.129396459385873]
Time series classification with missing data is a prevalent issue in time series analysis.
This study proposes an end-to-end neural network that unifies data imputation and representation learning within a single framework.
arXiv Detail & Related papers (2024-08-11T19:39:12Z)
- Not Another Imputation Method: A Transformer-based Model for Missing Values in Tabular Datasets [1.02138250640885]
"Not Another Imputation Method" (NAIM) is a transformer-based model designed to handle missing values without traditional imputation techniques.
NAIM's ability to avoid the necessity of imputing missing values and to effectively learn from available data relies on two main techniques.
We extensively evaluated NAIM on 5 publicly available datasets, demonstrating its superior performance over 6 state-of-the-art machine learning models and 5 deep learning models.
arXiv Detail & Related papers (2024-07-16T09:43:47Z)
- CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning [101.81127587760831]
Current fine-tuning methods build adapters that are widely agnostic of the context of the downstream task to learn or of the important knowledge to maintain.
We propose CorDA, a Context-oriented Decomposition Adaptation method that builds learnable task-aware adapters.
Our method enables two options: knowledge-preserved adaptation and instruction-previewed adaptation.
arXiv Detail & Related papers (2024-06-07T19:10:35Z)
- Data Imputation by Pursuing Better Classification: A Supervised Kernel-Based Method [33.56136381435839]
We propose a new framework that effectively leverages supervision information to complete missing data in a manner conducive to classification.
Our algorithm significantly outperforms other methods when the data is missing more than 60% of the features.
arXiv Detail & Related papers (2024-05-13T14:44:02Z)
- Incremental Self-training for Semi-supervised Learning [56.57057576885672]
Incremental Self-training (IST) is simple yet effective and fits existing self-training-based semi-supervised learning methods.
We verify the proposed IST on five datasets and two types of backbone, where it improves recognition accuracy and learning speed.
arXiv Detail & Related papers (2024-04-14T05:02:00Z)
- Iterative missing value imputation based on feature importance [6.300806721275004]
We have designed an imputation method that considers feature importance.
This algorithm iteratively performs matrix completion and feature importance learning, and specifically, matrix completion is based on a filling loss that incorporates feature importance.
The results consistently show that the proposed method outperforms five existing imputation algorithms.
arXiv Detail & Related papers (2023-11-14T09:03:33Z)
- XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z)
- LAVA: Data Valuation without Pre-Specified Learning Algorithms [20.578106028270607]
We introduce a new framework that can value training data in a way that is oblivious to the downstream learning algorithm.
We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between training and validation sets.
We show that the distance characterizes the upper bound of the validation performance for any given model under certain Lipschitz conditions.
arXiv Detail & Related papers (2023-04-28T19:05:16Z)
- Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning [69.81438976273866]
Open-set semi-supervised learning (Open-set SSL) considers a more practical scenario, where unlabeled data and test data contain new categories (outliers) not observed in labeled data (inliers).
We introduce evidential deep learning (EDL) as an outlier detector to quantify different types of uncertainty, and design different uncertainty metrics for self-training and inference.
We propose a novel adaptive negative optimization strategy, making EDL more tailored to the unlabeled dataset containing both inliers and outliers.
arXiv Detail & Related papers (2023-03-21T09:07:15Z)
- Imputation of missing values in multi-view data [0.24739484546803336]
We introduce a new imputation method based on the existing stacked penalized logistic regression algorithm for multi-view learning.
We compare the performance of the new imputation method with several existing imputation algorithms in simulated data sets and a real data application.
arXiv Detail & Related papers (2022-10-26T05:19:30Z)
- Leachable Component Clustering [10.377914682543903]
In this work, a novel approach to clustering of incomplete data, termed leachable component clustering, is proposed.
The proposed method handles data imputation with Bayes alignment, and collects the lost patterns in theory.
Experiments on several artificial incomplete data sets demonstrate that the proposed method achieves superior performance compared with other state-of-the-art algorithms.
arXiv Detail & Related papers (2022-08-28T13:13:17Z)
- Continual Learning For On-Device Environmental Sound Classification [63.81276321857279]
We propose a simple and efficient continual learning method for on-device environmental sound classification.
Our method selects the historical data for the training by measuring the per-sample classification uncertainty.
arXiv Detail & Related papers (2022-07-15T12:13:04Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Evaluating representations by the complexity of learning low-loss predictors [55.94170724668857]
We consider the problem of evaluating representations of data for use in solving a downstream task.
We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest.
arXiv Detail & Related papers (2020-09-15T22:06:58Z)
- Establishing strong imputation performance of a denoising autoencoder in a wide range of missing data problems [0.0]
We develop a consistent framework for both training and imputation.
We benchmarked the results against state-of-the-art imputation methods.
The developed autoencoder obtained the smallest error for all ranges of initial data corruption.
arXiv Detail & Related papers (2020-04-06T12:00:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.