Iterative missing value imputation based on feature importance
- URL: http://arxiv.org/abs/2311.08005v1
- Date: Tue, 14 Nov 2023 09:03:33 GMT
- Title: Iterative missing value imputation based on feature importance
- Authors: Cong Guo, Chun Liu, Wei Yang
- Abstract summary: We have designed an imputation method that considers feature importance.
This algorithm iteratively performs matrix completion and feature importance learning, and specifically, matrix completion is based on a filling loss that incorporates feature importance.
The results on these datasets consistently show that the proposed method outperforms five existing imputation algorithms.
- Score: 6.300806721275004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many datasets suffer from missing values for various reasons, which not
only increases the processing difficulty of related tasks but also reduces
classification accuracy. To address this problem, the mainstream approach is
to use missing value imputation to complete the dataset. Existing imputation
methods estimate the missing parts from the observed values in the original
feature space, and they treat all features as equally important during data
completion, although in practice different features have different importance.
Therefore, we have designed an imputation method that considers feature
importance. This algorithm iteratively performs matrix completion and feature
importance learning, and specifically, matrix completion is based on a filling
loss that incorporates feature importance. Our experimental analysis involves
three types of datasets: synthetic datasets with different noisy features and
missing values, real-world datasets with artificially generated missing values,
and real-world datasets originally containing missing values. The results on
these datasets consistently show that the proposed method outperforms five
existing imputation algorithms. To the best of our knowledge, this is the
first work to consider feature importance in the imputation model.
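The alternating scheme the abstract describes can be sketched as follows. This is an illustrative reconstruction, not the authors' published algorithm: it alternates per-feature regression imputation with re-estimated feature importances (taken here from a random forest, which is an assumption), and weights the predictor columns by the current importances so that important features dominate the filling.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def importance_guided_impute(X, n_iter=3, seed=0):
    """Sketch of importance-aware iterative imputation (illustrative only)."""
    X = np.asarray(X, dtype=float).copy()
    mask = np.isnan(X)
    n, d = X.shape
    # Step 0: mean-initialise every missing entry.
    col_means = np.nanmean(X, axis=0)
    X[mask] = col_means[np.where(mask)[1]]
    importance = np.full(d, 1.0 / d)  # start from uniform importance
    for _ in range(n_iter):
        agg = np.zeros(d)
        for j in range(d):
            idx = np.arange(d) != j
            # Soft-weight predictor columns by the current importances,
            # so "important" features dominate the filling step.
            Z = X[:, idx] * importance[idx]
            obs = ~mask[:, j]
            model = RandomForestRegressor(n_estimators=50, random_state=seed)
            model.fit(Z[obs], X[obs, j])
            if mask[:, j].any():
                X[mask[:, j], j] = model.predict(Z[mask[:, j]])
            agg[idx] += model.feature_importances_
        importance = agg / agg.sum()  # refresh importances for the next round
    return X, importance
```

Observed entries are never overwritten; only the missing cells are refined across iterations as the importance estimates stabilize.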
Related papers
- On the Performance of Imputation Techniques for Missing Values on Healthcare Datasets [0.0]
Missing values are a common characteristic of real-world datasets, especially healthcare data.
This study compares the performance of seven imputation techniques: Mean imputation, Median imputation, Last Observation Carried Forward (LOCF) imputation, K-Nearest Neighbor (KNN) imputation, Interpolation imputation, MissForest imputation, and Multiple Imputation by Chained Equations (MICE).
The results show that MissForest imputation performs best, followed by MICE imputation.
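A comparison of this kind can be reproduced in miniature with scikit-learn's built-in imputers (MissForest and LOCF are omitted here, and `IterativeImputer` stands in for MICE); the dataset and masking rate are arbitrary illustrations, not the study's protocol:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer

rng = np.random.RandomState(0)
X_true = rng.rand(200, 5)
X_miss = X_true.copy()
holes = rng.rand(*X_true.shape) < 0.15  # knock out ~15% of entries
X_miss[holes] = np.nan

imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "knn": KNNImputer(n_neighbors=5),
    "mice-like": IterativeImputer(random_state=0),  # chained-equations style
}
scores = {}
for name, imp in imputers.items():
    X_hat = imp.fit_transform(X_miss)
    # RMSE measured on the artificially removed cells only
    scores[name] = float(np.sqrt(np.mean((X_hat[holes] - X_true[holes]) ** 2)))
print(scores)
```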
arXiv Detail & Related papers (2024-03-13T18:07:17Z)
- A novel feature selection framework for incomplete data [0.904776731152113]
Existing methods complete the incomplete data and then conduct feature selection based on the imputed data.
Since imputation and feature selection are entirely independent steps, the importance of features cannot be considered during imputation.
We propose a novel incomplete data feature selection framework that considers feature importance.
arXiv Detail & Related papers (2023-12-07T09:45:14Z)
- Transformed Distribution Matching for Missing Value Imputation [7.754689608872696]
Key to missing value imputation is to capture the data distribution with incomplete samples and impute the missing values accordingly.
In this paper, we propose to impute the missing values of two batches of data by transforming them into a latent space through deep invertible functions.
To learn the transformations and impute the missing values simultaneously, a simple and well-motivated algorithm is proposed.
arXiv Detail & Related papers (2023-02-20T23:44:30Z)
- To Impute or not to Impute? -- Missing Data in Treatment Effect Estimation [84.76186111434818]
We identify a new missingness mechanism, which we term mixed confounded missingness (MCM), where some missingness determines treatment selection and other missingness is determined by treatment selection.
We show that naively imputing all data leads to poor performing treatment effects models, as the act of imputation effectively removes information necessary to provide unbiased estimates.
Our solution is selective imputation, where we use insights from MCM to inform precisely which variables should be imputed and which should not.
arXiv Detail & Related papers (2022-02-04T12:08:31Z)
- MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness generating mechanism.
We conduct extensive experiments on synthetic and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
arXiv Detail & Related papers (2021-11-04T22:38:18Z)
- Doing Great at Estimating CATE? On the Neglected Assumptions in Benchmark Comparisons of Treatment Effect Estimators [91.3755431537592]
We show that even in arguably the simplest setting, estimation under ignorability assumptions can be misleading.
We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators.
We highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others.
arXiv Detail & Related papers (2021-07-28T13:21:27Z)
- FCMI: Feature Correlation based Missing Data Imputation [0.0]
We propose FCMI, an efficient technique that imputes missing values in a dataset based on feature correlation.
Our proposed algorithm picks the highly correlated attributes of the dataset and uses these attributes to build a regression model.
Experiments conducted on both classification and regression datasets show that the proposed imputation technique outperforms existing imputation algorithms.
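A minimal sketch of the correlation-guided idea, under simplifying assumptions (ordinary least squares in place of whatever regression model FCMI actually builds, and a fixed number of correlated partner attributes):

```python
import numpy as np

def fcmi_like_impute(X, top_k=2):
    """Correlation-guided imputation in the spirit of FCMI (hypothetical
    simplification): for each feature with missing values, regress it on
    its top_k most-correlated other features via least squares."""
    X = np.asarray(X, dtype=float).copy()
    mask = np.isnan(X)
    means = np.nanmean(X, axis=0)
    filled = np.where(mask, means, X)  # crude baseline just for correlations
    corr = np.corrcoef(filled, rowvar=False)
    for j in np.where(mask.any(axis=0))[0]:
        others = [k for k in range(X.shape[1]) if k != j]
        # pick the most strongly correlated partner attributes
        partners = sorted(others, key=lambda k: -abs(corr[j, k]))[:top_k]
        obs = ~mask[:, j]
        A = np.column_stack([filled[obs][:, partners], np.ones(obs.sum())])
        coef, *_ = np.linalg.lstsq(A, X[obs, j], rcond=None)
        miss = mask[:, j]
        B = np.column_stack([filled[miss][:, partners], np.ones(miss.sum())])
        X[miss, j] = B @ coef
    return X
```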
arXiv Detail & Related papers (2021-06-26T13:35:33Z)
- Evaluating State-of-the-Art Classification Models Against Bayes Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
arXiv Detail & Related papers (2021-06-07T06:21:20Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
- Establishing strong imputation performance of a denoising autoencoder in a wide range of missing data problems [0.0]
We develop a consistent framework for both training and imputation.
We benchmarked the results against state-of-the-art imputation methods.
The developed autoencoder obtained the smallest error for all ranges of initial data corruption.
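A toy version of denoising-autoencoder imputation, as a sketch only (a single hidden layer trained with plain numpy gradient descent; the paper's framework is more elaborate): the network learns to reconstruct the observed entries from noise-corrupted, mean-filled inputs, and the imputations are read off the reconstruction.

```python
import numpy as np

def dae_impute(X, hidden=8, epochs=200, lr=0.05, seed=0):
    """Toy denoising-autoencoder imputation sketch (illustrative only)."""
    rng = np.random.RandomState(seed)
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    Xf = np.where(mask, np.nanmean(X, axis=0), X)  # mean-filled input
    n, d = X.shape
    W1 = rng.randn(d, hidden) * 0.1; b1 = np.zeros(hidden)
    W2 = rng.randn(hidden, d) * 0.1; b2 = np.zeros(d)
    obs = (~mask).astype(float)
    for _ in range(epochs):
        noisy = Xf + 0.1 * rng.randn(n, d)   # denoising corruption
        H = np.tanh(noisy @ W1 + b1)
        out = H @ W2 + b2
        err = (out - Xf) * obs               # loss only on observed cells
        gW2 = H.T @ err / n; gb2 = err.mean(0)
        dH = (err @ W2.T) * (1 - H ** 2)
        gW1 = noisy.T @ dH / n; gb1 = dH.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    recon = np.tanh(Xf @ W1 + b1) @ W2 + b2
    return np.where(mask, recon, X)          # keep observed entries as-is
```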
arXiv Detail & Related papers (2020-04-06T12:00:30Z)
- New advances in enumerative biclustering algorithms with online partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, called RIn-Close_CVC3, keeps the attractive properties of RIn-Close_CVC and is characterized by a drastic reduction in memory usage and a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.