FCMI: Feature Correlation based Missing Data Imputation
- URL: http://arxiv.org/abs/2107.00100v1
- Date: Sat, 26 Jun 2021 13:35:33 GMT
- Title: FCMI: Feature Correlation based Missing Data Imputation
- Authors: Prateek Mishra, Kumar Divya Mani, Prashant Johri, Dikhsa Arya
- Abstract summary: We propose an efficient technique to impute the missing value in the dataset based on correlation called FCMI.
Our proposed algorithm picks the highly correlated attributes of the dataset and uses these attributes to build a regression model.
Experiments conducted on both classification and regression datasets show that the proposed imputation technique outperforms existing imputation algorithms.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Processed data are insightful, while crude data are obtuse. Missing
values are a serious threat to data reliability: they lead to inaccurate
analysis and wrong predictions. We propose an efficient correlation-based
technique, FCMI (Feature Correlation based Missing Data Imputation), to impute
missing values in a dataset. Our central idea is to exploit the correlation
among the attributes of the dataset. Our proposed algorithm
picks the highly correlated attributes of the dataset and uses these attributes
to build a regression model whose parameters are optimized such that the
correlation of the dataset is maintained. Experiments conducted on both
classification and regression datasets show that the proposed imputation
technique outperforms existing imputation algorithms.
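The recipe in the abstract (select attributes highly correlated with the incomplete column, fit a regression on them, and predict the missing entries) can be sketched as follows. This is a minimal NumPy sketch: the function name, the correlation threshold, and the plain least-squares fit are illustrative assumptions, not the authors' exact implementation, which additionally optimizes the regression parameters so that the dataset's correlation structure is maintained.

```python
import numpy as np

def fcmi_impute(X, target_col, corr_threshold=0.5):
    """Impute missing values in one column via regression on highly
    correlated, fully observed columns (hypothetical sketch of FCMI)."""
    X = np.array(X, dtype=float)          # work on a copy
    mask = np.isnan(X[:, target_col])     # rows with a missing target
    obs = ~mask                           # rows where the target is observed
    # Keep fully observed columns whose |correlation| with the target
    # (computed on the observed rows) clears the threshold.
    predictors = []
    for j in range(X.shape[1]):
        if j == target_col or np.isnan(X[:, j]).any():
            continue
        r = np.corrcoef(X[obs, j], X[obs, target_col])[0, 1]
        if abs(r) >= corr_threshold:
            predictors.append(j)
    if not predictors:
        # Fallback when no correlated attribute exists: column mean
        X[mask, target_col] = np.nanmean(X[:, target_col])
        return X
    # Ordinary least squares (with intercept) on the observed rows
    A = np.column_stack([X[obs][:, predictors], np.ones(obs.sum())])
    beta, *_ = np.linalg.lstsq(A, X[obs, target_col], rcond=None)
    # Predict the missing entries from the same predictors
    B = np.column_stack([X[mask][:, predictors], np.ones(mask.sum())])
    X[mask, target_col] = B @ beta
    return X
```

For example, on a column that is exactly twice another, the fitted regression recovers the missing entry from that relationship rather than from an unconditional column mean.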
Related papers
- Approaching Metaheuristic Deep Learning Combos for Automated Data Mining [0.5419570023862531]
This work proposes a means of combining meta-heuristic methods with conventional classifiers and neural networks in order to perform automated data mining.
Experiments on the MNIST dataset for handwritten digit recognition were performed.
It was empirically observed that validation accuracy on a ground-truth-labeled dataset is inadequate for correcting the labels of previously unseen data instances.
arXiv Detail & Related papers (2024-10-16T10:28:22Z) - Efficient adjustment for complex covariates: Gaining efficiency with
DOPE [56.537164957672715]
We propose a framework that accommodates adjustment for any subset of information expressed by the covariates.
Based on our theoretical results, we propose the Debiased Outcome-adapted Propensity Estimator (DOPE) for efficient estimation of the average treatment effect (ATE).
Our results show that the DOPE provides an efficient and robust methodology for ATE estimation in various observational settings.
arXiv Detail & Related papers (2024-02-20T13:02:51Z) - Iterative missing value imputation based on feature importance [6.300806721275004]
We have designed an imputation method that considers feature importance.
This algorithm iteratively performs matrix completion and feature importance learning, and specifically, matrix completion is based on a filling loss that incorporates feature importance.
The results on these datasets consistently show that the proposed method outperforms the existing five imputation algorithms.
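The iterative matrix-completion loop summarized above can be sketched generically. This is a hedged sketch of plain round-robin regression imputation; the paper's actual method additionally learns feature importance and incorporates it into the filling loss, which is omitted here.

```python
import numpy as np

def iterative_impute(X, n_iter=10):
    """Round-robin iterative imputation: initialize missing entries with
    column means, then repeatedly re-fit each incomplete column on the
    others and re-impute (generic sketch, not the paper's exact loss)."""
    X = np.array(X, dtype=float)
    miss = np.isnan(X)
    # Step 1: crude initialization with column means
    X_filled = np.where(miss, np.nanmean(X, axis=0), X)
    # Step 2: alternate between fitting and re-filling each column
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            others = [k for k in range(X.shape[1]) if k != j]
            # Least-squares fit of column j on all other (filled) columns
            A = np.column_stack([X_filled[obs][:, others], np.ones(obs.sum())])
            beta, *_ = np.linalg.lstsq(A, X_filled[obs, j], rcond=None)
            B = np.column_stack([X_filled[miss[:, j]][:, others],
                                 np.ones(miss[:, j].sum())])
            X_filled[miss[:, j], j] = B @ beta
    return X_filled
```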
arXiv Detail & Related papers (2023-11-14T09:03:33Z) - TRIAGE: Characterizing and auditing training data for improved
regression [80.11415390605215]
We introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors.
TRIAGE utilizes conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score.
We show that TRIAGE's characterization is consistent and highlight its utility to improve performance via data sculpting/filtering, in multiple regression settings.
arXiv Detail & Related papers (2023-10-29T10:31:59Z) - Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z) - Machine Learning Based Missing Values Imputation in Categorical Datasets [2.5611256859404983]
This research looked into the use of machine learning algorithms to fill in the gaps in categorical datasets.
The emphasis was on ensemble models constructed using the Error Correction Output Codes framework.
Deep learning for missing data imputation has obstacles despite these encouraging results, including the requirement for large amounts of labeled data.
arXiv Detail & Related papers (2023-06-10T03:29:48Z) - Utility Assessment of Synthetic Data Generation Methods [0.0]
We investigate whether different methods of generating fully synthetic data vary in their utility a priori.
We find some methods to perform better than others across the board.
We do get promising findings for classification tasks when using synthetic data for training machine learning models.
arXiv Detail & Related papers (2022-11-23T11:09:52Z) - Rethinking Data Heterogeneity in Federated Learning: Introducing a New
Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem and can in fact be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z) - MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness generating mechanism.
We conduct extensive experiments on synthetic and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
arXiv Detail & Related papers (2021-11-04T22:38:18Z) - Doing Great at Estimating CATE? On the Neglected Assumptions in
Benchmark Comparisons of Treatment Effect Estimators [91.3755431537592]
We show that even in arguably the simplest setting, estimation under ignorability assumptions can be misleading.
We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators.
We highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others.
arXiv Detail & Related papers (2021-07-28T13:21:27Z) - Missing Data Imputation using Optimal Transport [43.14084843713895]
We leverage optimal transport distances to quantify a criterion and turn it into a loss function to impute missing data values.
We propose practical methods to minimize these losses using end-to-end learning.
Experiments show that OT-based methods match or outperform state-of-the-art imputation methods, even for high percentages of missing values.
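As a rough illustration of the kind of criterion involved, an entropic (Sinkhorn) optimal-transport cost between two data batches can be computed in a few lines of NumPy. This is a generic sketch, not the paper's loss: the function name, the regularization strength, and the fixed iteration count are all assumptions.

```python
import numpy as np

def sinkhorn_cost(X, Y, eps=0.1, n_iter=200):
    """Entropic OT cost between two point clouds with uniform weights,
    via plain Sinkhorn iterations (illustrative sketch)."""
    # Pairwise squared Euclidean cost matrix
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-C / eps)                       # Gibbs kernel
    a = np.ones(len(X)) / len(X)               # uniform source weights
    b = np.ones(len(Y)) / len(Y)               # uniform target weights
    u = np.ones_like(a)
    for _ in range(n_iter):                    # alternating scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]            # transport plan
    return (P * C).sum()
```

Intuitively, a batch whose missing entries are well imputed should be close, in this transport cost, to a batch of fully observed samples; minimizing such a cost end to end is the idea the paper builds on.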
arXiv Detail & Related papers (2020-02-10T15:23:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.