Correlation visualization under missing values: a comparison between
imputation and direct parameter estimation methods
- URL: http://arxiv.org/abs/2305.06044v2
- Date: Tue, 5 Sep 2023 09:33:43 GMT
- Title: Correlation visualization under missing values: a comparison between
imputation and direct parameter estimation methods
- Authors: Nhat-Hao Pham, Khanh-Linh Vo, Mai Anh Vu, Thu Nguyen, Michael A.
Riegler, Pål Halvorsen, Binh T. Nguyen
- Abstract summary: We compare the effects of various missing data methods on the correlation plot, focusing on two common missing patterns: random and monotone.
We recommend using DPER, a direct parameter estimation approach, for plotting the correlation matrix based on its performance in the experiments.
- Score: 4.963490281438653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Correlation matrix visualization is essential for understanding the
relationships between variables in a dataset, but missing data can pose a
significant challenge in estimating correlation coefficients. In this paper, we
compare the effects of various missing data methods on the correlation plot,
focusing on two common missing patterns: random and monotone. We aim to provide
practical strategies and recommendations for researchers and practitioners in
creating and analyzing the correlation plot. Our experimental results suggest
that while imputation is commonly used for missing data, using imputed data for
plotting the correlation matrix may lead to a significantly misleading
inference of the relation between the features. We recommend using DPER, a
direct parameter estimation approach, for plotting the correlation matrix based
on its performance in the experiments.
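As a quick, hedged illustration of the effect the abstract describes, the sketch below generates correlated Gaussian data, removes roughly 30% of the entries completely at random, and compares the correlation matrix obtained after mean imputation with one computed from pairwise complete observations. DPER itself is not implemented here; pairwise deletion only stands in as a direct-estimation-style baseline, and the data, missing rate, and variable names are illustrative.
```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Correlated Gaussian data with a known ground-truth correlation matrix.
true_corr = np.array([[1.0, 0.8, 0.3],
                      [0.8, 1.0, 0.5],
                      [0.3, 0.5, 1.0]])
X = rng.multivariate_normal(mean=np.zeros(3), cov=true_corr, size=2000)
df = pd.DataFrame(X, columns=["x1", "x2", "x3"])

# Missing completely at random: drop about 30% of the entries.
mask = rng.random(df.shape) < 0.3
df_miss = df.mask(mask)

# Strategy 1: mean imputation, then Pearson correlation.
corr_imputed = df_miss.fillna(df_miss.mean()).corr()

# Strategy 2: pairwise complete observations (a stand-in for direct
# parameter estimation; this is NOT the DPER algorithm).
corr_pairwise = df_miss.corr()  # pandas drops NaNs pairwise by default

print("true correlation:\n", pd.DataFrame(true_corr, index=df.columns, columns=df.columns))
print("mean imputation:\n", corr_imputed.round(2))
print("pairwise complete:\n", corr_pairwise.round(2))
```
In this toy MCAR setting the mean-imputed correlations shrink visibly toward zero (roughly by the fraction of jointly observed pairs), which is the kind of misleading correlation plot the paper warns about, while the pairwise estimates stay close to the ground truth.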
Related papers
- Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data.
This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
arXiv Detail & Related papers (2024-10-02T04:01:38Z)
- FACTS: First Amplify Correlations and Then Slice to Discover Bias [17.244153084361102]
Computer vision datasets frequently contain spurious correlations between task-relevant labels and (easy to learn) latent task-irrelevant attributes.
Models trained on such datasets learn "shortcuts" and underperform on bias-conflicting slices of data where the correlation does not hold.
We propose First Amplify Correlations and Then Slice to Discover Bias (FACTS) to inform downstream bias mitigation strategies.
arXiv Detail & Related papers (2023-09-29T17:41:26Z)
- Fair Canonical Correlation Analysis [14.206538828733507]
Canonical Correlation Analysis (CCA) is a widely used technique for examining the relationship between two sets of variables.
We present a framework that alleviates unfairness by minimizing the correlation disparity error associated with protected attributes.
arXiv Detail & Related papers (2023-09-27T17:34:13Z)
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z)
- Causal Effect Estimation from Observational and Interventional Data Through Matrix Weighted Linear Estimators [11.384045395629123]
We study causal effect estimation from a mixture of observational and interventional data.
We show that the statistical efficiency in terms of expected squared error can be improved by combining estimators.
arXiv Detail & Related papers (2023-06-09T16:16:53Z)
- Learning Partial Correlation based Deep Visual Representation for Image Classification [61.0532370259644]
We formulate sparse inverse covariance estimation (SICE) as a novel structured layer of a CNN.
Our work obtains a partial correlation based deep visual representation and mitigates the small sample problem.
Experiments show the efficacy and superior classification performance of our model.
arXiv Detail & Related papers (2023-04-23T10:09:01Z)
- Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
Results show that the proposed strategies perform better than classification based on observed data alone and maintain high accuracy even as the missing-data ratio increases.
arXiv Detail & Related papers (2021-10-19T14:24:50Z)
- Efficient Causal Inference from Combined Observational and Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders with a single latent confounder.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z)
- Beyond Marginal Uncertainty: How Accurately can Bayesian Regression Models Estimate Posterior Predictive Correlations? [13.127549105535623]
It is often more useful to estimate predictive correlations between the function values at different input locations.
We first consider a downstream task that depends on posterior predictive correlations: transductive active learning (TAL).
Since TAL is too expensive and indirect to guide development of algorithms, we introduce two metrics which more directly evaluate the predictive correlations.
arXiv Detail & Related papers (2020-11-06T03:48:59Z)
- On Disentangled Representations Learned From Correlated Data [59.41587388303554]
We bridge the gap to real-world scenarios by analyzing the behavior of the most prominent disentanglement approaches on correlated data.
We show that systematically induced correlations in the dataset are being learned and reflected in the latent representations.
We also demonstrate how to resolve these latent correlations, either using weak supervision during training or by post-hoc correcting a pre-trained model with a small number of labels.
arXiv Detail & Related papers (2020-06-14T12:47:34Z)
- Bayesian Sparse Covariance Structure Analysis for Correlated Count Data [3.867363075280544]
We assume a Gaussian Graphical Model for the latent variables which dominate the potential risks of crimes.
We apply the proposed model to estimate the sparse inverse covariance of the latent variables and evaluate the partial correlation coefficients; the generic precision-to-partial-correlation computation is sketched after this list.
arXiv Detail & Related papers (2020-06-05T05:34:35Z)
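Two of the entries above (the partial-correlation-based deep visual representation and the Bayesian sparse covariance model) build on the same classical step: estimate a sparse inverse covariance (precision) matrix and convert it into partial correlations. The sketch below shows only that generic step with scikit-learn's GraphicalLasso; it is not the CNN layer or the Bayesian model from those papers, and the synthetic data and regularization strength are illustrative.
```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 5))
X[:, 1] += 0.8 * X[:, 0]   # induce dependence between features 0 and 1
X[:, 3] += 0.6 * X[:, 2]   # ... and between features 2 and 3

# Sparse inverse covariance estimation (graphical lasso).
model = GraphicalLasso(alpha=0.05).fit(X)
precision = model.precision_

# Partial correlation from the precision matrix P:
#   rho_ij = -P_ij / sqrt(P_ii * P_jj), with ones on the diagonal.
d = np.sqrt(np.diag(precision))
partial_corr = -precision / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

print(np.round(partial_corr, 2))
```
The zero pattern of the estimated precision matrix encodes conditional independencies, and the off-diagonal partial correlations are what correlation plots built on such models visualize.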
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.