Better Together: Cross and Joint Covariances Enhance Signal Detectability in Undersampled Data
- URL: http://arxiv.org/abs/2507.22207v1
- Date: Tue, 29 Jul 2025 20:11:02 GMT
- Title: Better Together: Cross and Joint Covariances Enhance Signal Detectability in Undersampled Data
- Authors: Arabind Swain, Sean Alexander Ridout, Ilya Nemenman
- Abstract summary: Many data-science applications involve detecting a shared signal between two high-dimensional variables. Using random matrix theory methods, we determine when such a signal can be detected and reconstructed from sample correlations. We discuss what these observations mean for choosing the right method for detecting linear correlations in data and how these findings may generalize to nonlinear statistical dependencies.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many data-science applications involve detecting a shared signal between two high-dimensional variables. Using random matrix theory methods, we determine when such a signal can be detected and reconstructed from sample correlations, despite the background of sampling-noise-induced correlations. We consider three different covariance matrices constructed from two high-dimensional variables: their individual self covariance, their cross covariance, and the self covariance of the concatenated (joint) variable, which incorporates the self and the cross correlation blocks. We observe the expected Baik, Ben Arous, and Péché detectability phase transition in all these covariance matrices, and we show that the joint and cross covariance matrices always reconstruct the shared signal earlier than the self covariances. Whether the joint or the cross approach is better depends on the mismatch between the dimensionalities of the variables. We discuss what these observations mean for choosing the right method for detecting linear correlations in data and how these findings may generalize to nonlinear statistical dependencies.
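The setting is easy to mirror numerically. The sketch below is our illustration, not the authors' code: the rank-one planted-signal model, the dimensions nx, ny, T, and the signal strength beta are all assumptions. It builds the self, cross, and joint sample covariance matrices from the same synthetic data and compares their top eigenvalue or singular value against a noise-only baseline, which is the detectability criterion behind the Baik-Ben Arous-Péché (BBP) transition:

```python
# Minimal sketch: self, cross, and joint covariances of two coupled variables.
# The planted rank-one signal model and all parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
nx, ny, T = 400, 200, 1000        # dimensionalities and number of samples
beta = 1.5                        # planted signal strength (assumption)

# A shared scalar latent s drives both variables along fixed directions u, v.
s = rng.standard_normal(T)
u = rng.standard_normal(nx); u /= np.linalg.norm(u)
v = rng.standard_normal(ny); v /= np.linalg.norm(v)
X = np.sqrt(beta) * np.outer(u, s) + rng.standard_normal((nx, T))
Y = np.sqrt(beta) * np.outer(v, s) + rng.standard_normal((ny, T))

# The three covariance estimates built from the same data.
C_xx = X @ X.T / T                           # self covariance of X
C_xy = X @ Y.T / T                           # cross covariance of X and Y
Z = np.vstack([X, Y])
C_zz = Z @ Z.T / T                           # joint covariance of (X, Y)

# Detection heuristic: a signal is visible when the top eigenvalue (singular
# value, for the cross matrix) escapes the noise-only bulk. For self and joint
# covariances with unit noise the bulk edge is the Marchenko-Pastur edge; a
# noise-only edge for the cross matrix can be estimated by Monte Carlo.
mp_edge = lambda n: (1 + np.sqrt(n / T)) ** 2
print("self :", np.linalg.eigvalsh(C_xx)[-1], "vs edge", mp_edge(nx))
print("joint:", np.linalg.eigvalsh(C_zz)[-1], "vs edge", mp_edge(nx + ny))
print("cross:", np.linalg.svd(C_xy, compute_uv=False)[0],
      "(compare to a noise-only Monte Carlo baseline)")
```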
Related papers
- Towards Self-Supervised Covariance Estimation in Deep Heteroscedastic Regression [102.24287051757469]
We study self-supervised covariance estimation in deep heteroscedastic regression. We derive an upper bound on the 2-Wasserstein distance between normal distributions. Experiments over a wide range of synthetic and real datasets demonstrate that the proposed 2-Wasserstein bound, coupled with pseudo-label annotations, yields computationally cheaper yet accurate deep heteroscedastic regression.
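For reference, the quantity being bounded has a well-known closed form between Gaussians; a minimal sketch of the standard formula, not the paper's bound:

```python
# Closed-form 2-Wasserstein distance between N(mu1, S1) and N(mu2, S2):
# W2^2 = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S2^{1/2} S1 S2^{1/2})^{1/2})
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussian(mu1, S1, mu2, S2):
    rS2 = sqrtm(S2)
    cross = sqrtm(rS2 @ S1 @ rS2)
    w2_sq = np.sum((mu1 - mu2) ** 2) + np.trace(S1 + S2 - 2 * cross)
    return np.sqrt(max(np.real(w2_sq), 0.0))  # clip tiny negative round-off
```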
arXiv Detail & Related papers (2025-02-14T22:37:11Z) - Causal vs. Anticausal merging of predictors [57.26526031579287]
We study the differences arising from merging predictors in the causal and anticausal directions using the same data. We use Causal Maximum Entropy (CMAXENT) as an inductive bias to merge the predictors.
arXiv Detail & Related papers (2025-01-14T20:38:15Z) - Causal Discovery on Dependent Binary Data [6.464898093190062]
We propose a decorrelation-based approach for causal graph learning on dependent binary data. We develop an EM-like iterative algorithm to generate and decorrelate samples of the latent utility variables. We demonstrate that the proposed decorrelation approach significantly improves the accuracy of causal graph learning.
arXiv Detail & Related papers (2024-12-28T21:55:42Z) - Relaxation Fluctuations of Correlation Functions: Spin and Random Matrix Models [0.0]
We study the fluctuation average and variance of certain correlation functions as a diagnostic measure of quantum chaos. We identify the three distinct phases of the models: the ergodic, the fractal, and the localized phases.
arXiv Detail & Related papers (2024-07-31T14:45:46Z) - Linear causal disentanglement via higher-order cumulants [0.0]
We study the identifiability of linear causal disentanglement, assuming access to data under multiple contexts.
We show that one perfect intervention on each latent variable is sufficient and in the worst case necessary to recover parameters under perfect interventions.
arXiv Detail & Related papers (2024-07-05T15:53:16Z) - Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
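As context for the approach, here is a minimal sketch of the classical single-source inverse propensity score (IPW) estimator of the average treatment effect, which the paper's collaborative estimator generalizes across heterogeneous data sources; variable names are ours:

```python
import numpy as np

def ipw_ate(y, t, e):
    """y: outcomes; t: binary treatment indicators; e: estimated propensities P(T=1|X)."""
    y, t, e = map(np.asarray, (y, t, e))
    return np.mean(t * y / e - (1 - t) * y / (1 - e))
```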
arXiv Detail & Related papers (2024-04-24T09:04:36Z) - TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression [109.69084997173196]
Deep heteroscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood.
Recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation.
We study whether the predicted covariance truly captures the randomness of the predicted mean.
Our results show that not only does TIC accurately learn the covariance, it additionally facilitates an improved convergence of the negative log-likelihood.
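For concreteness, the objective referred to above is the Gaussian negative log-likelihood over a predicted mean and covariance; a minimal sketch with constants dropped (TIC's covariance parameterization itself is not shown):

```python
import numpy as np

def gaussian_nll(y, mu, Sigma):
    """Negative log-likelihood of y under N(mu, Sigma), up to an additive constant."""
    r = y - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * (logdet + r @ np.linalg.solve(Sigma, r))
```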
arXiv Detail & Related papers (2023-10-29T09:54:03Z) - Score-based Causal Representation Learning with Interventions [54.735484409244386]
This paper studies the causal representation learning problem when latent causal variables are observed indirectly.
The objectives are: (i) recovering the unknown linear transformation (up to scaling) and (ii) determining the directed acyclic graph (DAG) underlying the latent variables.
arXiv Detail & Related papers (2023-01-19T18:39:48Z) - On Correlation Detection and Alignment Recovery of Gaussian Databases [5.33024001730262]
Correlation detection is a hypothesis testing problem; under the null hypothesis, the databases are independent, and under the alternative hypothesis, they are correlated.
We develop bounds on the type-I and type-II error probabilities, and show that the analyzed detector performs better than a recently proposed detector.
When the databases are declared correlated, the algorithm also recovers a partial alignment between them.
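A toy version of the testing problem, using a generic permutation-invariant statistic rather than the paper's detector, and assuming standard normal entries under the null:

```python
import numpy as np

def detect_correlation(X, Y, n_null=200, alpha=0.05, seed=0):
    """X, Y: (n, d) databases whose rows are in unknown correspondence."""
    rng = np.random.default_rng(seed)
    stat = np.linalg.norm(X @ Y.T, "fro")       # permutation-invariant statistic
    n, d = Y.shape
    null = [np.linalg.norm(X @ rng.standard_normal((n, d)).T, "fro")
            for _ in range(n_null)]             # Monte Carlo under independence
    threshold = np.quantile(null, 1 - alpha)    # caps the type-I error near alpha
    return stat > threshold                     # True: declare "correlated"
```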
arXiv Detail & Related papers (2022-11-02T12:01:42Z) - On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures consistency of model predictions on transformations of the data.
From a dataset-centric view, we find that a given model's accuracy and invariance are linearly correlated across different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z) - Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z) - Sparse Covariance Estimation in Logit Mixture Models [0.0]
This paper introduces a new data-driven methodology for estimating sparse covariance matrices of the random coefficients in logit mixture models.
Our objective is to find optimal subsets of correlated coefficients for which we estimate covariances.
arXiv Detail & Related papers (2020-01-14T20:19:15Z)