Permutation Inference for Canonical Correlation Analysis
- URL: http://arxiv.org/abs/2002.10046v4
- Date: Thu, 18 Jun 2020 01:15:58 GMT
- Title: Permutation Inference for Canonical Correlation Analysis
- Authors: Anderson M. Winkler, Olivier Renaud, Stephen M. Smith, Thomas E. Nichols
- Abstract summary: We show that a simple permutation test applied to CCA on residualised data leads to inflated error rates.
Even in the absence of nuisance variables, a simple permutation test for CCA also leads to excess error rates for all canonical correlations other than the first.
Here we show that transforming the residuals to a lower dimensional basis where exchangeability holds results in a valid permutation test.
- Score: 0.7646713951724012
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Canonical correlation analysis (CCA) has become a key tool for population
neuroimaging, allowing investigation of associations between many imaging and
non-imaging measurements. As other variables are often a source of variability
not of direct interest, previous work has used CCA on residuals from a model
that removes these effects, then proceeded directly to permutation inference.
We show that such a simple permutation test leads to inflated error rates. The
reason is that residualisation introduces dependencies among the observations
that violate the exchangeability assumption. Even in the absence of nuisance
variables, however, a simple permutation test for CCA also leads to excess
error rates for all canonical correlations other than the first. The reason is
that a simple permutation scheme does not ignore the variability already
explained by previous canonical variables. Here we propose solutions for both
problems: in the case of nuisance variables, we show that transforming the
residuals to a lower dimensional basis where exchangeability holds results in a
valid permutation test; for more general cases, with or without nuisance
variables, we propose estimating the canonical correlations in a stepwise
manner, removing at each iteration the variance already explained, while
dealing with different numbers of variables on each side. We also discuss how
to address the multiplicity of tests, proposing an admissible test that is not
conservative, and provide a complete algorithm for permutation inference for
CCA.
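
As a rough illustration of the first idea in the abstract (moving the residuals to a lower dimensional basis before permuting), the following Python sketch may help. It is not the authors' algorithm: the function names `residual_basis`, `canonical_correlations`, and `permutation_pvalue_first_cc` are hypothetical, the basis is built from a complete QR decomposition of the nuisance matrix as one possible construction, and only the first canonical correlation is tested. The stepwise procedure and the multiplicity correction described above are not implemented here.

```python
import numpy as np


def residual_basis(Z):
    """Orthonormal basis Q, shape (N, N - R), orthogonal to the columns of Z.

    Assumes Z (N x R) has full column rank. Q.T @ Y removes the nuisance
    effects and leaves N - R observations with no induced dependence.
    """
    n_rows, n_nuisance = Z.shape
    q_full, _ = np.linalg.qr(Z, mode="complete")
    return q_full[:, n_nuisance:]


def canonical_correlations(Y, X):
    """Canonical correlations of two full-rank matrices via QR and SVD."""
    q_y, _ = np.linalg.qr(Y)
    q_x, _ = np.linalg.qr(X)
    singular_values = np.linalg.svd(q_y.T @ q_x, compute_uv=False)
    return np.clip(singular_values, 0.0, 1.0)


def permutation_pvalue_first_cc(Y, X, Z, n_perm=200, seed=0):
    """Permutation p-value for the first canonical correlation only.

    Rows are permuted in the reduced (N - R)-dimensional space, not in the
    original space of residuals.
    """
    rng = np.random.default_rng(seed)
    Q = residual_basis(Z)
    Y_red, X_red = Q.T @ Y, Q.T @ X
    observed = canonical_correlations(Y_red, X_red)[0]
    exceed = 1  # the unpermuted data count as one permutation
    for _ in range(n_perm):
        perm = rng.permutation(Y_red.shape[0])
        if canonical_correlations(Y_red[perm], X_red)[0] >= observed:
            exceed += 1
    return observed, exceed / (n_perm + 1)


# Synthetic data (illustrative only): Y and X share nuisance effects in Z
# but are otherwise independent.
rng = np.random.default_rng(1)
N = 60
Z = np.column_stack([np.ones(N), rng.standard_normal((N, 2))])
Y = rng.standard_normal((N, 3)) + Z @ rng.standard_normal((3, 3))
X = rng.standard_normal((N, 4)) + Z @ rng.standard_normal((3, 4))
r1, p = permutation_pvalue_first_cc(Y, X, Z)
print(f"first canonical correlation: {r1:.3f}, permutation p-value: {p:.3f}")
```

Because the intercept is included in `Z`, the reduced data are not mean-centred again before computing the canonical correlations. Permuting the raw residuals instead of `Q.T @ Y` would correspond to the simple scheme that the abstract reports as invalid.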
Related papers
- Permutation invariant functions: statistical tests, density estimation, and computationally efficient embedding [1.4316259003164373]
Permutation invariance is among the most common symmetries that can be exploited to simplify complex problems in machine learning (ML).
In this paper, we take a step back and examine these questions in several fundamental problems.
Our methods for (i) and (iv) are based on a sorting trick and (ii) is based on an averaging trick.
arXiv Detail & Related papers (2024-03-04T01:49:23Z)
- Equivariant Disentangled Transformation for Domain Generalization under Combination Shift [91.38796390449504]
Combinations of domains and labels are not observed during training but appear in the test environment.
We provide a unique formulation of the combination shift problem based on the concepts of homomorphism, equivariance, and a refined definition of disentanglement.
arXiv Detail & Related papers (2022-08-03T12:31:31Z)
- On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures consistency of model predictions on transformations of the data.
From a dataset-centric view, we find a certain model's accuracy and invariance linearly correlated on different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z)
- Sequential Permutation Testing of Random Forest Variable Importance Measures [68.8204255655161]
It is proposed here to use sequential permutation tests and sequential p-value estimation to reduce the high computational costs associated with conventional permutation tests.
The results of simulation studies confirm that the theoretical properties of the sequential tests apply.
The numerical stability of the methods is investigated in two additional application studies.
arXiv Detail & Related papers (2022-06-02T20:16:50Z)
- E-detectors: a nonparametric framework for sequential change detection [86.15115654324488]
We develop a fundamentally new and general framework for sequential change detection.
Our procedures come with clean, nonasymptotic bounds on the average run length.
We show how to design their mixtures in order to achieve both statistical and computational efficiency.
arXiv Detail & Related papers (2022-03-07T17:25:02Z)
- Variance Minimization in the Wasserstein Space for Invariant Causal Prediction [72.13445677280792]
In this work, we show that the approach taken in ICP may be reformulated as a series of nonparametric tests that scales linearly in the number of predictors.
Each of these tests relies on the minimization of a novel loss function that is derived from tools in optimal transport theory.
We prove under mild assumptions that our method is able to recover the set of identifiable direct causes, and we demonstrate in our experiments that it is competitive with other benchmark causal discovery algorithms.
arXiv Detail & Related papers (2021-10-13T22:30:47Z)
- $\ell_0$-based Sparse Canonical Correlation Analysis [7.073210405344709]
Canonical Correlation Analysis (CCA) models are powerful for studying the associations between two sets of variables.
Despite their success, CCA models may break if the number of variables in either of the modalities exceeds the number of samples.
Here, we propose $\ell_0$-CCA, a method for learning correlated representations based on sparse subsets of two observed modalities.
arXiv Detail & Related papers (2020-10-12T11:44:15Z)
- The leave-one-covariate-out conditional randomization test [36.9351790405311]
Conditional independence testing is an important problem, yet provably hard without assumptions.
Knockoffs is a popular methodology associated with this framework, but it suffers from two main drawbacks.
The conditional randomization test (CRT) is thought to be the "right" solution under model-X, but usually viewed as computationally inefficient.
arXiv Detail & Related papers (2020-06-15T15:38:24Z)
- Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods might exploit subtly spurious correlations in training data induced by non-causal variables for prediction.
We propose a conditional independence test based algorithm that separates causal variables using a seed variable as a prior, and adopt them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
arXiv Detail & Related papers (2020-06-09T06:56:31Z)
- Sparse Generalized Canonical Correlation Analysis: Distributed Alternating Iteration based Approach [18.93565942407577]
Sparse canonical correlation analysis (CCA) is a useful statistical tool to detect latent information with sparse structures.
We propose a generalized canonical correlation analysis (GCCA), which could detect the latent relations of multiview data with sparse structures.
arXiv Detail & Related papers (2020-04-23T05:53:48Z)