Grouping effects of sparse CCA models in variable selection
- URL: http://arxiv.org/abs/2008.03392v1
- Date: Fri, 7 Aug 2020 22:27:31 GMT
- Title: Grouping effects of sparse CCA models in variable selection
- Authors: Kefei Liu, Qi Long, Li Shen
- Abstract summary: We analyze the grouping effect of the standard and simplified SCCA models in variable selection.
Our theoretical analysis shows that for grouped variable selection, the simplified SCCA selects or deselects a group of variables together.
- Score: 6.196334136139173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The sparse canonical correlation analysis (SCCA) is a bi-multivariate
association model that finds sparse linear combinations of two sets of
variables that are maximally correlated with each other. In addition to the
standard SCCA model, a simplified SCCA criterion, which maximizes the
cross-covariance between a pair of canonical variables instead of their
cross-correlation, is widely used in the literature due to its computational
simplicity. However, the behaviors/properties of the solutions of these two
models remain unknown in theory. In this paper, we analyze the grouping effect
of the standard and simplified SCCA models in variable selection. In
high-dimensional settings, the variables often form groups with high
within-group correlation and low between-group correlation. Our theoretical
analysis shows that for grouped variable selection, the simplified SCCA jointly
selects or deselects a group of variables together, while the standard SCCA
randomly selects a few dominant variables from each relevant group of
correlated variables. Empirical results on synthetic data and real imaging
genetics data verify the finding of our theoretical analysis.
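The grouping behavior described in the abstract can be illustrated with a minimal sketch. It assumes one common formulation of the simplified SCCA (an L1-penalized cross-covariance objective with unit-norm weight vectors, solved by alternating soft-thresholding), which is not necessarily the exact algorithm used in the paper. The names `soft_threshold`, `simplified_scca`, `frac_u`, and `frac_v` are hypothetical and chosen for illustration only.

```python
import numpy as np


def soft_threshold(a, lam):
    """Elementwise soft-thresholding: shrink toward zero by lam."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def simplified_scca(X, Y, frac_u=0.5, frac_v=0.5, n_iter=100):
    """Sketch of a simplified SCCA (assumed formulation, see lead-in).

    Alternately updates u and v by soft-thresholding the cross-covariance
    projections and rescaling to unit L2 norm. frac_u and frac_v lie in
    [0, 1) and set each threshold as a fraction of the largest absolute
    entry, so at least one weight always survives.
    """
    Xc = X - X.mean(axis=0)          # center columns
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc                    # (unnormalized) cross-covariance matrix
    # Initialize v with the leading right singular vector of C.
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    v = Vt[0]
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        a = C @ v
        u = soft_threshold(a, frac_u * np.max(np.abs(a)))
        norm_u = np.linalg.norm(u)
        if norm_u == 0.0:            # degenerate case: no cross-covariance
            break
        u = u / norm_u
        b = C.T @ u
        v = soft_threshold(b, frac_v * np.max(np.abs(b)))
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            break
        v = v / norm_v
    return u, v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal(50)
    # Columns 0-2 of X form a perfectly correlated group; the rest are noise.
    X = np.column_stack([z, z, z, rng.standard_normal((50, 4))])
    Y = np.column_stack([z + 0.1 * rng.standard_normal(50),
                         rng.standard_normal((50, 3))])
    u, v = simplified_scca(X, Y)
    print("u weights:", np.round(u, 3))
    print("selected X variables:", np.flatnonzero(u))
```

In this sketch, identical columns of X produce identical rows of the cross-covariance matrix, so they receive identical weights and are kept or dropped as a group. The standard SCCA instead constrains the variances of the canonical variables (the norms of Xu and Yv), which is where the two models' selection behavior can diverge.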
Related papers
- HiPerformer: Hierarchically Permutation-Equivariant Transformer for Time
Series Forecasting [56.95572957863576]
We propose a hierarchically permutation-equivariant model that considers both the relationship among components in the same group and the relationship among groups.
The experiments conducted on real-world data demonstrate that the proposed method outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2023-05-14T05:11:52Z) - Equivariance Allows Handling Multiple Nuisance Variables When Analyzing
Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z) - Dynamically-Scaled Deep Canonical Correlation Analysis [77.34726150561087]
Canonical Correlation Analysis (CCA) is a method for feature extraction of two views by finding maximally correlated linear projections of them.
We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model.
arXiv Detail & Related papers (2022-03-23T12:52:49Z) - Conditional canonical correlation estimation based on covariates with
random forests [0.0]
We propose a new method called Random Forest with Canonical Correlation Analysis (RFCCA) to estimate the conditional canonical correlations between two sets of variables.
The proposed method and the global significance test are evaluated through simulation studies that show it provides accurate canonical correlation estimations and well-controlled Type-1 error.
arXiv Detail & Related papers (2020-11-23T17:09:46Z) - Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM) where we parameterize the joint distribution in terms of the derivatives of univariable log-conditionals (scores).
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z) - $\ell_0$-based Sparse Canonical Correlation Analysis [7.073210405344709]
Canonical Correlation Analysis (CCA) models are powerful for studying the associations between two sets of variables.
Despite their success, CCA models may break if the number of variables in either of the modalities exceeds the number of samples.
Here, we propose $\ell_0$-CCA, a method for learning correlated representations based on sparse subsets of two observed modalities.
arXiv Detail & Related papers (2020-10-12T11:44:15Z) - High-Dimensional Quadratic Discriminant Analysis under Spiked Covariance
Model [101.74172837046382]
We propose a novel quadratic classification technique, the parameters of which are chosen such that the Fisher-discriminant ratio is maximized.
Numerical simulations show that the proposed classifier not only outperforms the classical R-QDA for both synthetic and real data but also requires lower computational complexity.
arXiv Detail & Related papers (2020-06-25T12:00:26Z) - Probabilistic Canonical Correlation Analysis for Sparse Count Data [3.1753001245931323]
Canonical correlation analysis is an important technique for exploring the relationship between two sets of continuous variables.
We propose a model-based probabilistic approach for correlation and canonical correlation estimation for two sparse count data sets.
arXiv Detail & Related papers (2020-05-11T02:19:57Z) - Sparse Generalized Canonical Correlation Analysis: Distributed
Alternating Iteration based Approach [18.93565942407577]
Sparse canonical correlation analysis (CCA) is a useful statistical tool to detect latent information with sparse structures.
We propose a generalized canonical correlation analysis (GCCA), which could detect the latent relations of multiview data with sparse structures.
arXiv Detail & Related papers (2020-04-23T05:53:48Z) - Repulsive Mixture Models of Exponential Family PCA for Clustering [127.90219303669006]
The mixture extension of exponential family principal component analysis (EPCA) was designed to encode much more structural information about data distribution than the traditional EPCA.
The traditional mixture of local EPCAs has the problem of model redundancy, i.e., overlaps among mixing components, which may cause ambiguity for data clustering.
In this paper, a repulsiveness-encouraging prior is introduced among mixing components and a diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework.
arXiv Detail & Related papers (2020-04-07T04:07:29Z) - Blocked Clusterwise Regression [0.0]
We generalize previous approaches to discrete unobserved heterogeneity by allowing each unit to have multiple latent variables.
We contribute to the theory of clustering with an over-specified number of clusters and derive new convergence rates for this setting.
arXiv Detail & Related papers (2020-01-29T23:29:31Z)