Conditional canonical correlation estimation based on covariates with
random forests
- URL: http://arxiv.org/abs/2011.11555v2
- Date: Wed, 3 Feb 2021 22:55:03 GMT
- Title: Conditional canonical correlation estimation based on covariates with
random forests
- Authors: Cansu Alakus, Denis Larocque, Sebastien Jacquemont, Fanny Barlaam,
Charles-Olivier Martin, Kristian Agbogba, Sarah Lippe, Aurelie Labbe
- Abstract summary: We propose a new method called Random Forest with Canonical Correlation Analysis (RFCCA) to estimate the conditional canonical correlations between two sets of variables.
The proposed method and the global significance test are evaluated through simulation studies, which show that they provide accurate canonical correlation estimates and well-controlled Type-1 error.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Investigating the relationships between two sets of variables helps to
understand their interactions and can be done with canonical correlation
analysis (CCA). However, the correlation between the two sets can sometimes
depend on a third set of covariates, often subject-related ones such as age,
gender, or other clinical measures. In this case, applying CCA to the whole
population is not optimal and methods to estimate conditional CCA, given the
covariates, can be useful. We propose a new method called Random Forest with
Canonical Correlation Analysis (RFCCA) to estimate the conditional canonical
correlations between two sets of variables given subject-related covariates.
The individual trees in the forest are built with a splitting rule specifically
designed to partition the data to maximize the canonical correlation
heterogeneity between child nodes. We also propose a significance test to
detect the global effect of the covariates on the relationship between two sets
of variables. The performance of the proposed method and the global
significance test is evaluated through simulation studies that show it provides
accurate canonical correlation estimations and well-controlled Type-1 error. We
also show an application of the proposed method with EEG data.
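The splitting idea described in the abstract can be illustrated with a short sketch. The abstract does not give the exact splitting rule, so the score used here, sqrt(n_L * n_R) * |rho_L - rho_R|, is an assumption chosen for illustration, as are the function names `first_canonical_correlation`, `split_score`, and `best_split` and the `min_node_size` parameter. The sketch handles a single covariate and a single split, not a full forest.

```python
import numpy as np


def first_canonical_correlation(X, Y):
    """Largest sample canonical correlation between the columns of X and Y.

    Assumes each centered matrix has full column rank and more rows than columns.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases of the two column spaces; the singular values of
    # Qx^T Qy are the cosines of the principal angles, i.e. the canonical
    # correlations.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s[0], 1.0))


def split_score(X, Y, left_mask):
    """Heterogeneity of the first canonical correlation between two child nodes.

    ASSUMPTION: the size-weighted absolute difference is an illustrative
    choice, not a rule stated in the abstract.
    """
    right_mask = ~left_mask
    n_l, n_r = int(left_mask.sum()), int(right_mask.sum())
    rho_l = first_canonical_correlation(X[left_mask], Y[left_mask])
    rho_r = first_canonical_correlation(X[right_mask], Y[right_mask])
    return float(np.sqrt(n_l * n_r) * abs(rho_l - rho_r))


def best_split(X, Y, z, min_node_size=20):
    """Scan thresholds on one covariate z and return (threshold, score)."""
    best_t, best_s = None, -np.inf
    for t in np.unique(z)[:-1]:
        left_mask = z <= t
        n_l = int(left_mask.sum())
        n_r = len(z) - n_l
        if n_l < min_node_size or n_r < min_node_size:
            continue  # skip splits that leave a child node too small
        s = split_score(X, Y, left_mask)
        if s > best_s:
            best_t, best_s = float(t), s
    return best_t, best_s
```

On data where the two variable sets are unrelated for one range of the covariate and strongly related for the other, `best_split` should pick a threshold close to the change point. A forest would repeat this search recursively over several covariates on bootstrap samples.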
Related papers
- Flow-Based Density Ratio Estimation for Intractable Distributions with Applications in Genomics [80.05951561886123]
We leverage condition-aware flow matching to derive a single dynamical formulation for tracking density ratios along generative trajectories.
We demonstrate competitive performance on simulated benchmarks for closed-form ratio estimation, and show that our method supports versatile tasks in single-cell genomics data analysis.
arXiv Detail & Related papers (2026-02-27T17:27:55Z)
- Geographically Weighted Canonical Correlation Analysis: Local Spatial Associations Between Two Sets of Variables [47.652697094546994]
This article critically assesses the utility of the classical statistical technique of Canonical Correlation Analysis (CCA) for studying spatial associations.
We propose Geographically Weighted Canonical Correlation Analysis (GWCCA) as a new technique for exploring local spatial associations between two sets of variables.
The results indicate that GWCCA has broad potential applications in spatial data-intensive fields such as urban planning, environmental science, public health, and transportation.
arXiv Detail & Related papers (2026-02-10T19:36:49Z)
- Efficient Covariance Estimation for Sparsified Functional Data [51.69796254617083]
The proposed Random-knots (Random-knots-Spatial) and B-spline (Bspline-Spatial) estimators of the covariance function are computationally efficient.
Asymptotic pointwise of the covariance are obtained for sparsified individual trajectories under some regularity conditions.
arXiv Detail & Related papers (2025-11-23T00:50:33Z)
- Local Learning for Covariate Selection in Nonparametric Causal Effect Estimation with Latent Variables [13.12743473333296]
Estimating causal effects from nonexperimental data is a fundamental problem in many fields of science.
We propose a novel local learning approach for covariate selection in nonparametric causal effect estimation.
We validate our algorithm through extensive experiments on both synthetic and real-world data.
arXiv Detail & Related papers (2024-11-25T12:08:54Z)
- Semiparametric conformal prediction [79.6147286161434]
Risk-sensitive applications require well-calibrated prediction sets over multiple, potentially correlated target variables.
We treat the scores as random vectors and aim to construct the prediction set accounting for their joint correlation structure.
We report desired coverage and competitive efficiency on a range of real-world regression problems.
arXiv Detail & Related papers (2024-11-04T14:29:02Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Covariance regression with random forests [0.0]
CovRegRF is implemented in a freely available R package on CRAN.
An application of the proposed method to thyroid disease data is also presented.
arXiv Detail & Related papers (2022-09-16T21:21:18Z)
- Multi-modality fusion using canonical correlation analysis methods: Application in breast cancer survival prediction from histology and genomics [16.537929113715432]
We study the use of canonical correlation analysis (CCA) and penalized variants of CCA for the fusion of two modalities.
We analytically show that, with known model parameters, posterior mean estimators that jointly use both modalities outperform arbitrary linear mixing of single modality posterior estimators in latent variable prediction.
arXiv Detail & Related papers (2021-11-27T21:18:01Z)
- Scalable Intervention Target Estimation in Linear Models [52.60799340056917]
Current approaches to causal structure learning either work with known intervention targets or use hypothesis testing to discover the unknown intervention targets.
This paper proposes a scalable and efficient algorithm that consistently identifies all intervention targets.
The proposed algorithm can be used to also update a given observational Markov equivalence class into the interventional Markov equivalence class.
arXiv Detail & Related papers (2021-11-15T03:16:56Z)
- A Statistical Analysis of Summarization Evaluation Metrics using Resampling Methods [60.04142561088524]
We find that the confidence intervals are rather wide, demonstrating high uncertainty in how reliable automatic metrics truly are.
Although many metrics fail to show statistical improvements over ROUGE, two recent works, QAEval and BERTScore, do in some evaluation settings.
arXiv Detail & Related papers (2021-03-31T18:28:14Z)
- Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM) where we parameterize the joint distribution in terms of the derivatives of univariable log-conditionals (scores).
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
- Grouping effects of sparse CCA models in variable selection [6.196334136139173]
We analyze the grouping effect of the standard and simplified SCCA models in variable selection.
Our theoretical analysis shows that for grouped variable selection, the simplified SCCA jointly selects or deselects a group of variables together.
arXiv Detail & Related papers (2020-08-07T22:27:31Z)
- Probabilistic Canonical Correlation Analysis for Sparse Count Data [3.1753001245931323]
Canonical correlation analysis is an important technique for exploring the relationship between two sets of continuous variables.
We propose a model-based probabilistic approach for correlation and canonical correlation estimation for two sparse count data sets.
arXiv Detail & Related papers (2020-05-11T02:19:57Z)
- Learning from Aggregate Observations [82.44304647051243]
We study the problem of learning from aggregate observations where supervision signals are given to sets of instances.
We present a general probabilistic framework that accommodates a variety of aggregate observations.
Simple maximum likelihood solutions can be applied to various differentiable models.
arXiv Detail & Related papers (2020-04-14T06:18:50Z)