Spectral decomposition-assisted multi-study factor analysis
- URL: http://arxiv.org/abs/2502.14600v1
- Date: Thu, 20 Feb 2025 14:33:40 GMT
- Title: Spectral decomposition-assisted multi-study factor analysis
- Authors: Lorenzo Mauri, Niccolò Anceschi, David B. Dunson
- Abstract summary: Methods are applied to integrate three studies on gene associations among immune cells.
The conditional distribution of the factor loadings has a simple product form across outcomes.
- Abstract: This article focuses on covariance estimation for multi-study data. Popular approaches employ factor-analytic terms with shared and study-specific loadings that decompose the variance into (i) a shared low-rank component, (ii) study-specific low-rank components, and (iii) a diagonal term capturing idiosyncratic variability. Our proposed methodology estimates the latent factors via spectral decompositions and infers the factor loadings via surrogate regression tasks, avoiding identifiability and computational issues of existing alternatives. Reliably inferring shared vs study-specific components requires novel developments that are of independent interest. The approximation error decreases as the sample size and the data dimension diverge, formalizing a blessing of dimensionality. Conditionally on the factors, loadings and residual error variances are inferred via conjugate normal-inverse gamma priors. The conditional posterior distribution of factor loadings has a simple product form across outcomes, facilitating parallelization. We show favorable asymptotic properties, including central limit theorems for point estimators and posterior contraction, and excellent empirical performance in simulations. The methods are applied to integrate three studies on gene associations among immune cells.
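The two-stage approach in the abstract — estimate latent factors via a spectral decomposition, then infer loadings and residual variances outcome-by-outcome under a conjugate normal-inverse-gamma prior — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it is a simplified single-shared-component version on simulated data, and all dimensions, prior values, and variable names (`Lambda`, `eta_hat`, `tau2`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate multi-study data: three studies sharing a rank-k loading
# structure plus idiosyncratic noise (all sizes are illustrative).
p, k = 30, 3
study_sizes = [200, 250, 300]
Lambda = rng.normal(size=(p, k))                  # true shared loadings
studies = []
for n in study_sizes:
    eta = rng.normal(size=(n, k))                 # latent factors
    noise = rng.normal(scale=0.5, size=(n, p))    # idiosyncratic term
    studies.append(eta @ Lambda.T + noise)

# Step 1 (spectral step): estimate the shared factors from the pooled
# data via a truncated SVD, i.e. a PCA-style eigendecomposition.
Y = np.vstack(studies)
U, D, Vt = np.linalg.svd(Y, full_matrices=False)
eta_hat = np.sqrt(Y.shape[0]) * U[:, :k]          # estimated factors

# Step 2 (surrogate regression): conditionally on eta_hat, a conjugate
# normal-inverse-gamma prior makes the loading posterior factorize
# across the p outcomes, so each column of Y is an independent Bayesian
# ridge-style regression -- trivially parallelizable across outcomes.
tau2 = 1.0                                        # assumed prior variance
G = eta_hat.T @ eta_hat + np.eye(k) / tau2
Lambda_hat = np.linalg.solve(G, eta_hat.T @ Y).T  # posterior means, p x k

# Plug-in residual variance estimates, one per outcome.
resid = Y - eta_hat @ Lambda_hat.T
sigma2_hat = (resid ** 2).mean(axis=0)
```

Because step 2 solves one small k-by-k system per outcome, it avoids the joint identifiability and mixing issues of sampling factors and loadings together, which is the computational point the abstract emphasizes.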
Related papers
- Linear causal disentanglement via higher-order cumulants [0.0]
We study the identifiability of linear causal disentanglement, assuming access to data under multiple contexts.
We show that one perfect intervention on each latent variable is sufficient and in the worst case necessary to recover parameters under perfect interventions.
arXiv Detail & Related papers (2024-07-05T15:53:16Z) - Causal Inference for Genomic Data with Multiple Heterogeneous Outcomes [1.5845117761091052]
We propose a generic semiparametric framework for doubly robust estimation with multiple derived outcomes.
We specialize the analysis to standardized average treatment effects and quantile treatment effects.
arXiv Detail & Related papers (2024-04-14T02:07:14Z) - Trade-off Between Dependence and Complexity for Nonparametric Learning -- an Empirical Process Approach [10.27974860479791]
In many applications where the data exhibit temporal dependencies, the corresponding empirical processes are much less understood.
We present a general bound on the expected supremum of empirical processes under standard $\beta/\rho$-mixing assumptions.
We show that even under long-range dependence, it is possible to attain the same rates as in the i.i.d. setting.
arXiv Detail & Related papers (2024-01-17T05:08:37Z) - Nonparametric Partial Disentanglement via Mechanism Sparsity: Sparse Actions, Interventions and Sparse Temporal Dependencies [58.179981892921056]
This work introduces a novel principle for disentanglement we call mechanism sparsity regularization.
We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors.
We show that the latent factors can be recovered by regularizing the learned causal graph to be sparse.
arXiv Detail & Related papers (2024-01-10T02:38:21Z) - Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed procedure accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z) - Simultaneous inference for generalized linear models with unmeasured confounders [0.0]
We propose a unified statistical estimation and inference framework that harnesses structures and integrates linear projections into three key stages.
We show effective Type-I error control of $z$-tests as sample and response sizes approach infinity.
arXiv Detail & Related papers (2023-09-13T18:53:11Z) - Disentanglement of Correlated Factors via Hausdorff Factorized Support [53.23740352226391]
We propose a relaxed disentanglement criterion - the Hausdorff Factorized Support (HFS) criterion - that encourages a factorized support, rather than a factorial distribution.
We show that the use of HFS consistently facilitates disentanglement and recovery of ground-truth factors across a variety of correlation settings and benchmarks.
arXiv Detail & Related papers (2022-10-13T20:46:42Z) - Data-Driven Influence Functions for Optimization-Based Causal Inference [105.5385525290466]
We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite differencing.
We study the case where probability distributions are not known a priori but need to be estimated from data.
arXiv Detail & Related papers (2022-08-29T16:16:22Z) - Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension [25.711297863946193]
We develop a theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features.
We provide a complete description of the joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit.
arXiv Detail & Related papers (2022-01-31T17:44:58Z) - Variance Minimization in the Wasserstein Space for Invariant Causal Prediction [72.13445677280792]
In this work, we show that the approach taken in ICP may be reformulated as a series of nonparametric tests that scales linearly in the number of predictors.
Each of these tests relies on the minimization of a novel loss function that is derived from tools in optimal transport theory.
We prove under mild assumptions that our method is able to recover the set of identifiable direct causes, and we demonstrate in our experiments that it is competitive with other benchmark causal discovery algorithms.
arXiv Detail & Related papers (2021-10-13T22:30:47Z) - Latent Causal Invariant Model [128.7508609492542]
Current supervised learning can learn spurious correlation during the data-fitting process.
We propose a Latent Causal Invariance Model (LaCIM) which pursues causal prediction.
arXiv Detail & Related papers (2020-11-04T10:00:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.