Nonlinear multi-study factor analysis
- URL: http://arxiv.org/abs/2601.18128v1
- Date: Mon, 26 Jan 2026 04:16:47 GMT
- Title: Nonlinear multi-study factor analysis
- Authors: Gemma E. Moran, Anandi Krishnan,
- Abstract summary: We consider platelet gene expression data from patients in different disease groups. We consider a nonlinear multi-study factor model, which allows for both shared and specific factors. We prove that the latent factors are identified, and demonstrate our method recovers meaningful factors in the platelet gene expression data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-dimensional data often exhibit variation that can be captured by lower dimensional factors. For high-dimensional data from multiple studies or environments, one goal is to understand which underlying factors are common to all studies, and which factors are study or environment-specific. As a particular example, we consider platelet gene expression data from patients in different disease groups. In this data, factors correspond to clusters of genes which are co-expressed; we may expect some clusters (or biological pathways) to be active for all diseases, while some clusters are only active for a specific disease. To learn these factors, we consider a nonlinear multi-study factor model, which allows for both shared and specific factors. To fit this model, we propose a multi-study sparse variational autoencoder. The underlying model is sparse in that each observed feature (i.e. each dimension of the data) depends on a small subset of the latent factors. In the genomics example, this means each gene is active in only a few biological processes. Further, the model implicitly induces a penalty on the number of latent factors, which helps separate the shared factors from the group-specific factors. We prove that the latent factors are identified, and demonstrate our method recovers meaningful factors in the platelet gene expression data.
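The structure described in the abstract (factors shared across studies, study-specific factors, and features that each depend on only a few factors) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' model or their variational autoencoder: the function name `simulate_multi_study`, the `tanh` nonlinearity, the Gaussian factors and noise, and all sizes are hypothetical choices made for illustration only.

```python
import numpy as np


def simulate_multi_study(n_per_study=50, n_features=20, n_shared=2,
                         n_specific=1, n_studies=3,
                         factors_per_feature=2, seed=0):
    """Simulate data with shared and study-specific latent factors.

    Illustrative assumptions: Gaussian factors, a tanh nonlinearity,
    and a fixed number of active factors per feature.
    """
    rng = np.random.default_rng(seed)
    n_factors = n_shared + n_studies * n_specific

    # Sparse mask: each feature (e.g. gene) depends on a few factors only.
    mask = np.zeros((n_features, n_factors))
    for j in range(n_features):
        active = rng.choice(n_factors, size=factors_per_feature, replace=False)
        mask[j, active] = 1.0
    weights = rng.normal(size=(n_features, n_factors)) * mask

    data, factors, labels = [], [], []
    for s in range(n_studies):
        z = np.zeros((n_per_study, n_factors))
        # Shared factors are active in every study.
        z[:, :n_shared] = rng.normal(size=(n_per_study, n_shared))
        # Study-specific factors are active only in study s.
        lo = n_shared + s * n_specific
        z[:, lo:lo + n_specific] = rng.normal(size=(n_per_study, n_specific))
        # Nonlinear map from factors to features, plus observation noise.
        x = np.tanh(z @ weights.T)
        x = x + 0.1 * rng.normal(size=(n_per_study, n_features))
        data.append(x)
        factors.append(z)
        labels.append(np.full(n_per_study, s))

    return np.vstack(data), np.vstack(factors), np.concatenate(labels), mask


X, Z, labels, mask = simulate_multi_study()
print(X.shape, Z.shape, mask.sum(axis=1))
```

In this toy setup, the columns of `Z` reserved for another study are exactly zero for a given study's samples, which is the pattern a multi-study method would aim to recover from `X` alone.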
Related papers
- Causal Learning for Heterogeneous Subgroups Based on Nonlinear Causal Kernel Clustering [11.9672224014053]
The nonlinear Causal Kernel Clustering method is introduced for heterogeneous causal learning. Experimental results indicate that the method performs well in identifying heterogeneous subgroups and enhancing causal learning.
arXiv Detail & Related papers (2025-01-20T17:43:17Z)
- Identifying latent disease factors differently expressed in patient subgroups using group factor analysis [54.67330718129736]
We propose a novel approach to uncover subgroup-specific and subgroup-common latent factors.
The proposed approach, sparse Group Factor Analysis (GFA) with regularised horseshoe priors, was implemented with probabilistic programming.
arXiv Detail & Related papers (2024-10-10T13:12:14Z)
- A Causal Framework for Decomposing Spurious Variations [68.12191782657437]
We develop tools for decomposing spurious variations in Markovian and Semi-Markovian models.
We prove the first results that allow a non-parametric decomposition of spurious effects.
The described approach has several applications, ranging from explainable and fair AI to questions in epidemiology and medicine.
arXiv Detail & Related papers (2023-06-08T09:40:28Z)
- Interventional Causal Representation Learning [75.18055152115586]
Causal representation learning seeks to extract high-level latent factors from low-level sensory data.
Can interventional data facilitate causal representation learning?
We show that interventional data often carries geometric signatures of the latent factors' support.
arXiv Detail & Related papers (2022-09-24T04:59:03Z)
- Feature diversity in self-supervised learning [0.0]
We investigate how these factors may affect overall generalization performance in the context of self-supervised learning with CNN models.
We found that the last layer is the most diversified throughout the training.
While the model's test error decreases with increasing epochs, its diversity drops.
arXiv Detail & Related papers (2022-09-02T21:34:11Z)
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- Causal Discovery in Linear Structural Causal Models with Deterministic Relations [27.06618125828978]
We focus on the task of causal discovery from observational data.
We derive a set of necessary and sufficient conditions for unique identifiability of the causal structure.
arXiv Detail & Related papers (2021-10-30T21:32:42Z)
- Relational Subsets Knowledge Distillation for Long-tailed Retinal Diseases Recognition [65.77962788209103]
We propose class subset learning by dividing the long-tailed data into multiple class subsets according to prior knowledge.
It forces the model to focus on learning the subset-specific knowledge.
The proposed framework proved to be effective for the long-tailed retinal diseases recognition task.
arXiv Detail & Related papers (2021-04-22T13:39:33Z)
- CausalVAE: Structured Causal Disentanglement in Variational Autoencoder [52.139696854386976]
The framework of variational autoencoder (VAE) is commonly used to disentangle independent factors from observations.
We propose a new VAE based framework named CausalVAE, which includes a Causal Layer to transform independent factors into causal endogenous ones.
Results show that the causal representations learned by CausalVAE are semantically interpretable, and their causal relationship as a Directed Acyclic Graph (DAG) is identified with good accuracy.
arXiv Detail & Related papers (2020-04-18T20:09:34Z)
- Pursuing Sources of Heterogeneity in Modeling Clustered Population [16.936362485508774]
We propose a regularized finite mixture effects regression to achieve heterogeneous pursuit and feature selection simultaneously.
A constrained sparse estimation of these effects leads to the identification of both the variables with common effects and those with heterogeneous effects.
Three applications are presented, namely, an imaging genetics study for linking genetic factors and brain traits in Alzheimer's disease, a public health study for exploring the association between suicide risk among adolescents and their school district characteristics, and a sport analytics study for understanding how the salary levels of baseball players are associated with their performance and contractual status.
arXiv Detail & Related papers (2020-03-10T14:59:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.