Mixture of Conditional Gaussian Graphical Models for unlabelled
heterogeneous populations in the presence of co-factors
- URL: http://arxiv.org/abs/2006.11094v4
- Date: Tue, 8 Mar 2022 10:58:55 GMT
- Title: Mixture of Conditional Gaussian Graphical Models for unlabelled
heterogeneous populations in the presence of co-factors
- Authors: Thomas Lartigue (ARAMIS, CMAP), Stanley Durrleman (ARAMIS),
Stéphanie Allassonnière (CRC (UMR_S_1138 / U1138))
- Abstract summary: Conditional correlation networks, within Gaussian Graphical Models (GGM), are widely used to describe the direct interactions between the components of a random vector.
In this article, we propose a Mixture of Conditional GGM (CGGM) that subtracts the heterogeneous effects of the co-features to regroup the data points into clusters corresponding to the sub-populations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conditional correlation networks, within Gaussian Graphical Models (GGM), are
widely used to describe the direct interactions between the components of a
random vector. In the case of an unlabelled heterogeneous population,
Expectation Maximisation (EM) algorithms for Mixtures of GGM have been proposed
to estimate both each sub-population's graph and the class labels. However, we
argue that, with most real data, class affiliation cannot be described with a
Mixture of Gaussians, which mostly groups data points according to their
geometrical proximity. In particular, there often exist external co-features
whose values affect the features' average value, scattering across the feature
space data points belonging to the same sub-population. Additionally, if the
co-features' effect on the features is heterogeneous, then the estimation of
this effect cannot be separated from the sub-population identification. In this
article, we propose a Mixture of Conditional GGM (CGGM) that subtracts the
heterogeneous effects of the co-features to regroup the data points into
clusters corresponding to the sub-populations. We develop a penalised EM algorithm to
estimate graph-sparse model parameters. We demonstrate on synthetic and real
data how this method fulfils its goal and succeeds in identifying the
sub-populations where the Mixtures of GGM are disrupted by the effect of the
co-features.
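The idea of subtracting a class-specific co-feature effect before assigning class labels can be illustrated with a minimal sketch of an E-step for a mixture of conditional Gaussians. This is not the authors' implementation: the parameterisation (a linear effect `betas[k]` of the co-features on the feature mean and a covariance `covs[k]` per class), the function name `e_step`, and the toy data are all assumptions made for illustration, and the sparsity penalty of the penalised EM is omitted.

```python
import numpy as np

def e_step(Y, X, weights, betas, covs):
    """Posterior class probabilities for a mixture of conditional Gaussians.

    Y: (n, p) features; X: (n, q) co-features.
    weights: (K,) mixing proportions.
    betas: (K, q, p) hypothetical class-specific linear effects of X on E[Y].
    covs: (K, p, p) class-specific covariance matrices.
    """
    n, p = Y.shape
    K = len(weights)
    log_r = np.empty((n, K))
    for k in range(K):
        resid = Y - X @ betas[k]  # subtract the class-k co-feature effect
        prec = np.linalg.inv(covs[k])
        _, logdet = np.linalg.slogdet(covs[k])
        maha = np.einsum("ij,jk,ik->i", resid, prec, resid)
        log_r[:, k] = np.log(weights[k]) - 0.5 * (
            maha + logdet + p * np.log(2 * np.pi)
        )
    # Normalise in log space for numerical stability.
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)

# Toy data (assumed): two classes with opposite co-feature effects, so the
# classes overlap in feature space and differ only in how Y depends on X.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 1))
z = rng.integers(0, 2, size=n)
betas = np.array([[[2.0, 2.0]], [[-2.0, -2.0]]])  # shape (K=2, q=1, p=2)
Y = np.stack([X[i] @ betas[z[i]] for i in range(n)])
Y = Y + 0.1 * rng.normal(size=(n, 2))
covs = np.stack([0.01 * np.eye(2)] * 2)
weights = np.array([0.5, 0.5])

resp = e_step(Y, X, weights, betas, covs)
pred = resp.argmax(axis=1)
```

In this toy setting the responsibilities are computed from the residuals `Y - X @ betas[k]` rather than from the raw features, which is what lets points scattered by the co-features be regrouped by class. A full algorithm would alternate this step with an M-step that re-estimates the parameters.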
Related papers
- The Breakdown of Gaussian Universality in Classification of High-dimensional Mixtures [6.863637695977277]
We provide a high-dimensional characterization of empirical risk minimization for classification under a general mixture data setting.
We specify conditions for Gaussian universality and discuss their implications for the choice of loss function.
arXiv Detail & Related papers (2024-10-08T01:45:37Z)
- Generation is better than Modification: Combating High Class Homophily Variance in Graph Anomaly Detection [51.11833609431406]
Homophily distribution differences between different classes are significantly greater than those in homophilic and heterophilic graphs.
We introduce a new metric called Class Homophily Variance, which quantitatively describes this phenomenon.
To mitigate its impact, we propose a novel GNN model named Homophily Edge Generation Graph Neural Network (HedGe).
arXiv Detail & Related papers (2024-03-15T14:26:53Z)
- Combining propensity score methods with variational autoencoders for generating synthetic data in presence of latent sub-groups [0.0]
Heterogeneity might be known, e.g., as indicated by sub-group labels, or might be unknown and reflected only in properties of distributions, such as bimodality or skewness.
We investigate how such heterogeneity can be preserved and controlled when obtaining synthetic data from variational autoencoders (VAEs), i.e., a generative deep learning technique.
arXiv Detail & Related papers (2023-12-12T22:49:24Z)
- Graph Fourier MMD for Signals on Graphs [67.68356461123219]
We propose a novel distance between distributions and signals on graphs.
GFMMD is defined via an optimal witness function that is both smooth on the graph and maximizes difference in expectation.
We showcase it on graph benchmark datasets as well as on single cell RNA-sequencing data analysis.
arXiv Detail & Related papers (2023-06-05T00:01:17Z)
- GMMSeg: Gaussian Mixture based Generative Semantic Segmentation Models [74.0430727476634]
We propose a new family of segmentation models that rely on a dense generative classifier for the joint distribution p(pixel feature, class).
With a variety of segmentation architectures and backbones, GMMSeg outperforms the discriminative counterparts on closed-set datasets.
GMMSeg even performs well on open-world datasets.
arXiv Detail & Related papers (2022-10-05T05:20:49Z)
- Gaussian Latent Dirichlet Allocation for Discrete Human State Discovery [1.057079240576682]
We propose and validate an unsupervised probabilistic model, Gaussian Latent Dirichlet Allocation (GLDA), for the problem of discrete state discovery.
GLDA borrows the individual-specific mixture structure from a popular topic model Latent Dirichlet Allocation (LDA) in Natural Language Processing.
We found that in both datasets the GLDA-learned class weights achieved significantly higher correlations with clinically assessed depression, anxiety, and stress scores than those produced by the baseline GMM.
arXiv Detail & Related papers (2022-06-28T18:33:46Z)
- A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data [71.9573352891936]
This paper tackles the problem of missing data imputation for noisy and non-Gaussian data.
A new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data.
Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data.
arXiv Detail & Related papers (2022-01-28T10:01:37Z)
- Mycorrhiza: Genotype Assignment using Phylogenetic Networks [2.286041284499166]
We introduce Mycorrhiza, a machine learning approach for the genotype assignment problem.
Our algorithm makes use of phylogenetic networks to engineer features that encode the evolutionary relationships among samples.
Mycorrhiza yields particularly significant gains on datasets with a large average fixation index (FST) or deviation from the Hardy-Weinberg equilibrium.
arXiv Detail & Related papers (2020-10-14T02:36:27Z)
- A Rigorous Link Between Self-Organizing Maps and Gaussian Mixture Models [78.6363825307044]
This work presents a mathematical treatment of the relation between Self-Organizing Maps (SOMs) and Gaussian Mixture Models (GMMs).
We show that energy-based SOM models can be interpreted as performing gradient descent.
This link allows SOMs to be treated as generative probabilistic models, giving a formal justification for using SOMs to detect outliers or for sampling.
arXiv Detail & Related papers (2020-09-24T14:09:04Z)
- Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs.
We propose an approximate framework for such non-trivial ERGs that results in dyadic independence (i.e., edge-independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
arXiv Detail & Related papers (2020-02-14T11:42:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.