Gaussian Latent Dirichlet Allocation for Discrete Human State Discovery
- URL: http://arxiv.org/abs/2206.14233v1
- Date: Tue, 28 Jun 2022 18:33:46 GMT
- Title: Gaussian Latent Dirichlet Allocation for Discrete Human State Discovery
- Authors: Congyu Wu, Aaron Fisher, David Schnyer
- Abstract summary: We propose and validate an unsupervised probabilistic model, Gaussian Latent Dirichlet Allocation (GLDA), for the problem of discrete state discovery.
GLDA borrows the individual-specific mixture structure from Latent Dirichlet Allocation (LDA), a popular topic model in Natural Language Processing.
We found that in both datasets the GLDA-learned class weights achieved significantly higher correlations with clinically assessed depression, anxiety, and stress scores than those produced by the baseline GMM.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this article we propose and validate an unsupervised probabilistic model,
Gaussian Latent Dirichlet Allocation (GLDA), for the problem of discrete state
discovery from repeated, multivariate psychophysiological samples collected
from multiple, inherently distinct individuals. Psychological and medical
research relies heavily on measuring potentially related but individually
inconclusive variables from a cohort of participants to derive a diagnosis,
which necessitates cluster analysis. Traditional probabilistic clustering models
such as the Gaussian Mixture Model (GMM) assume a global mixture of component
distributions, which may not be realistic for observations from different
patients. The GLDA model borrows the individual-specific mixture structure from
Latent Dirichlet Allocation (LDA), a popular topic model in Natural Language
Processing, and merges it with the Gaussian component distributions of GMM to
suit continuous data. We implemented GLDA using Stan (a probabilistic
modeling language) and applied it to two datasets, one containing Ecological
Momentary Assessments (EMA) and the other containing heart measures from
electrocardiography and impedance cardiography. We found that in both datasets the GLDA-learned
class weights achieved significantly higher correlations with clinically
assessed depression, anxiety, and stress scores than those produced by the
baseline GMM. Our findings demonstrate the advantage of GLDA over conventional
finite mixture models for human state discovery from repeated multivariate
data, likely due to better characterization of potential underlying
between-participant differences. Future work is required to validate the
utility of this model on a broader range of applications.
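The generative structure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Stan implementation: the function name `sample_glda`, the symmetric Dirichlet prior `alpha`, and the toy means and covariances are all hypothetical. Each individual draws their own mixture weights (the LDA part), and each observation is drawn from a shared Gaussian component (the GMM part). A plain GMM would instead use a single weight vector for everyone.

```python
import numpy as np


def sample_glda(n_individuals, n_obs, alpha, means, covs, seed=0):
    """Draw synthetic data from an assumed GLDA-style generative process.

    alpha : Dirichlet concentration, one entry per latent state (assumed prior).
    means : (K, D) array of Gaussian component means shared by all individuals.
    covs  : (K, D, D) array of Gaussian component covariances.
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    n_states, dim = means.shape

    # Individual-specific mixture weights: one row per individual.
    theta = rng.dirichlet(alpha, size=n_individuals)

    z = np.empty((n_individuals, n_obs), dtype=int)
    x = np.empty((n_individuals, n_obs, dim))
    for i in range(n_individuals):
        # Latent discrete state of each repeated sample for individual i.
        z[i] = rng.choice(n_states, size=n_obs, p=theta[i])
        for t in range(n_obs):
            k = z[i, t]
            # Continuous observation drawn from the state's Gaussian.
            x[i, t] = rng.multivariate_normal(means[k], covs[k])
    return theta, z, x


if __name__ == "__main__":
    # Toy setting: 4 individuals, 10 samples each, 2 states in 2 dimensions.
    theta, z, x = sample_glda(
        n_individuals=4,
        n_obs=10,
        alpha=[1.0, 1.0],
        means=[[0.0, 0.0], [5.0, 5.0]],
        covs=[np.eye(2), np.eye(2)],
    )
    print(theta.round(2))
```

In this sketch the rows of `theta` play the role of the per-individual class weights that the abstract compares against clinical scores. Fitting the model means inferring `theta`, the means, and the covariances from observed `x`, which is the task the authors delegate to Stan.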
Related papers
- Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold [83.18058549195855]
We argue that multiple processes in natural sciences have to be represented as vector fields on the Wasserstein manifold of probability densities.
In particular, this is crucial for personalized medicine where the development of diseases and their respective treatment response depends on the microenvironment of cells specific to each patient.
We propose Meta Flow Matching (MFM), a practical approach to integrating along these vector fields on the Wasserstein manifold by amortizing the flow model over the initial populations.
arXiv Detail & Related papers (2024-08-26T20:05:31Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Toward the Identifiability of Comparative Deep Generative Models [7.5479347719819865]
We propose a theory of identifiability for comparative Deep Generative Models (DGMs).
We show that, while these models lack identifiability across a general class of mixing functions, they surprisingly become identifiable when the mixing function is piece-wise affine.
We also investigate the impact of model misspecification, and empirically show that previously proposed regularization techniques for fitting comparative DGMs help with identifiability when the number of latent variables is not known in advance.
arXiv Detail & Related papers (2024-01-29T06:10:54Z)
- Probabilistic Classification by Density Estimation Using Gaussian Mixture Model and Masked Autoregressive Flow [1.3706331473063882]
Density estimation, which estimates the distribution of data, is an important category of probabilistic machine learning.
In this paper, we use the density estimators for classification, although they are often used for estimating the distribution of data.
We model the likelihood of classes of data by density estimation, specifically using GMM and MAF.
arXiv Detail & Related papers (2023-10-16T21:37:22Z)
- Ambiguous Medical Image Segmentation using Diffusion Models [60.378180265885945]
We introduce a single diffusion model-based approach that produces multiple plausible outputs by learning a distribution over group insights.
Our proposed model generates a distribution of segmentation masks by leveraging the inherent sampling process of diffusion.
Comprehensive results show that our proposed approach outperforms existing state-of-the-art ambiguous segmentation networks.
arXiv Detail & Related papers (2023-04-10T17:58:22Z)
- Unsupervised Probabilistic Models for Sequential Electronic Health Records [3.8015092217142223]
The model consists of a layered set of latent variables that encode underlying structure in the data.
We train this model on episodic data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system.
The resulting properties of the trained model generate novel insight from these complex and multifaceted data.
arXiv Detail & Related papers (2022-04-15T02:11:44Z)
- MoReL: Multi-omics Relational Learning [26.484803417186384]
We propose a novel deep Bayesian generative model to efficiently infer a multi-partite graph encoding molecular interactions across heterogeneous views.
Such an optimal transport regularization in the deep Bayesian generative model not only allows incorporating view-specific side information, but also increases model flexibility through the distribution-based regularization.
arXiv Detail & Related papers (2022-03-15T02:50:07Z)
- Multi-modality fusion using canonical correlation analysis methods: Application in breast cancer survival prediction from histology and genomics [16.537929113715432]
We study the use of canonical correlation analysis (CCA) and penalized variants of CCA for the fusion of two modalities.
We analytically show that, with known model parameters, posterior mean estimators that jointly use both modalities outperform arbitrary linear mixing of single modality posterior estimators in latent variable prediction.
arXiv Detail & Related papers (2021-11-27T21:18:01Z)
- Continual Learning with Fully Probabilistic Models [70.3497683558609]
We present an approach for continual learning based on fully probabilistic (or generative) models of machine learning.
We propose a pseudo-rehearsal approach using a Gaussian Mixture Model (GMM) instance for both generator and classifier functionalities.
We show that GMR achieves state-of-the-art performance on common class-incremental learning problems at very competitive time and memory complexity.
arXiv Detail & Related papers (2021-04-19T12:26:26Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)