Encoding Domain Information with Sparse Priors for Inferring Explainable
Latent Variables
- URL: http://arxiv.org/abs/2107.03730v1
- Date: Thu, 8 Jul 2021 10:19:32 GMT
- Authors: Arber Qoku and Florian Buettner
- Abstract summary: We propose spex-LVM, a factorial latent variable model with sparse priors to encourage the inference of explainable factors.
spex-LVM utilizes existing knowledge of curated biomedical pathways to automatically assign annotated attributes to latent factors.
Evaluations on simulated and real single-cell RNA-seq datasets demonstrate that our model robustly identifies relevant structure in an inherently explainable manner.
- Score: 2.8935588665357077
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Latent variable models are powerful statistical tools that can uncover
relevant variation between patients or cells, by inferring unobserved hidden
states from observable high-dimensional data. A major shortcoming of current
methods, however, is their inability to learn sparse and interpretable hidden
states. Additionally, in settings where partial knowledge on the latent
structure of the data is readily available, a statistically sound integration
of prior information into current methods is challenging. To address these
issues, we propose spex-LVM, a factorial latent variable model with sparse
priors to encourage the inference of explainable factors driven by
domain-relevant information. spex-LVM utilizes existing knowledge of curated
biomedical pathways to automatically assign annotated attributes to latent
factors, yielding interpretable results tailored to the corresponding domain of
interest. Evaluations on simulated and real single-cell RNA-seq datasets
demonstrate that our model robustly identifies relevant structure in an
inherently explainable manner, distinguishes technical noise from sources of
biomedical variation, and provides dataset-specific adaptations of existing
pathway annotations. Implementation is available at
https://github.com/MLO-lab/spexlvm.
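As a rough illustration of how pathway annotations can shape a sparse prior over factor loadings, the sketch below is a simplified stand-in and not the spex-LVM implementation (the repository linked above holds that). It assumes a linear model Y = Z W + noise, treats the factors Z as known (spex-LVM infers them), and replaces the paper's sparse prior with a per-weight Gaussian whose scale is wide for genes annotated to a pathway and narrow otherwise. The names `prior_scales` and `map_loadings`, the toy annotation mask, and all numeric settings are hypothetical.

```python
import numpy as np


def prior_scales(mask, slab=1.0, spike=0.01):
    """Prior std per weight: wide where the pathway annotates the gene, narrow elsewhere."""
    return np.where(mask.astype(bool), slab, spike)


def map_loadings(Y, Z, scales, noise_var=1.0):
    """MAP estimate of W in Y = Z W + noise, with W[k, g] ~ N(0, scales[k, g]**2).

    Y: (cells, genes), Z: (cells, factors), scales: (factors, genes).
    """
    K, G = scales.shape
    W = np.empty((K, G))
    ZtZ = Z.T @ Z
    ZtY = Z.T @ Y
    for g in range(G):
        # Gaussian posterior precision = data precision + prior precision.
        precision = ZtZ / noise_var + np.diag(1.0 / scales[:, g] ** 2)
        W[:, g] = np.linalg.solve(precision, ZtY[:, g] / noise_var)
    return W


# Toy data (assumption): 2 factors ("pathways"), 6 genes, 200 cells.
rng = np.random.default_rng(0)
N, K, G = 200, 2, 6
mask = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]])
W_true = mask * rng.normal(1.0, 0.1, size=(K, G))
Z = rng.normal(size=(N, K))
Y = Z @ W_true + 0.1 * rng.normal(size=(N, G))

W_hat = map_loadings(Y, Z, prior_scales(mask), noise_var=0.01)
print(np.round(W_hat, 2))
```

In this toy setting, weights outside the annotation are shrunk toward zero while annotated weights stay close to their least-squares values. Widening the narrow scale would let the data override the annotation more easily, which is one way to think about adapting pathway annotations to a specific dataset.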
Related papers
- Inpainting Pathology in Lumbar Spine MRI with Latent Diffusion [4.410798232767917]
We propose an efficient method for inpainting pathological features onto healthy anatomy in MRI.
We evaluate the method's ability to insert disc herniation and central canal stenosis in lumbar spine sagittal T2 MRI.
Date: 2024-06-04T16:47:47Z
- FORESEE: Multimodal and Multi-view Representation Learning for Robust Prediction of Cancer Survival [3.4686401890974197]
We propose a new end-to-end framework, FORESEE, for robustly predicting patient survival by mining multimodal information.
Cross-fusion transformer effectively utilizes features at the cellular level, tissue level, and tumor heterogeneity level to correlate prognosis.
The hybrid attention encoder (HAE) uses the denoising contextual attention module to obtain the contextual relationship features.
We also propose an asymmetrically masked triplet masked autoencoder to reconstruct lost information within modalities.
Date: 2024-05-13T12:39:08Z
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
Date: 2024-03-02T00:56:05Z
- FATE: Feature-Agnostic Transformer-based Encoder for learning generalized embedding spaces in flow cytometry data [4.550634499956126]
We aim at effectively leveraging data with varying features, without the need to constrain the input space to the intersection of potential feature sets.
We propose a novel architecture that can directly process data without the necessity of aligned feature modalities.
The advantages of the model are demonstrated for automatic cancer cell detection in acute myeloid leukemia in flow data.
Date: 2023-11-06T18:06:38Z
- Conditionally Invariant Representation Learning for Disentangling Cellular Heterogeneity [25.488181126364186]
This paper presents a novel approach that leverages domain variability to learn representations that are conditionally invariant to unwanted variability or distractors.
We apply our method to grand biological challenges, such as data integration in single-cell genomics.
Specifically, the proposed approach helps to disentangle biological signals from data biases that are unrelated to the target task or the causal explanation of interest.
Date: 2023-07-02T12:52:41Z
- DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
Date: 2023-06-27T16:59:06Z
- Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks, and ingest heterogeneous data sources.
Date: 2022-07-20T07:32:02Z
- RevUp: Revise and Update Information Bottleneck for Event Representation [16.54912614895861]
In machine learning, latent variables play a key role in capturing the underlying structure of data, but they are often unsupervised.
We propose a semi-supervised information bottleneck-based model that enables the use of side knowledge to direct the learning of discrete latent variables.
We show that our approach generalizes an existing method of parameter injection, and perform an empirical case study of our approach on language-based event modeling.
Date: 2022-05-24T17:54:59Z
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
Date: 2022-03-29T04:54:06Z
- Learning Conditional Invariance through Cycle Consistency [60.85059977904014]
We propose a novel approach to identify meaningful and independent factors of variation in a dataset.
Our method involves two separate latent subspaces for the target property and the remaining input information.
We demonstrate on synthetic and molecular data that our approach identifies more meaningful factors which lead to sparser and more interpretable models.
Date: 2021-11-25T17:33:12Z
- InteL-VAEs: Adding Inductive Biases to Variational Auto-Encoders via Intermediary Latents [60.785317191131284]
We introduce a simple and effective method for learning VAEs with controllable biases by using an intermediary set of latent variables.
In particular, it allows us to impose desired properties like sparsity or clustering on learned representations.
We show that this, in turn, allows InteL-VAEs to learn both better generative models and representations.
Date: 2021-06-25T16:34:05Z
This list is automatically generated from the titles and abstracts of the papers in this site.