Visualizing hierarchies in scRNA-seq data using a density tree-biased
autoencoder
- URL: http://arxiv.org/abs/2102.05892v3
- Date: Fri, 22 Apr 2022 08:59:25 GMT
- Authors: Quentin Garrido (LIGM, HCI), Sebastian Damrich (HCI), Alexander
Jäger (HCI), Dario Cerletti (HCI), Manfred Claassen, Laurent Najman (LIGM),
Fred Hamprecht (HCI)
- Abstract summary: We propose an approach for identifying a meaningful tree structure from high-dimensional scRNA-seq data.
We then introduce DTAE, a tree-biased autoencoder that emphasizes the tree structure of the data in low-dimensional space.
- Score: 50.591267188664666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivation: Single cell RNA sequencing (scRNA-seq) data makes studying the
development of cells possible at unparalleled resolution. Given that many
cellular differentiation processes are hierarchical, their scRNA-seq data is
expected to be approximately tree-shaped in gene expression space. Inference
and representation of this tree-structure in two dimensions is highly desirable
for biological interpretation and exploratory analysis. Results: Our two
contributions are an approach for identifying a meaningful tree structure from
high-dimensional scRNA-seq data, and a visualization method respecting the
tree-structure. We extract the tree structure by means of a density-based
minimum spanning tree on a vector quantization of the data and show that it
captures biological information well. We then introduce DTAE, a tree-biased
autoencoder that emphasizes the tree structure of the data in low-dimensional
space. We compare to other dimension reduction methods and demonstrate the
success of our method both qualitatively and quantitatively on real and toy
data. Availability: Our implementation relying on PyTorch and Higra is available
at https://github.com/hci-unihd/DTAE.
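The tree-extraction step named in the abstract (a density-based minimum spanning tree on a vector quantization of the data) can be illustrated with a minimal sketch. This is not the authors' implementation, which relies on PyTorch and Higra; it uses NumPy and SciPy instead. The function name `density_tree`, the toy Y-shaped data, and the specific density measure (the number of points whose two nearest centroids are a given pair) are assumptions made here for illustration only.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.sparse.csgraph import minimum_spanning_tree


def density_tree(data, n_centroids=12, seed=0):
    """Hypothetical helper: k-means quantization followed by a density-weighted tree."""
    rng = np.random.default_rng(seed)
    # Vector quantization: k-means started from randomly chosen data points.
    init = data[rng.choice(len(data), size=n_centroids, replace=False)]
    centroids, _ = kmeans2(data, init, minit="matrix")

    # Assumed density measure: for every point, find its two nearest centroids
    # and add one vote to the edge between them.
    sq_dist = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest_two = np.argsort(sq_dist, axis=1)[:, :2]
    density = np.zeros((n_centroids, n_centroids))
    for a, b in nearest_two:
        density[a, b] += 1
        density[b, a] += 1

    # A minimum spanning tree on cost = 1 / density keeps the densest edges.
    # Zero entries of a dense matrix are treated as missing edges by SciPy,
    # so the result is a spanning forest if the density graph is disconnected.
    cost = np.zeros_like(density)
    connected = density > 0
    cost[connected] = 1.0 / density[connected]
    tree = minimum_spanning_tree(cost)
    return centroids, density, tree


# Toy data: three noisy branches meeting at the origin (a Y shape).
rng = np.random.default_rng(0)
directions = np.array([[0.0, -1.0], [1.0, 1.0], [-1.0, 1.0]])
branch = rng.integers(0, 3, size=300)
position = rng.uniform(0.0, 1.0, size=(300, 1))
data = position * directions[branch] + 0.02 * rng.normal(size=(300, 2))

centroids, density, tree = density_tree(data)
rows, cols = tree.nonzero()
print(list(zip(rows.tolist(), cols.tolist())))  # edges between centroid indices
```

Inverting the vote counts turns the search for high-density connections into a standard minimum spanning tree problem, so sparsely supported links between branches are the first to be dropped.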
Related papers
- scTree: Discovering Cellular Hierarchies in the Presence of Batch Effects in scRNA-seq Data [12.01555110624794]
scTree corrects for batch effects while simultaneously learning a tree-structured data representation.
We show empirically on seven datasets that scTree discovers the underlying clusters of the data.
arXiv Detail & Related papers (2024-06-27T16:16:55Z)
- The Central Spanning Tree Problem [20.14154858576556]
Spanning trees are an important primitive in many data analysis tasks, when a data set needs to be summarized in terms of its "skeleton".
We show empirically that the (branched) central spanning tree is more robust to noise in the data, and as such is better suited to summarize a data set in terms of its skeleton.
arXiv Detail & Related papers (2024-04-09T16:49:42Z)
- PhyloGFN: Phylogenetic inference with generative flow networks [57.104166650526416]
We introduce the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference.
Because GFlowNets are well-suited for sampling complex structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies.
We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets.
arXiv Detail & Related papers (2023-10-12T23:46:08Z)
- Tree Variational Autoencoders [5.992683455757179]
We propose a new generative hierarchical clustering model that learns a flexible tree-based posterior distribution over latent variables.
TreeVAE hierarchically divides samples according to their intrinsic characteristics, shedding light on hidden structures in the data.
arXiv Detail & Related papers (2023-06-15T09:25:04Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- Phylo2Vec: a vector representation for binary trees [0.49478969093606673]
We present Phylo2Vec, a parsimonious encoding for phylogenetic trees.
It serves as a unified approach for both manipulating and representing phylogenetic trees.
As a proof of concept, we use Phylo2Vec for maximum likelihood inference on five real-world datasets.
arXiv Detail & Related papers (2023-04-25T09:54:35Z)
- RLET: A Reinforcement Learning Based Approach for Explainable QA with Entailment Trees [47.745218107037786]
We propose RLET, a Reinforcement Learning based Entailment Tree generation framework.
RLET iteratively performs single step reasoning with sentence selection and deduction generation modules.
Experiments on three settings of the EntailmentBank dataset demonstrate the strength of using an RL framework.
arXiv Detail & Related papers (2022-10-31T06:45:05Z)
- Spectral Top-Down Recovery of Latent Tree Models [13.681975313065477]
Spectral Top-Down Recovery (STDR) is a divide-and-conquer approach for inference of large latent tree models.
STDR's partitioning step is non-random. Instead, it is based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes.
We prove that STDR is statistically consistent, and bound the number of samples required to accurately recover the tree with high probability.
arXiv Detail & Related papers (2021-02-26T02:47:42Z)
- SGA: A Robust Algorithm for Partial Recovery of Tree-Structured Graphical Models with Noisy Samples [75.32013242448151]
We consider learning Ising tree models when the observations from the nodes are corrupted by independent but non-identically distributed noise.
Katiyar et al. (2020) showed that although the exact tree structure cannot be recovered, one can recover a partial tree structure.
We propose Symmetrized Geometric Averaging (SGA), a more statistically robust algorithm for partial tree recovery.
arXiv Detail & Related papers (2021-01-22T01:57:35Z)
- Uncovering the Folding Landscape of RNA Secondary Structure with Deep Graph Embeddings [71.20283285671461]
We propose a geometric scattering autoencoder (GSAE) network for learning such graph embeddings.
Our embedding network first extracts rich graph features using the recently proposed geometric scattering transform.
We show that GSAE organizes RNA graphs both by structure and energy, accurately reflecting bistable RNA structures.
arXiv Detail & Related papers (2020-06-12T00:17:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.