Related papers: Visualizing hierarchies in scRNA-seq data using a density tree-biased autoencoder

URL: http://arxiv.org/abs/2102.05892v3
Date: Fri, 22 Apr 2022 08:59:25 GMT
Title: Visualizing hierarchies in scRNA-seq data using a density tree-biased autoencoder
Authors: Quentin Garrido (LIGM, HCI), Sebastian Damrich (HCI), Alexander Jäger (HCI), Dario Cerletti (HCI), Manfred Claassen, Laurent Najman (LIGM), Fred Hamprecht (HCI)
Abstract summary: We propose an approach for identifying a meaningful tree structure from high-dimensional scRNA-seq data. We then introduce DTAE, a tree-biased autoencoder that emphasizes the tree structure of the data in low dimensional space.
Score: 50.591267188664666
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Motivation: Single cell RNA sequencing (scRNA-seq) data makes studying the development of cells possible at unparalleled resolution. Given that many cellular differentiation processes are hierarchical, their scRNA-seq data is expected to be approximately tree-shaped in gene expression space. Inference and representation of this tree-structure in two dimensions is highly desirable for biological interpretation and exploratory analysis. Results: Our two contributions are an approach for identifying a meaningful tree structure from high-dimensional scRNA-seq data, and a visualization method respecting the tree-structure. We extract the tree structure by means of a density based minimum spanning tree on a vector quantization of the data and show that it captures biological information well. We then introduce DTAE, a tree-biased autoencoder that emphasizes the tree structure of the data in low dimensional space. We compare to other dimension reduction methods and demonstrate the success of our method both qualitatively and quantitatively on real and toy data. Availability: Our implementation relying on PyTorch and Higra is available at https://github.com/hci-unihd/DTAE.
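
The tree-extraction step described in the abstract (a density based spanning tree over a vector quantization of the data) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation, which relies on PyTorch and Higra. The sketch makes several assumptions: the vector quantization is done with plain k-means, the "density" of an edge between two centroids is taken to be the number of points whose two nearest centroids are that pair, and the tree is obtained as a maximum spanning forest over these counts. The function names, the toy Y-shaped data, and all parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the k centroids act as the vector quantization."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean; keep the old one if empty.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return centroids


def density_edges(points, centroids):
    """Assumed density: per centroid pair, count the points whose two
    nearest centroids are exactly that pair."""
    counts = Counter()
    for p in points:
        order = sorted(range(len(centroids)),
                       key=lambda j: math.dist(p, centroids[j]))
        a, b = sorted(order[:2])
        counts[(a, b)] += 1
    return counts


def max_spanning_forest(n, weights):
    """Kruskal's algorithm on descending weight, so dense edges are kept first."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for (a, b), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        ra, rb = find(a), find(b)
        if ra != rb:  # adding this edge does not create a cycle
            parent[ra] = rb
            tree.append((a, b, w))
    return tree


def toy_y_shape(n_per_branch=60, seed=0):
    """Hypothetical toy data: three noisy branches meeting at the origin."""
    rng = random.Random(seed)
    directions = [(1.0, 0.0), (-0.5, 0.87), (-0.5, -0.87)]
    pts = []
    for dx, dy in directions:
        for i in range(n_per_branch):
            t = i / n_per_branch
            pts.append((t * dx + rng.gauss(0, 0.02),
                        t * dy + rng.gauss(0, 0.02)))
    return pts


points = toy_y_shape()
centroids = kmeans(points, k=9)
tree = max_spanning_forest(len(centroids), density_edges(points, centroids))
for a, b, w in tree:
    print(f"centroid {a} -- centroid {b} (shared points: {w})")
```

Working on centroids rather than on individual cells keeps the graph small, and weighting edges by point counts rather than by Euclidean distance favors connections that pass through populated regions of the data. Because only centroid pairs with at least one shared point receive an edge, the result may be a forest rather than a single tree.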

Related papers

Hierarchical Quantized Diffusion Based Tree Generation Method for Hierarchical Representation and Lineage Analysis [49.00783841494125]
HDTree captures tree relationships within a hierarchical latent space using a unified hierarchical codebook and quantized diffusion processes. HDTree's effectiveness is demonstrated through comparisons on both general-purpose and single-cell datasets. These contributions provide a new tool for hierarchical lineage analysis, enabling more accurate and efficient modeling of cellular differentiation paths.
arXiv Detail & Related papers (2025-06-29T15:19:13Z)
PhyloGen: Language Model-Enhanced Phylogenetic Inference via Graph Structure Generation [50.80441546742053]
Phylogenetic trees elucidate evolutionary relationships among species. Traditional Markov Chain Monte Carlo methods face slow convergence and computational burdens. We propose PhyloGen, a novel method leveraging a pre-trained genomic language model.
arXiv Detail & Related papers (2024-12-25T08:33:05Z)
scTree: Discovering Cellular Hierarchies in the Presence of Batch Effects in scRNA-seq Data [12.01555110624794]
scTree corrects for batch effects while simultaneously learning a tree-structured data representation. We show empirically on seven datasets that scTree discovers the underlying clusters of the data.
arXiv Detail & Related papers (2024-06-27T16:16:55Z)
The Central Spanning Tree Problem [20.14154858576556]
Spanning trees are an important primitive in many data analysis tasks, when a data set needs to be summarized in terms of its "skeleton". We show empirically that the (branched) central spanning tree is more robust to noise in the data, and as such is better suited to summarize a data set in terms of its skeleton.
arXiv Detail & Related papers (2024-04-09T16:49:42Z)
PhyloGFN: Phylogenetic inference with generative flow networks [57.104166650526416]
We introduce the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference. Because GFlowNets are well-suited for sampling complex structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies. We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets.
arXiv Detail & Related papers (2023-10-12T23:46:08Z)
Tree Variational Autoencoders [5.992683455757179]
We propose a new generative hierarchical clustering model that learns a flexible tree-based posterior distribution over latent variables. TreeVAE hierarchically divides samples according to their intrinsic characteristics, shedding light on hidden structures in the data.
arXiv Detail & Related papers (2023-06-15T09:25:04Z)
Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure. We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance. We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
Phylo2Vec: a vector representation for binary trees [0.49478969093606673]
We present Phylo2Vec, a parsimonious encoding for phylogenetic trees. It serves as a unified approach for both manipulating and representing phylogenetic trees. As a proof of concept, we use Phylo2Vec for maximum likelihood inference on five real-world datasets.
arXiv Detail & Related papers (2023-04-25T09:54:35Z)
RLET: A Reinforcement Learning Based Approach for Explainable QA with Entailment Trees [47.745218107037786]
We propose RLET, a Reinforcement Learning based Entailment Tree generation framework. RLET iteratively performs single step reasoning with sentence selection and deduction generation modules. Experiments on three settings of the EntailmentBank dataset demonstrate the strength of using an RL framework.
arXiv Detail & Related papers (2022-10-31T06:45:05Z)
Spectral Top-Down Recovery of Latent Tree Models [13.681975313065477]
Spectral Top-Down Recovery (STDR) is a divide-and-conquer approach for inference of large latent tree models. STDR's partitioning step is non-random. Instead, it is based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes. We prove that STDR is statistically consistent, and bound the number of samples required to accurately recover the tree with high probability.
arXiv Detail & Related papers (2021-02-26T02:47:42Z)
SGA: A Robust Algorithm for Partial Recovery of Tree-Structured Graphical Models with Noisy Samples [75.32013242448151]
We consider learning Ising tree models when the observations from the nodes are corrupted by independent but non-identically distributed noise. Katiyar et al. (2020) showed that although the exact tree structure cannot be recovered, one can recover a partial tree structure. We propose Symmetrized Geometric Averaging (SGA), a more statistically robust algorithm for partial tree recovery.
arXiv Detail & Related papers (2021-01-22T01:57:35Z)
Uncovering the Folding Landscape of RNA Secondary Structure with Deep Graph Embeddings [71.20283285671461]
We propose a geometric scattering autoencoder (GSAE) network for learning such graph embeddings. Our embedding network first extracts rich graph features using the recently proposed geometric scattering transform. We show that GSAE organizes RNA graphs both by structure and energy, accurately reflecting bistable RNA structures.
arXiv Detail & Related papers (2020-06-12T00:17:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.