Uncovering smooth structures in single-cell data with PCS-guided neighbor embeddings
- URL: http://arxiv.org/abs/2506.22228v1
- Date: Fri, 27 Jun 2025 13:45:55 GMT
- Title: Uncovering smooth structures in single-cell data with PCS-guided neighbor embeddings
- Authors: Rong Ma, Xi Li, Jingyuan Hu, Bin Yu
- Abstract summary: Single-cell sequencing is revolutionizing biology by enabling detailed investigations of cell-state transitions. It remains challenging to extract smooth, low-dimensional representations from noisy, high-dimensional single-cell data. We introduce NESS, a principled and interpretable machine learning approach to improve NE representations.
- Score: 14.708144124501635
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Single-cell sequencing is revolutionizing biology by enabling detailed investigations of cell-state transitions. Many biological processes unfold along continuous trajectories, yet it remains challenging to extract smooth, low-dimensional representations from inherently noisy, high-dimensional single-cell data. Neighbor embedding (NE) algorithms, such as t-SNE and UMAP, are widely used to embed high-dimensional single-cell data into low dimensions. But they often introduce undesirable distortions, resulting in misleading interpretations. Existing evaluation methods for NE algorithms primarily focus on separating discrete cell types rather than capturing continuous cell-state transitions, while dynamic modeling approaches rely on strong assumptions about cellular processes and specialized data. To address these challenges, we build on the Predictability-Computability-Stability (PCS) framework for reliable and reproducible data-driven discoveries. First, we systematically evaluate popular NE algorithms through empirical analysis, simulation, and theory, and reveal their key shortcomings, such as artifacts and instability. We then introduce NESS, a principled and interpretable machine learning approach to improve NE representations by leveraging algorithmic stability and to enable robust inference of smooth biological structures. NESS offers useful concepts, quantitative stability metrics, and efficient computational workflows to uncover developmental trajectories and cell-state transitions in single-cell data. Finally, we apply NESS to six single-cell datasets, spanning pluripotent stem cell differentiation, organoid development, and multiple tissue-specific lineage trajectories. Across these diverse contexts, NESS consistently yields useful biological insights, such as identification of transitional and stable cell states and quantification of transcriptional dynamics during development.
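The abstract mentions quantitative stability metrics but this page does not define them. As a rough illustration only, and not the NESS metric itself, one generic way to gauge the stability of a neighbor embedding is to run the algorithm several times (for example with different random seeds) and measure, for each cell, how much its k-nearest-neighbor set in the embedding changes between runs. The function names below (`knn_sets`, `pointwise_knn_stability`) and the choice of Jaccard overlap are assumptions made for this sketch.

```python
import math


def knn_sets(points, k):
    """For each point, return the index set of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, sorted ascending.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def pointwise_knn_stability(embeddings, k=5):
    """Mean Jaccard overlap of each point's kNN set over all pairs of embeddings.

    `embeddings` is a list of runs; each run is a list of coordinate tuples,
    with the same point order in every run. Scores lie in [0, 1]; values near 1
    mean the point keeps the same neighbors across runs (hypothetical metric,
    not taken from the paper).
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings to compare")
    knn_per_run = [knn_sets(emb, k) for emb in embeddings]
    n = len(embeddings[0])
    scores = [0.0] * n
    n_pairs = 0
    for a in range(len(knn_per_run)):
        for b in range(a + 1, len(knn_per_run)):
            n_pairs += 1
            for i in range(n):
                inter = len(knn_per_run[a][i] & knn_per_run[b][i])
                union = len(knn_per_run[a][i] | knn_per_run[b][i])
                scores[i] += inter / union
    return [s / n_pairs for s in scores]
```

In practice the runs would come from repeated t-SNE or UMAP fits on the same cells; points with low scores are the ones whose neighborhoods are least reproducible. This brute-force version is quadratic in the number of points and is meant for small examples only.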
Related papers
- NOBLE -- Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models [68.89389652724378]
NOBLE is a neural operator framework that learns a mapping from a continuous frequency-modulated embedding of interpretable neuron features to the somatic voltage response induced by current injection. It predicts distributions of neural dynamics accounting for the intrinsic experimental variability. NOBLE is the first scaled-up deep learning framework validated on real experimental data.
arXiv Detail & Related papers (2025-06-05T01:01:18Z) - Inferring stochastic dynamics with growth from cross-sectional data [3.3748750222488657]
We present a novel approach, unbalanced probability flow inference, that addresses the challenge for biological processes modelled as dynamics with growth. By leveraging a Lagrangian formulation of the Fokker-Planck equation, our method accurately disentangles drift from intrinsic noise and growth.
arXiv Detail & Related papers (2025-05-19T14:51:47Z) - A scalable gene network model of regulatory dynamics in single cells [88.48246132084441]
We introduce a Functional Learnable model of Cell dynamicS, FLeCS, that incorporates gene network structure into coupled differential equations to model gene regulatory functions. Given (pseudo)time-series single-cell data, FLeCS accurately infers cell dynamics at scale.
arXiv Detail & Related papers (2025-03-25T19:19:21Z) - Cell as Point: One-Stage Framework for Efficient Cell Tracking [54.19259129722988]
We propose a novel end-to-end one-stage framework that reimagines cell tracking by treating Cell as Point. Unlike traditional methods, CAP eliminates the need for explicit detection or segmentation, instead jointly tracking cells for sequences in one stage. CAP demonstrates promising cell tracking performance and is 10 to 55 times more efficient than existing methods.
arXiv Detail & Related papers (2024-11-22T10:16:35Z) - Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen [76.02070962797794]
This work introduces CellFlow for Generation (CFGen), a flow-based conditional generative model that preserves the inherent discreteness of single-cell data. CFGen generates whole-genome multi-modal single-cell data reliably, improving the recovery of crucial biological data characteristics.
arXiv Detail & Related papers (2024-07-16T14:05:03Z) - Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z) - GENOT: Entropic (Gromov) Wasserstein Flow Matching with Applications to Single-Cell Genomics [20.01834405021846]
Single-cell genomics has advanced our understanding of cellular behavior, catalyzing innovations in treatments and precision medicine.
Traditional discrete solvers are hampered by scalability, privacy, and out-of-sample estimation issues.
We present neural network-based solvers, known as neural OT solvers, that parameterize OT maps.
We demonstrate its versatility and robustness through applications in cell development studies, cellular drug response modeling, and cross-modality cell translation.
arXiv Detail & Related papers (2023-10-13T17:12:04Z) - Is your data alignable? Principled and interpretable alignability testing and integration of single-cell data [24.457344926393397]
Single-cell data integration can provide a comprehensive molecular view of cells.
Existing methods suffer from several fundamental limitations.
We present a spectral manifold alignment and inference framework.
arXiv Detail & Related papers (2023-08-03T16:04:14Z) - PhagoStat a scalable and interpretable end to end framework for efficient quantification of cell phagocytosis in neurodegenerative disease studies [0.0]
We introduce an end-to-end, scalable, and versatile real-time framework for quantifying and analyzing phagocytic activity.
Our proposed pipeline is able to process large data-sets and includes a data quality verification module.
We apply our pipeline to analyze microglial cell phagocytosis in FTD and obtain statistically reliable results.
arXiv Detail & Related papers (2023-04-26T18:10:35Z) - Learning Causal Representations of Single Cells via Sparse Mechanism Shift Modeling [3.2435888122704037]
We propose a deep generative model of single-cell gene expression data for which each perturbation is treated as an intervention targeting an unknown, but sparse, subset of latent variables.
We benchmark these methods on simulated single-cell data to evaluate their performance at latent units recovery, causal target identification and out-of-domain generalization.
arXiv Detail & Related papers (2022-11-07T15:47:40Z) - Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z) - Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a neural deep network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.