Linearized Optimal Transport for Analysis of High-Dimensional Point-Cloud and Single-Cell Data
- URL: http://arxiv.org/abs/2510.22033v2
- Date: Wed, 29 Oct 2025 23:56:23 GMT
- Title: Linearized Optimal Transport for Analysis of High-Dimensional Point-Cloud and Single-Cell Data
- Authors: Tianxiang Wang, Yingtong Ke, Dhananjay Bhaskar, Smita Krishnaswamy, Alexander Cloninger
- Abstract summary: Single-cell technologies generate high-dimensional point clouds of cells. Each patient is represented by an irregular point cloud rather than a simple vector. We adapt the Linear Optimal Transport framework to embed irregular point clouds into a fixed-dimensional Euclidean space.
- Score: 45.87606039212519
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Single-cell technologies generate high-dimensional point clouds of cells, enabling detailed characterization of complex patient states and treatment responses. Yet each patient is represented by an irregular point cloud rather than a simple vector, making it difficult to directly quantify and compare biological differences between individuals. Nonlinear methods such as kernels and neural networks achieve predictive accuracy but act as black boxes, offering little biological interpretability. To address these limitations, we adapt the Linear Optimal Transport (LOT) framework to this setting, embedding irregular point clouds into a fixed-dimensional Euclidean space while preserving distributional structure. This embedding provides a principled linear representation that preserves optimal transport geometry while enabling downstream analysis. It also forms a registration between any two patients, enabling direct comparison of their cellular distributions. Within this space, LOT enables: (i) accurate and interpretable classification of COVID-19 patient states, where classifier weights map back to specific markers and spatial regions driving predictions; and (ii) synthetic data generation for patient-derived organoids, exploiting the linearity of the LOT embedding. LOT barycenters yield averaged cellular profiles representing combined conditions or samples, supporting drug interaction testing. Together, these results establish LOT as a unified framework that bridges predictive performance, interpretability, and generative modeling. By transforming heterogeneous point clouds into structured embeddings directly traceable to the original data, LOT opens new opportunities for understanding immune variation and treatment effects in high-dimensional biological systems.
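To illustrate the embedding idea in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes uniform cell weights, a squared Euclidean cost, and clouds small enough to solve exactly as a linear program with SciPy's `linprog`. The names `ot_plan` and `lot_embed`, and the toy data, are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def ot_plan(a, b, X, Y):
    """Exact discrete OT plan between weighted clouds (a, X) and (b, Y)."""
    n, m = len(a), len(b)
    # Squared Euclidean cost between every reference point and target point.
    cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    # The plan is flattened row-major, so entry (i, j) sits at index i * m + j.
    row_sums = np.kron(np.eye(n), np.ones((1, m)))  # mass leaving x_i equals a_i
    col_sums = np.kron(np.ones((1, n)), np.eye(m))  # mass reaching y_j equals b_j
    res = linprog(
        cost.ravel(),
        A_eq=np.vstack([row_sums, col_sums]),
        b_eq=np.concatenate([a, b]),
        bounds=(0, None),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.x.reshape(n, m)


def lot_embed(X_ref, Y):
    """Embed cloud Y as a fixed-length vector relative to the reference X_ref."""
    n, m = len(X_ref), len(Y)
    a = np.full(n, 1.0 / n)  # assumption: uniform weights on reference points
    b = np.full(m, 1.0 / m)  # assumption: uniform weights on cells
    plan = ot_plan(a, b, X_ref, Y)
    # Barycentric projection: each reference point goes to the weighted mean
    # of the target points it sends mass to.
    T = (plan @ Y) / a[:, None]
    # Scale displacements by sqrt(weight) so Euclidean distances between
    # embeddings approximate the linearized transport distance.
    return ((T - X_ref) * np.sqrt(a)[:, None]).ravel()


# Toy data: two "patients" with different numbers of cells in 2 dimensions.
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(5, 2))
patient_a = rng.normal(loc=1.0, size=(7, 2))
patient_b = rng.normal(loc=-1.0, size=(4, 2))

emb_a = lot_embed(X_ref, patient_a)
emb_b = lot_embed(X_ref, patient_b)

# Linearity: averaging embeddings and mapping back through the reference
# gives an approximate barycenter-like synthetic cloud.
mean_emb = 0.5 * (emb_a + emb_b)
synthetic = X_ref + mean_emb.reshape(X_ref.shape) * np.sqrt(len(X_ref))
```

Both clouds land in the same vector space (reference size times dimension) even though they contain different numbers of cells, so a linear classifier could in principle be fit on such vectors. The dense linear program is practical only for small clouds; larger data would need a dedicated OT solver.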
Related papers
- An AI-enabled tool for quantifying overlapping red blood cell sickling dynamics in microfluidic assays [5.577003343220155]
This framework integrates AI-assisted annotation, segmentation, classification, and instance counting to quantify red blood cell populations. It can more than double the experimental throughput via densely packed cell suspensions, capture drug-dependent sickling behavior, and reveal mechanobiological signatures of cellular morphological evolution.
arXiv Detail & Related papers (2026-01-25T05:32:53Z)
- Departures: Distributional Transport for Single-Cell Perturbation Prediction with Neural Schrödinger Bridges [51.83259180910313]
A major bottleneck in gene function analysis is the unpaired nature of single-cell data. We approximate the Schrödinger Bridge (SB) to tackle unpaired single-cell perturbation data. Our model effectively captures heterogeneous single-cell responses and achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-11-17T08:27:13Z)
- Uncovering smooth structures in single-cell data with PCS-guided neighbor embeddings [14.708144124501635]
Single-cell sequencing is revolutionizing biology by enabling detailed investigations of cell-state transitions. It remains challenging to extract smooth, low-dimensional representations from noisy, high-dimensional single-cell data. We introduce NESS, a principled and interpretable machine learning approach to improve NE representations.
arXiv Detail & Related papers (2025-06-27T13:45:55Z)
- Unlasting: Unpaired Single-Cell Multi-Perturbation Estimation by Dual Conditional Diffusion Implicit Bridges [68.98973318553983]
We propose a framework based on Dual Diffusion Implicit Bridges (DDIB) to learn the mapping between different data distributions. We integrate gene regulatory network (GRN) information to propagate perturbation signals in a biologically meaningful way. We also incorporate a masking mechanism to predict silent genes, improving the quality of generated profiles.
arXiv Detail & Related papers (2025-06-26T09:05:38Z)
- Fourier Asymmetric Attention on Domain Generalization for Pan-Cancer Drug Response Prediction [11.649397977546435]
We propose a novel domain generalization framework, termed FourierDrug, to address this challenge. Our experiments demonstrate that our model effectively learns task-relevant features from diverse source domains and achieves accurate predictions of drug response for unseen cancer types. These findings underscore the potential of our method for real-world clinical applications.
arXiv Detail & Related papers (2025-02-06T12:53:45Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
- Is your data alignable? Principled and interpretable alignability testing and integration of single-cell data [24.457344926393397]
Single-cell data integration can provide a comprehensive molecular view of cells.
Existing methods suffer from several fundamental limitations.
We present a spectral manifold alignment and inference framework.
arXiv Detail & Related papers (2023-08-03T16:04:14Z)
- Batch Normalization in Cytometry Data by kNN-Graph Preservation [0.0]
Batch effects in high-dimensional Cytometry by Time-of-Flight (CyTOF) data pose a challenge for comparative analysis. Traditional batch normalization methods may fail to preserve the complex topological structures inherent in cellular populations. We present a residual neural network-based method for point set registration specifically tailored to address batch normalization in CyTOF data.
arXiv Detail & Related papers (2023-03-31T18:06:26Z)
- Score-based Causal Representation Learning with Interventions [54.735484409244386]
This paper studies the causal representation learning problem when latent causal variables are observed indirectly.
The objectives are: (i) recovering the unknown linear transformation (up to scaling) and (ii) determining the directed acyclic graph (DAG) underlying the latent variables.
arXiv Detail & Related papers (2023-01-19T18:39:48Z)
- Modelling Technical and Biological Effects in scRNA-seq data with Scalable GPLVMs [6.708052194104378]
We extend a popular approach for probabilistic non-linear dimensionality reduction, the Gaussian process latent variable model, to scale to massive single-cell datasets.
The key idea is to use an augmented kernel which preserves the factorisability of the lower bound allowing for fast variational inference.
arXiv Detail & Related papers (2022-09-14T15:25:15Z)
- Weakly-Supervised Cross-Domain Adaptation for Endoscopic Lesions Segmentation [79.58311369297635]
We propose a new weakly-supervised lesions transfer framework, which can explore transferable domain-invariant knowledge across different datasets.
A Wasserstein quantified transferability framework is developed to highlight wide-range transferable contextual dependencies.
A novel self-supervised pseudo label generator is designed to equally provide confident pseudo pixel labels for both hard-to-transfer and easy-to-transfer target samples.
arXiv Detail & Related papers (2020-12-08T02:26:03Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.