Efficient Imputation for Patch-based Missing Single-cell Data via Cluster-regularized Optimal Transport
- URL: http://arxiv.org/abs/2601.14653v1
- Date: Wed, 21 Jan 2026 04:58:13 GMT
- Authors: Yuyu Liu, Jiannan Yang, Ziyang Yu, Weishen Pan, Fei Wang, Tengfei Ma,
- Abstract summary: We present CROT, an optimal transport-based imputation algorithm designed to handle patch-based missing data. Our approach effectively captures the underlying data structure in the presence of significant missingness. This work introduces a robust solution for imputation in heterogeneous, high-dimensional datasets with structured data absence.
- Score: 11.748577799315191
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Missing data in single-cell sequencing datasets poses significant challenges for extracting meaningful biological insights. However, existing imputation approaches, which often assume uniformity and data completeness, struggle to address cases with large patches of missing data. In this paper, we present CROT, an optimal transport-based imputation algorithm designed to handle patch-based missing data in tabular formats. Our approach effectively captures the underlying data structure in the presence of significant missingness. Notably, it achieves superior imputation accuracy while significantly reducing runtime, demonstrating its scalability and efficiency for large-scale datasets. This work introduces a robust solution for imputation in heterogeneous, high-dimensional datasets with structured data absence, addressing critical challenges in both biological and clinical data analysis. Our code is available at Anomalous Github.
Related papers
- Kernel Representation and Similarity Measure for Incomplete Data [55.62595187178638]
Measuring similarity between incomplete data is a fundamental challenge in web mining, recommendation systems, and user behavior analysis. Traditional approaches either discard incomplete data or perform imputation as a preprocessing step, leading to information loss and biased similarity estimates. This paper presents a new similarity measure that directly computes similarity between incomplete data in kernel feature space without explicit imputation in the original space.
arXiv Detail & Related papers (2025-10-15T09:41:23Z)
- Precision Adaptive Imputation Network : An Unified Technique for Mixed Datasets [0.0]
This study introduces the Precision Adaptive Imputation Network (PAIN), a novel algorithm designed to enhance data reconstruction. PAIN employs a tri-step process that integrates statistical methods, random forests, and autoencoders, ensuring balanced accuracy and efficiency in imputation. The findings highlight PAIN's superior ability to preserve data distributions and maintain analytical integrity, particularly in complex scenarios where missingness is not completely at random.
arXiv Detail & Related papers (2025-01-18T06:22:27Z)
- Weighted Diversified Sampling for Efficient Data-Driven Single-Cell Gene-Gene Interaction Discovery [56.622854875204645]
We present an innovative approach utilizing data-driven computational tools, leveraging an advanced Transformer model, to unearth gene-gene interactions.
A novel weighted diversified sampling algorithm computes the diversity score of each data sample in just two passes of the dataset.
arXiv Detail & Related papers (2024-10-21T03:35:23Z)
- Missing Data Imputation With Granular Semantics and AI-driven Pipeline for Bankruptcy Prediction [0.34530027457862006]
This work focuses on designing a pipeline for the prediction of bankruptcy.
Missing values, high-dimensional data, and highly class-imbalanced databases are the major challenges in this task.
A new method for missing data imputation with granular semantics has been introduced here.
arXiv Detail & Related papers (2024-03-15T13:01:09Z)
- Optimal Transport for Structure Learning Under Missing Data [31.240965564055138]
We propose a score-based algorithm for learning causal structures from missing data based on optimal transport.
Our framework is shown to recover the true causal structure more effectively than competing methods in most simulations and real-data settings.
arXiv Detail & Related papers (2024-02-23T10:49:04Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data, and then a minimal number of available labeled data points are assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Chains of Autoreplicative Random Forests for missing value imputation in high-dimensional datasets [1.5076964620370268]
Missing values are a common problem in data science and machine learning.
We consider missing value imputation as a multi-label classification problem and propose Chains of Autoreplicative Random Forests.
Our algorithm effectively imputes missing values based only on information from the dataset.
arXiv Detail & Related papers (2023-01-02T10:53:52Z)
- MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness generating mechanism.
We conduct extensive experiments on synthetic and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
arXiv Detail & Related papers (2021-11-04T22:38:18Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method that combines reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Establishing strong imputation performance of a denoising autoencoder in a wide range of missing data problems [0.0]
We develop a consistent framework for both training and imputation.
We benchmarked the results against state-of-the-art imputation methods.
The developed autoencoder obtained the smallest error for all ranges of initial data corruption.
arXiv Detail & Related papers (2020-04-06T12:00:30Z)