Is your data alignable? Principled and interpretable alignability testing and integration of single-cell data
- URL: http://arxiv.org/abs/2308.01839v2
- Date: Thu, 29 Feb 2024 22:35:45 GMT
- Title: Is your data alignable? Principled and interpretable alignability testing and integration of single-cell data
- Authors: Rong Ma, Eric D. Sun, David Donoho and James Zou
- Abstract summary: Single-cell data integration can provide a comprehensive molecular view of cells.
Existing methods suffer from several fundamental limitations.
We present a spectral manifold alignment and inference framework.
- Score: 24.457344926393397
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Single-cell data integration can provide a comprehensive molecular view of
cells, and many algorithms have been developed to remove unwanted technical or
biological variations and integrate heterogeneous single-cell datasets. Despite
their wide usage, existing methods suffer from several fundamental limitations.
In particular, we lack a rigorous statistical test for whether two
high-dimensional single-cell datasets are alignable (and therefore should even
be aligned). Moreover, popular methods can substantially distort the data
during alignment, making the aligned data and downstream analysis difficult to
interpret. To overcome these limitations, we present a spectral manifold
alignment and inference (SMAI) framework, which enables principled and
interpretable alignability testing and structure-preserving integration of
single-cell data with the same type of features. SMAI provides a statistical
test to robustly assess the alignability between datasets to avoid misleading
inference, and is justified by high-dimensional statistical theory. On a
diverse range of real and simulated benchmark datasets, it outperforms commonly
used alignment methods. Moreover, we show that SMAI improves various downstream
analyses such as identification of differentially expressed genes and
imputation of single-cell spatial transcriptomics, providing further biological
insights. SMAI's interpretability also enables quantification and a deeper
understanding of the sources of technical confounders in single-cell data.
Related papers
- CAVACHON: a hierarchical variational autoencoder to integrate multi-modal single-cell data [10.429856767305687]
We propose a novel probabilistic learning framework that explicitly incorporates conditional independence relationships between multi-modal data.
We demonstrate the versatility of our framework across various applications pertinent to single-cell multi-omics data integration.
Date: 2024-05-28T23:44:09Z
- Scalable Amortized GPLVMs for Single Cell Transcriptomics Data [9.010523724015398]
Dimensionality reduction is crucial for analyzing large-scale single-cell RNA-seq data.
We introduce an improved model, the amortized variational model (BGPLVM).
BGPLVM is tailored for single-cell RNA-seq with specialized encoder, kernel, and likelihood designs.
Date: 2024-05-06T21:54:38Z
- Physics-informed and Unsupervised Riemannian Domain Adaptation for Machine Learning on Heterogeneous EEG Datasets [53.367212596352324]
We propose an unsupervised approach leveraging EEG signal physics.
We map EEG channels to fixed positions using field interpolation, facilitating source-free domain adaptation.
Our method demonstrates robust performance in brain-computer interface (BCI) tasks and potential biomarker applications.
Date: 2024-03-07T16:17:33Z
- UniCell: Universal Cell Nucleus Classification via Prompt Learning [76.11864242047074]
We propose a universal cell nucleus classification framework (UniCell).
It employs a novel prompt learning mechanism to uniformly predict the corresponding categories of pathological images from different dataset domains.
In particular, our framework adopts an end-to-end architecture for nuclei detection and classification, and utilizes flexible prediction heads for adapting various datasets.
Date: 2024-02-20T11:50:27Z
- Mixed Models with Multiple Instance Learning [51.440557223100164]
We introduce MixMIL, a framework integrating Generalized Linear Mixed Models (GLMM) and Multiple Instance Learning (MIL).
Our empirical results reveal that MixMIL outperforms existing MIL models in single-cell datasets.
Date: 2023-11-04T16:42:42Z
- AVIDA: Alternating method for Visualizing and Integrating Data [1.6637373649145604]
AVIDA is a framework for simultaneously performing data alignment and dimension reduction.
We show that AVIDA correctly aligns high-dimensional datasets without common features.
In general applications, other methods can be used for the alignment and dimension reduction modules.
Date: 2022-05-31T22:36:10Z
- Scalable Regularised Joint Mixture Models [2.0686407686198263]
In many applications, data can be heterogeneous in the sense of spanning latent groups with different underlying distributions.
We propose an approach for heterogeneous data that allows joint learning of (i) explicit multivariate feature distributions, (ii) high-dimensional regression models and (iii) latent group labels.
The approach is demonstrably effective in high dimensions, combining data reduction for computational efficiency with a re-weighting scheme that retains key signals even when the number of features is large.
Date: 2022-05-03T13:38:58Z
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
Date: 2022-03-29T04:54:06Z
- Contrastive Cycle Adversarial Autoencoders for Single-cell Multi-omics Alignment and Integration [0.0]
We propose a novel framework to align and integrate single-cell RNA-seq data and single-cell ATAC-seq data.
Compared with the other state-of-the-art methods, our method performs better in both simulated and real single-cell data.
Date: 2021-12-05T13:00:58Z
- General stochastic separation theorems with optimal bounds [68.8204255655161]
The phenomenon of separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and to analyze AI instabilities.
Errors or clusters of errors can be separated from the rest of the data.
The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same separability.
Date: 2020-10-11T13:12:41Z
- Bayesian Sparse Factor Analysis with Kernelized Observations [67.60224656603823]
Multi-view problems can be addressed with latent variable models.
High-dimensionality and non-linear issues are traditionally handled by kernel methods.
We propose merging both approaches into a single model.
Date: 2020-06-01T14:25:38Z