Distribution-based Sketching of Single-Cell Samples
- URL: http://arxiv.org/abs/2207.00584v1
- Date: Thu, 30 Jun 2022 19:43:06 GMT
- Title: Distribution-based Sketching of Single-Cell Samples
- Authors: Vishal Athreya Baskaran, Jolene Ranek, Siyuan Shan, Natalie Stanley,
Junier B. Oliva
- Abstract summary: We propose a novel sketching approach based on Kernel Herding that selects a limited subsample of all cells while preserving the underlying frequencies of immune cell-types.
We tested our approach on three flow and mass cytometry datasets and on one single-cell RNA sequencing dataset.
- Score: 6.904244323294012
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Modern high-throughput single-cell immune profiling technologies, such as
flow and mass cytometry and single-cell RNA sequencing, can readily measure the
expression of a large number of protein or gene features across the millions of
cells in a multi-patient cohort. While bioinformatics approaches can be used to
link immune cell heterogeneity to external variables of interest, such as
clinical outcome or experimental label, they often struggle to accommodate such
a large number of profiled cells. To ease this computational burden, a limited
number of cells are typically "sketched", or subsampled, from each patient.
However, existing sketching approaches fail to adequately subsample rare cells
from rare cell-populations, or fail to preserve the true frequencies of
particular immune cell-types. Here, we propose a novel sketching approach based
on Kernel Herding that selects a limited subsample of all cells while
preserving the underlying frequencies of immune cell-types. We tested our
approach on three flow and mass cytometry datasets and on one single-cell RNA
sequencing dataset and demonstrate that the sketched cells (1) more accurately
represent the overall cellular landscape and (2) facilitate increased
performance in downstream analysis tasks, such as classifying patients
according to their clinical outcome. An implementation of sketching with Kernel
Herding is publicly available at
https://github.com/vishalathreya/Set-Summarization.
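The abstract describes selecting a subsample of cells with Kernel Herding so that the sketch preserves the distribution of the full sample. The following is a minimal sketch of that idea, not the authors' implementation (which is in the repository linked above). It assumes an RBF kernel approximated with random Fourier features, selection without replacement, and NumPy only; the function name `kernel_herding_sketch` and the parameters `gamma` and `n_features` are hypothetical choices made for this illustration.

```python
import numpy as np


def kernel_herding_sketch(X, k, gamma=1.0, n_features=500, seed=0):
    """Return k distinct row indices of X chosen greedily by kernel herding.

    Illustrative assumption: the RBF kernel exp(-gamma * ||x - y||^2) is
    approximated with random Fourier features.
    """
    n, d = X.shape
    if k > n:
        raise ValueError("k must not exceed the number of cells")
    rng = np.random.default_rng(seed)

    # Random Fourier features: frequencies drawn from N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    Phi = np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    mu = Phi.mean(axis=0)            # approximate kernel mean embedding of all cells
    scores_mu = Phi @ mu             # similarity of each cell to the full sample
    running = np.zeros(n_features)   # sum of feature vectors of the selected cells
    available = np.ones(n, dtype=bool)
    selected = []

    for t in range(k):
        # Herding score: closeness to the full sample minus the average
        # similarity to the cells already in the sketch.
        scores = scores_mu - (Phi @ running) / (t + 1)
        scores[~available] = -np.inf
        i = int(np.argmax(scores))
        selected.append(i)
        available[i] = False
        running += Phi[i]

    return np.array(selected)


# Synthetic example: 1900 "common" cells and 100 "rare" cells in 5 dimensions.
rng = np.random.default_rng(42)
common = rng.normal(0.0, 1.0, size=(1900, 5))
rare = rng.normal(4.0, 1.0, size=(100, 5))
X = np.vstack([common, rare])

idx = kernel_herding_sketch(X, k=200, gamma=0.1)
print("rare fraction in full sample:", 100 / 2000)
print("rare fraction in sketch:", float(np.mean(idx >= 1900)))
```

The greedy rule penalizes cells that resemble those already selected, which is what pushes the sketch toward matching the full sample's distribution rather than oversampling dense regions. The random Fourier feature approximation avoids building the full n-by-n kernel matrix, a choice made here for scalability and an assumption of this sketch.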
Related papers
- Generating Multi-Modal and Multi-Attribute Single-Cell Counts with CFGen [76.02070962797794]
We present Cell Flow for Generation, a flow-based conditional generative model for multi-modal single-cell counts.
Our results suggest improved recovery of crucial biological data characteristics while accounting for novel generative tasks.
arXiv Detail & Related papers (2024-07-16T14:05:03Z)
- MMIL: A novel algorithm for disease associated cell type discovery [58.044870442206914]
Single-cell datasets often lack individual cell labels, making it challenging to identify cells associated with disease.
We introduce Mixture Modeling for Multiple Instance Learning (MMIL), an expectation maximization method that enables the training and calibration of cell-level classifiers.
arXiv Detail & Related papers (2024-06-12T15:22:56Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
- Single-cell Multi-view Clustering via Community Detection with Unknown Number of Clusters [64.31109141089598]
We introduce scUNC, an innovative multi-view clustering approach tailored for single-cell data.
scUNC seamlessly integrates information from different views without the need for a predefined number of clusters.
We conducted a comprehensive evaluation of scUNC using three distinct single-cell datasets.
arXiv Detail & Related papers (2023-11-28T08:34:58Z)
- Mixed Models with Multiple Instance Learning [51.440557223100164]
We introduce MixMIL, a framework integrating Generalized Linear Mixed Models (GLMM) and Multiple Instance Learning (MIL).
Our empirical results reveal that MixMIL outperforms existing MIL models in single-cell datasets.
arXiv Detail & Related papers (2023-11-04T16:42:42Z)
- Machine Learning for Flow Cytometry Data Analysis [0.0]
Flow cytometers can rapidly analyse tens of thousands of cells at the same time while also measuring multiple parameters from a single cell.
Researchers need to be able to distinguish interesting-looking cell populations manually in multi-dimensional data collected from millions of cells.
Three representative automated clustering algorithms are selected to be applied, compared and evaluated by completely and partially automated gating.
arXiv Detail & Related papers (2023-03-16T00:43:46Z)
- A biology-driven deep generative model for cell-type annotation in cytometry [0.0]
We introduce Scyan, a Single-cell Cytometry Network that automatically annotates cell types using only prior expert knowledge.
Scyan significantly outperforms the related state-of-the-art models on multiple public datasets while being faster and interpretable.
In addition, Scyan handles several complementary tasks such as batch-effect removal, debarcoding, and population discovery.
arXiv Detail & Related papers (2022-08-11T10:50:44Z)
- Interpretable Single-Cell Set Classification with Kernel Mean Embeddings [14.686560033030101]
Kernel Mean Embedding encodes the cellular landscape of each profiled biological sample.
We train a simple linear classifier and achieve state-of-the-art classification accuracy on 3 flow and mass cytometry datasets.
arXiv Detail & Related papers (2022-01-18T21:40:36Z)
- Contrastive Cycle Adversarial Autoencoders for Single-cell Multi-omics Alignment and Integration [0.0]
We propose a novel framework to align and integrate single-cell RNA-seq data and single-cell ATAC-seq data.
Compared with the other state-of-the-art methods, our method performs better in both simulated and real single-cell data.
arXiv Detail & Related papers (2021-12-05T13:00:58Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a deep neural network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
- Automated Phenotyping via Cell Auto Training (CAT) on the Cell DIVE Platform [0.5599792629509229]
We present a method for automatic cell classification in tissue samples using an automated training set from multiplexed immunofluorescence images.
The method utilizes multiple markers stained in situ on a single tissue section on a robust hyperplex immunofluorescence platform.
arXiv Detail & Related papers (2020-07-18T16:45:32Z)