Interpretable Single-Cell Set Classification with Kernel Mean Embeddings
- URL: http://arxiv.org/abs/2201.07322v1
- Date: Tue, 18 Jan 2022 21:40:36 GMT
- Title: Interpretable Single-Cell Set Classification with Kernel Mean Embeddings
- Authors: Siyuan Shan, Vishal Baskaran, Haidong Yi, Jolene Ranek, Natalie Stanley, Junier Oliva
- Abstract summary: Kernel Mean Embedding encodes the cellular landscape of each profiled biological sample.
We train a simple linear classifier and achieve state-of-the-art classification accuracy on 3 flow and mass cytometry datasets.
- Score: 14.686560033030101
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Modern single-cell flow and mass cytometry technologies measure the
expression of several proteins of the individual cells within a blood or tissue
sample. Each profiled biological sample is thus represented by a set of
hundreds of thousands of multidimensional cell feature vectors, which incurs a
high computational cost to predict each biological sample's associated
phenotype with machine learning models. Such a large set cardinality also
limits the interpretability of machine learning models due to the difficulty in
tracking how each individual cell influences the ultimate prediction. Using
Kernel Mean Embedding to encode the cellular landscape of each profiled
biological sample, we can train a simple linear classifier and achieve
state-of-the-art classification accuracy on 3 flow and mass cytometry datasets.
Our model contains few parameters but still performs similarly to deep learning
models with millions of parameters. In contrast with deep learning approaches,
the linearity and sub-selection step of our model make it easy to interpret
classification results. Clustering analysis further shows that our method
admits rich biological interpretability for linking cellular heterogeneity to
clinical phenotype.
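The pipeline described in the abstract (embed each sample's set of cells as one vector, then fit a linear classifier) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the random Fourier feature approximation of a Gaussian kernel, the ridge-style linear classifier, the bandwidth, the feature count, and all function names are assumptions made for the example, and the sub-selection step mentioned in the abstract is omitted.

```python
import numpy as np

def make_rff_params(n_markers, n_features, bandwidth, rng):
    """Draw random Fourier feature parameters for a Gaussian kernel (assumed kernel)."""
    weights = rng.normal(scale=1.0 / bandwidth, size=(n_markers, n_features))
    offsets = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return weights, offsets

def kernel_mean_embedding(cells, weights, offsets):
    """Embed a set of cells (n_cells x n_markers) as one fixed-length vector."""
    n_features = weights.shape[1]
    per_cell = np.sqrt(2.0 / n_features) * np.cos(cells @ weights + offsets)
    # Averaging the per-cell features gives the (approximate) mean embedding.
    return per_cell.mean(axis=0)

def fit_linear_classifier(embeddings, labels, reg=1e-3):
    """Regularized least squares on +/-1 targets, with a bias column appended."""
    X = np.hstack([embeddings, np.ones((embeddings.shape[0], 1))])
    targets = 2.0 * labels - 1.0
    gram = X.T @ X + reg * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ targets)

def predict(embeddings, coef):
    """Predict 0/1 labels from the sign of the linear score."""
    X = np.hstack([embeddings, np.ones((embeddings.shape[0], 1))])
    return (X @ coef > 0).astype(int)

def simulate_sample(label, rng, n_cells=500, n_markers=5):
    """Toy sample: in phenotype 1, 40% of cells come from a shifted population."""
    cells = rng.normal(size=(n_cells, n_markers))
    if label == 1:
        n_shifted = int(0.4 * n_cells)
        cells[:n_shifted] += 3.0
    return cells

rng = np.random.default_rng(0)
weights, offsets = make_rff_params(n_markers=5, n_features=64, bandwidth=2.0, rng=rng)

# 60 synthetic samples with alternating phenotype labels.
labels = np.array([0, 1] * 30)
embeddings = np.stack(
    [kernel_mean_embedding(simulate_sample(y, rng), weights, offsets) for y in labels]
)

# Fit on the first 40 samples, evaluate on the remaining 20.
train, test = slice(0, 40), slice(40, 60)
coef = fit_linear_classifier(embeddings[train], labels[train])
train_accuracy = float(np.mean(predict(embeddings[train], coef) == labels[train]))
test_accuracy = float(np.mean(predict(embeddings[test], coef) == labels[test]))
print(f"train accuracy: {train_accuracy:.2f}, held-out accuracy: {test_accuracy:.2f}")
```

Because every sample is reduced to a vector of fixed length regardless of how many cells it contains, the classifier here has only 65 parameters (64 feature weights plus a bias), which is the kind of small, linear model the abstract contrasts with deep networks.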
Related papers
- MMIL: A novel algorithm for disease associated cell type discovery [58.044870442206914]
Single-cell datasets often lack individual cell labels, making it challenging to identify cells associated with disease.
We introduce Mixture Modeling for Multiple Instance Learning (MMIL), an expectation maximization method that enables the training and calibration of cell-level classifiers.
arXiv Detail & Related papers (2024-06-12T15:22:56Z)
- FlowCyt: A Comparative Study of Deep Learning Approaches for Multi-Class Classification in Flow Cytometry Benchmarking [1.6712896227173808]
FlowCyt is the first comprehensive benchmark for multi-class single-cell classification in flow cytometry data.
The dataset comprises bone marrow samples from 30 patients, with each cell characterized by twelve markers.
arXiv Detail & Related papers (2024-02-28T15:01:59Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
- Mixed Models with Multiple Instance Learning [51.440557223100164]
We introduce MixMIL, a framework integrating Generalized Linear Mixed Models (GLMM) and Multiple Instance Learning (MIL).
Our empirical results reveal that MixMIL outperforms existing MIL models in single-cell datasets.
arXiv Detail & Related papers (2023-11-04T16:42:42Z)
- Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges.
We first present the model that underlies most current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z)
- VOLTA: an Environment-Aware Contrastive Cell Representation Learning for Histopathology [0.3436781233454516]
We propose a self-supervised framework (VOLTA) for cell representation learning in histopathology images.
We subjected our model to extensive experiments on the data collected from multiple institutions around the world.
To showcase the potential power of our proposed framework, we applied VOLTA to ovarian and endometrial cancers with very small sample sizes.
arXiv Detail & Related papers (2023-03-08T16:35:47Z)
- Machine learning based lens-free imaging technique for field-portable cytometry [0.0]
The performance of our proposed method shows an increase in accuracy >98% along with the signal enhancement of >5 dB for most of the cell types.
The model adapts to new types of samples within a few learning iterations and is able to successfully classify the newly introduced samples.
arXiv Detail & Related papers (2022-03-02T07:09:29Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a deep neural network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
- Sickle-cell disease diagnosis support selecting the most appropriate machine learning method: Towards a general and interpretable approach for cell morphology analysis from microscopy images [0.0]
We propose an approach to select the classification method and features, based on the state-of-the-art.
We used samples of patients with sickle-cell disease which can be generalized for other study cases.
arXiv Detail & Related papers (2020-10-09T11:46:38Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta-learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Distinguishing Cell Phenotype Using Cell Epigenotype [0.0]
Relationship between microscopic observations and macroscopic behavior is a fundamental open question in biophysical systems.
We develop a unified approach that, in contrast with existing methods, predicts cell type from macromolecular data even when accounting for the scale of human tissue diversity and limitations in the available data.
arXiv Detail & Related papers (2020-03-20T18:00:07Z)