Hierarchical novel class discovery for single-cell transcriptomic profiles
- URL: http://arxiv.org/abs/2409.05937v1
- Date: Mon, 9 Sep 2024 16:49:09 GMT
- Title: Hierarchical novel class discovery for single-cell transcriptomic profiles
- Authors: Malek Senoussi, Thierry Artières, Paul Villoutreix
- Abstract summary: We focus on datasets obtained in the context of developmental biology, where the differentiation process leads to a hierarchical structure.
We consider a frequent setting where both labeled and unlabeled data are available at training time, but the label sets of the labeled data and of the unlabeled data are disjoint.
The goal is to achieve two objectives: clustering the data and mapping the clusters to labels. We propose extensions of k-Means and GMM clustering methods for solving the problem and report comparative results on artificial and experimental transcriptomic datasets.
- Score: 1.6385815610837167
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the major challenges arising from single-cell transcriptomics experiments is the question of how to annotate the associated single-cell transcriptomic profiles. Because of the large size and the high dimensionality of the data, automated methods for annotation are needed. We focus here on datasets obtained in the context of developmental biology, where the differentiation process leads to a hierarchical structure. We consider a frequent setting where both labeled and unlabeled data are available at training time, but the label sets of the labeled data and of the unlabeled data are disjoint. This is an instance of the Novel Class Discovery problem. The goal is to achieve two objectives: clustering the data and mapping the clusters to labels. We propose extensions of k-Means and GMM clustering methods for solving the problem and report comparative results on artificial and experimental transcriptomic datasets. Our approaches take advantage of the hierarchical nature of the data.
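To make the setting concrete, here is a minimal Python sketch of one way a hierarchy could guide k-Means on the unlabeled cells. It is not the authors' algorithm, whose details the abstract does not give. The sketch assumes that each novel class has a known parent node in the hierarchy, that the parent's mean profile has been estimated from the labeled data, and that each novel centroid is pulled toward its parent's mean. The function name `hierarchical_ncd_kmeans`, the `parent_of` mapping, and the shrinkage weight `lam` are all hypothetical names introduced for illustration.

```python
import numpy as np

def hierarchical_ncd_kmeans(X_unl, parent_means, parent_of, lam=1.0, n_iter=50, seed=0):
    """Illustrative k-Means variant for novel class discovery (an assumption,
    not the method of the paper).

    X_unl        : (n, d) float array of unlabeled profiles.
    parent_means : (P, d) float array, mean profile of each parent node,
                   estimated beforehand from the labeled data.
    parent_of    : length-K integer sequence; parent_of[k] is the parent
                   node of novel class k.
    lam          : shrinkage weight (> 0) pulling a centroid to its parent.
    Returns (assign, centroids); cluster k is read directly as novel class k.
    """
    rng = np.random.default_rng(seed)
    parent_of = np.asarray(parent_of)
    K, d = len(parent_of), X_unl.shape[1]
    # Start each novel centroid at its parent's mean, plus small noise so that
    # siblings sharing a parent do not start at the same point.
    centroids = parent_means[parent_of] + 0.01 * rng.standard_normal((K, d))
    for _ in range(n_iter):
        # Assignment step: nearest centroid in squared Euclidean distance.
        d2 = ((X_unl[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # Update step: cluster mean shrunk toward the parent's mean.
        new_centroids = np.empty_like(centroids)
        for k in range(K):
            members = X_unl[assign == k]
            prior = lam * parent_means[parent_of[k]]
            new_centroids[k] = (members.sum(axis=0) + prior) / (len(members) + lam)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign, centroids
```

Because every centroid is tied to a parent node from the start, cluster k is interpreted as novel class k, so clustering and cluster-to-label mapping are handled together in this toy version. Setting `lam` close to zero recovers ordinary k-Means updates initialized at the parent means.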
Related papers
- Constructing Cell-type Taxonomy by Optimal Transport with Relaxed Marginal Constraints [14.831346286039151]
One challenge in the cluster analysis of cells is matching clusters extracted from datasets of different origins or conditions.
Our approach aims to construct a taxonomy for cell clusters across all samples to better annotate these clusters and effectively extract features for downstream analysis.
arXiv Detail & Related papers (2025-01-29T21:29:25Z)
- UniCell: Universal Cell Nucleus Classification via Prompt Learning [76.11864242047074]
We propose a universal cell nucleus classification framework (UniCell).
It employs a novel prompt learning mechanism to uniformly predict the corresponding categories of pathological images from different dataset domains.
In particular, our framework adopts an end-to-end architecture for nuclei detection and classification, and utilizes flexible prediction heads for adapting various datasets.
arXiv Detail & Related papers (2024-02-20T11:50:27Z)
- Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z)
- Dynamic Conceptional Contrastive Learning for Generalized Category Discovery [76.82327473338734]
Generalized category discovery (GCD) aims to automatically cluster partially labeled data.
Unlabeled data contain instances that are not only from known categories of the labeled data but also from novel categories.
One effective approach to GCD is to apply self-supervised learning to learn discriminative representations for unlabeled data.
We propose a Dynamic Conceptional Contrastive Learning framework, which can effectively improve clustering accuracy.
arXiv Detail & Related papers (2023-03-30T14:04:39Z)
- Automatic universal taxonomies for multi-domain semantic segmentation [1.4364491422470593]
Training semantic segmentation models on multiple datasets has sparked a lot of recent interest in the computer vision community.
Established datasets have mutually incompatible labels which disrupt principled inference in the wild.
We address this issue by automatic construction of universal taxonomies through iterative dataset integration.
arXiv Detail & Related papers (2022-07-18T08:53:17Z)
- Contrastive Cycle Adversarial Autoencoders for Single-cell Multi-omics Alignment and Integration [0.0]
We propose a novel framework to align and integrate single-cell RNA-seq data and single-cell ATAC-seq data.
Compared with the other state-of-the-art methods, our method performs better in both simulated and real single-cell data.
arXiv Detail & Related papers (2021-12-05T13:00:58Z)
- Coarse2Fine: Fine-grained Text Classification on Coarsely-grained Annotated Data [22.81068960545234]
We introduce a new problem called coarse-to-fine grained classification, which aims to perform fine-grained classification on coarsely annotated data.
Instead of asking for new fine-grained human annotations, we opt to leverage label surface names as the only human guidance.
Our framework uses the fine-tuned generative models to sample pseudo-training data for training the classifier, and bootstraps on real unlabeled data for model refinement.
arXiv Detail & Related papers (2021-09-22T17:29:01Z)
- MATCH: Metadata-Aware Text Classification in A Large Hierarchy [60.59183151617578]
MATCH is an end-to-end framework that leverages both metadata and hierarchy information.
We propose different ways to regularize the parameters and output probability of each child label by its parents.
Experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH.
arXiv Detail & Related papers (2021-02-15T05:23:08Z)
- Classify and Generate Reciprocally: Simultaneous Positive-Unlabelled Learning and Conditional Generation with Extra Data [77.31213472792088]
The scarcity of class-labeled data is a ubiquitous bottleneck in many machine learning problems.
We address this problem by leveraging Positive-Unlabeled (PU) classification and conditional generation with extra unlabeled data.
We present a novel training framework to jointly target both PU classification and conditional generation when exposed to extra data.
arXiv Detail & Related papers (2020-06-14T08:27:40Z)
- Automatically Discovering and Learning New Visual Categories with Ranking Statistics [145.89790963544314]
We tackle the problem of discovering novel classes in an image collection given labelled examples of other classes.
We learn a general-purpose clustering model and use the latter to identify the new classes in the unlabelled data.
We evaluate our approach on standard classification benchmarks and outperform current methods for novel category discovery by a significant margin.
arXiv Detail & Related papers (2020-02-13T18:53:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.