HiCat: A Semi-Supervised Approach for Cell Type Annotation
- URL: http://arxiv.org/abs/2412.06805v1
- Date: Mon, 25 Nov 2024 03:42:56 GMT
- Title: HiCat: A Semi-Supervised Approach for Cell Type Annotation
- Authors: Chang Bi, Kailun Bai, Xing Li, Xuekui Zhang,
- Abstract summary: HiCat is a semi-supervised pipeline for annotating cell types from single-cell RNA sequencing data.
It fuses the strengths of supervised learning for known cell types with unsupervised learning to identify novel types.
When benchmarked on 10 publicly available genomic datasets, HiCat surpasses other methods.
- Score: 7.9354799442899635
- License:
- Abstract: We introduce HiCat (Hybrid Cell Annotation using Transformative embeddings), a novel semi-supervised pipeline for annotating cell types from single-cell RNA sequencing data. HiCat fuses the strengths of supervised learning for known cell types with unsupervised learning to identify novel types. This hybrid approach incorporates both reference and query genomic data for feature engineering, enhancing the embedding learning process, increasing the effective sample size for unsupervised techniques, and improving the transferability of the supervised model trained on reference data when applied to query datasets. The pipeline follows six key steps: (1) removing batch effects using Harmony to generate a 50-dimensional principal component embedding; (2) applying UMAP for dimensionality reduction to two dimensions to capture crucial data patterns; (3) conducting unsupervised clustering of cells with DBSCAN, yielding a one-dimensional cluster membership vector; (4) merging the multi-resolution results of the previous steps into a 53-dimensional feature space that encompasses both reference and query data; (5) training a CatBoost model on the reference dataset to predict cell types in the query dataset; and (6) resolving inconsistencies between the supervised predictions and unsupervised cluster labels. When benchmarked on 10 publicly available genomic datasets, HiCat surpasses other methods, particularly in differentiating and identifying multiple new cell types. Its capacity to accurately classify novel cell types showcases its robustness and adaptability within intricate biological datasets.
Related papers
- Constructing Cell-type Taxonomy by Optimal Transport with Relaxed Marginal Constraints [14.831346286039151]
One challenge in the cluster analysis of cells is matching clusters extracted from datasets of different origins or conditions.
Our approach aims to construct a taxonomy for cell clusters across all samples to better annotate these clusters and effectively extract features for downstream analysis.
arXiv Detail & Related papers (2025-01-29T21:29:25Z) - Lower-dimensional projections of cellular expression improves cell type classification from single-cell RNA sequencing [12.66369956714212]
Single-cell RNA sequencing (scRNA-seq) enables the study of cellular diversity at single cell level.
Various statistical, machine and deep learning-based methods have been proposed for cell-type classification.
In this work, we proposed a reference-based method for cell type classification, called EnProCell.
arXiv Detail & Related papers (2024-10-13T19:01:38Z) - Hierarchical novel class discovery for single-cell transcriptomic profiles [1.6385815610837167]
We focus on datasets obtained in the context of developmental biology, where the differentiation process leads to a hierarchical structure.
We consider a frequent setting where both labeled and unlabeled data are available at training time, but the sets of the labels of labeled data on one side and of the unlabeled data on the other side, are disjoint.
The goal is to achieve two objectives, clustering the data and mapping the clusters with labels. We propose extensions of k-Means and GMM clustering methods for solving the problem and report comparative results on artificial and experimental transcriptomic datasets.
arXiv Detail & Related papers (2024-09-09T16:49:09Z) - UniCell: Universal Cell Nucleus Classification via Prompt Learning [76.11864242047074]
We propose a universal cell nucleus classification framework (UniCell)
It employs a novel prompt learning mechanism to uniformly predict the corresponding categories of pathological images from different dataset domains.
In particular, our framework adopts an end-to-end architecture for nuclei detection and classification, and utilizes flexible prediction heads for adapting various datasets.
arXiv Detail & Related papers (2024-02-20T11:50:27Z) - scBiGNN: Bilevel Graph Representation Learning for Cell Type
Classification from Single-cell RNA Sequencing Data [62.87454293046843]
Graph neural networks (GNNs) have been widely used for automatic cell type classification.
scBiGNN comprises two GNN modules to identify cell types.
scBiGNN outperforms a variety of existing methods for cell type classification from scRNA-seq data.
arXiv Detail & Related papers (2023-12-16T03:54:26Z) - Single-cell Multi-view Clustering via Community Detection with Unknown
Number of Clusters [64.31109141089598]
We introduce scUNC, an innovative multi-view clustering approach tailored for single-cell data.
scUNC seamlessly integrates information from different views without the need for a predefined number of clusters.
We conducted a comprehensive evaluation of scUNC using three distinct single-cell datasets.
arXiv Detail & Related papers (2023-11-28T08:34:58Z) - Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z) - Multi-modal Self-supervised Pre-training for Regulatory Genome Across
Cell Types [75.65676405302105]
We propose a simple yet effective approach for pre-training genome data in a multi-modal and self-supervised manner, which we call GeneBERT.
We pre-train our model on the ATAC-seq dataset with 17 million genome sequences.
arXiv Detail & Related papers (2021-10-11T12:48:44Z) - Split and Expand: An inference-time improvement for Weakly Supervised
Cell Instance Segmentation [71.50526869670716]
We propose a two-step post-processing procedure, Split and Expand, to improve the conversion of segmentation maps to instances.
In the Split step, we split clumps of cells from the segmentation map into individual cell instances with the guidance of cell-center predictions.
In the Expand step, we find missing small cells using the cell-center predictions.
arXiv Detail & Related papers (2020-07-21T14:05:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.