Towards Comprehensive Cellular Characterisation of H&E slides
- URL: http://arxiv.org/abs/2508.09926v3
- Date: Tue, 02 Sep 2025 15:30:58 GMT
- Title: Towards Comprehensive Cellular Characterisation of H&E slides
- Authors: Benjamin Adjadj, Pierre-Antoine Bannier, Guillaume Horent, Sebastien Mandela, Aurore Lyon, Kathryn Schutte, Ulysse Marteau, Valentin Gaury, Laura Dumont, Thomas Mathieu, MOSAIC consortium, Reda Belbahri, Benoît Schmauch, Eric Durand, Katharina Von Loga, Lucie Gillet
- Abstract summary: HistoPLUS is a state-of-the-art model for cell analysis. It trains on a novel curated pan-cancer dataset of 108,722 nuclei covering 13 cell types. It outperforms current state-of-the-art models in detection quality by 5.2% and overall F1 classification score by 23.7%.
- Score: 0.27993409178463413
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cell detection, segmentation and classification are essential for analyzing tumor microenvironments (TME) on hematoxylin and eosin (H&E) slides. Existing methods suffer from poor performance on understudied cell types (rare or not present in public datasets) and limited cross-domain generalization. To address these shortcomings, we introduce HistoPLUS, a state-of-the-art model for cell analysis, trained on a novel curated pan-cancer dataset of 108,722 nuclei covering 13 cell types. In external validation across 4 independent cohorts, HistoPLUS outperforms current state-of-the-art models in detection quality by 5.2% and overall F1 classification score by 23.7%, while using 5x fewer parameters. Notably, HistoPLUS unlocks the study of 7 understudied cell types and brings significant improvements on 8 of 13 cell types. Moreover, we show that HistoPLUS robustly transfers to two oncology indications unseen during training. To support broader TME biomarker research, we release the model weights and inference code at https://github.com/owkin/histoplus/.
Related papers
- Glioma C6: A Novel Dataset for Training and Benchmarking Cell Segmentation [0.0]
We present Glioma C6, a new open dataset for instance segmentation of glioma C6 cells.
The dataset comprises 75 high-resolution phase-contrast microscopy images with over 12,000 annotated cells.
It includes soma annotations and morphological cell categorization provided by biologists.
arXiv Detail & Related papers (2025-11-10T16:33:34Z)
- From Linear Probing to Joint-Weighted Token Hierarchy: A Foundation Model Bridging Global and Cellular Representations in Biomarker Detection [44.3895875409365]
AI-based biomarkers can infer molecular features directly from hematoxylin & eosin (H&E) slides.
Most pathology foundation models (PFMs) rely on global patch-level embeddings and overlook cell-level morphology.
We present a PFM model, JWTH, which integrates large-scale self-supervised pretraining with cell-centric post-tuning and attention pooling to fuse local and global tokens.
arXiv Detail & Related papers (2025-11-07T11:05:36Z)
- Boosting Pathology Foundation Models via Few-shot Prompt-tuning for Rare Cancer Subtyping [80.92960114162746]
We propose PathPT, a novel framework that exploits the potential of vision-language pathology foundation models.
PathPT converts WSI-level supervision into fine-grained tile-level guidance by leveraging the zero-shot capabilities of VL models.
Results show that PathPT consistently delivers superior performance, achieving substantial gains in subtyping accuracy and cancerous region grounding ability.
arXiv Detail & Related papers (2025-08-21T18:04:41Z)
- CellVerse: Do Large Language Models Really Understand Cell Biology? [74.34984441715517]
We introduce CellVerse, a unified language-centric question-answering benchmark that integrates four types of single-cell multi-omics data.
We systematically evaluate the performance across 14 open-source and closed-source LLMs ranging from 160M to 671B on CellVerse.
arXiv Detail & Related papers (2025-05-09T06:47:23Z)
- Interpretable Graph Kolmogorov-Arnold Networks for Multi-Cancer Classification and Biomarker Identification using Multi-Omics Data [36.92842246372894]
Multi-Omics Graph Kolmogorov-Arnold Network (MOGKAN) is a deep learning framework that utilizes messenger-RNA, micro-RNA sequences, and DNA methylation samples.
By integrating multi-omics data with graph-based deep learning, our proposed approach demonstrates robust predictive performance and interpretability.
arXiv Detail & Related papers (2025-03-29T02:14:05Z)
- Graph Structure Learning for Tumor Microenvironment with Cell Type Annotation from non-spatial scRNA-seq data [6.432270457083369]
We present a novel graph neural network (GNN) model that enhances cell type prediction and cell interaction analysis.
The proposed scGSL model demonstrated robust performance, achieving an average accuracy of 84.83%, precision of 86.23%, recall of 81.51%, and an F1 score of 80.92% across all datasets.
arXiv Detail & Related papers (2025-02-04T18:28:25Z)
- CIMIL-CRC: a clinically-informed multiple instance learning framework for patient-level colorectal cancer molecular subtypes classification from H&E stained images [42.771819949806655]
We introduce CIMIL-CRC, a framework that solves the MSI/MSS MIL problem by efficiently combining a pre-trained feature extraction model with principal component analysis (PCA) to aggregate information from all patches.
We assessed our CIMIL-CRC method using the average area under the curve (AUC) from a 5-fold cross-validation experimental setup for model development on the TCGA-CRC-DX cohort.
arXiv Detail & Related papers (2024-01-29T12:56:11Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
- Immunohistochemistry guided segmentation of benign epithelial cells, in situ lesions, and invasive epithelial cells in breast cancer slides [0.3251634769699391]
We developed an AI model for segmentation of epithelial cells in sections from breast cancer.
In quantitative evaluation, mean Dice scores of 0.70, 0.79, and 0.75 were achieved for invasive epithelial cells, benign epithelial cells, and in situ lesions, respectively.
arXiv Detail & Related papers (2023-11-22T09:25:08Z)
- CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection [36.08551407926805]
We propose the CLIP-Driven Universal Model, which incorporates text embeddings learned from Contrastive Language-Image Pre-training into segmentation models.
The proposed model is developed from an assembly of 14 datasets, using a total of 3,410 CT scans for training and then evaluated on 6,162 external CT scans from 3 additional datasets.
arXiv Detail & Related papers (2023-01-02T18:07:44Z)
- Hierarchical Phenotyping and Graph Modeling of Spatial Architecture in Lymphoid Neoplasms [7.229065627904531]
This study is among the first to combine local and global graph methods to profile the orchestration and interaction of cellular components.
The proposed algorithm achieves a mean diagnosis accuracy of 0.703 with the repeated 5-fold cross-validation scheme.
arXiv Detail & Related papers (2021-06-30T16:09:32Z)
- Cell abundance aware deep learning for cell detection on highly imbalanced pathological data [0.0]
In digital pathology, less abundant cell types can be of biological significance.
We proposed a deep learning pipeline that considers the abundance of cell types during model training.
We found that scaling the deep learning loss function by the abundance of cells improves cell detection performance.
arXiv Detail & Related papers (2021-02-23T13:07:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.