A Multicentric Dataset for Training and Benchmarking Breast Cancer Segmentation in H&E Slides
- URL: http://arxiv.org/abs/2510.02037v1
- Date: Thu, 02 Oct 2025 14:09:21 GMT
- Title: A Multicentric Dataset for Training and Benchmarking Breast Cancer Segmentation in H&E Slides
- Authors: Carlijn Lems, Leslie Tessier, John-Melle Bokhorst, Mart van Rijthoven, Witali Aswolinskiy, Matteo Pozzi, Natalie Klubickova, Suzanne Dintzis, Michela Campora, Maschenka Balkenhol, Peter Bult, Joey Spronck, Thomas Detone, Mattia Barbareschi, Enrico Munari, Giuseppe Bogina, Jelle Wesseling, Esther H. Lips, Francesco Ciompi, Frédérique Meeuwsen, Jeroen van der Laak
- Abstract summary: We introduce BrEast cancEr hisTopathoLogy sEgmentation (BEETLE), a dataset for multiclass semantic segmentation of H&E-stained breast cancer WSIs. It consists of 587 biopsies and resections from three collaborating clinical centers and two public datasets, digitized using seven scanners, and covers all molecular subtypes and histological grades. The dataset's diversity and relevance to the rapidly growing field of automated biomarker quantification in breast cancer ensure its high potential for reuse.
- Score: 1.2783652545738993
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Automated semantic segmentation of whole-slide images (WSIs) stained with hematoxylin and eosin (H&E) is essential for large-scale artificial intelligence-based biomarker analysis in breast cancer. However, existing public datasets for breast cancer segmentation lack the morphological diversity needed to support model generalizability and robust biomarker validation across heterogeneous patient cohorts. We introduce BrEast cancEr hisTopathoLogy sEgmentation (BEETLE), a dataset for multiclass semantic segmentation of H&E-stained breast cancer WSIs. It consists of 587 biopsies and resections from three collaborating clinical centers and two public datasets, digitized using seven scanners, and covers all molecular subtypes and histological grades. Using diverse annotation strategies, we collected annotations across four classes - invasive epithelium, non-invasive epithelium, necrosis, and other - with particular focus on morphologies underrepresented in existing datasets, such as ductal carcinoma in situ and dispersed lobular tumor cells. The dataset's diversity and relevance to the rapidly growing field of automated biomarker quantification in breast cancer ensure its high potential for reuse. Finally, we provide a well-curated, multicentric external evaluation set to enable standardized benchmarking of breast cancer segmentation models.
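The standardized benchmarking described above is typically reported as per-class overlap scores between predicted and reference label maps. As a minimal sketch (not part of any released BEETLE tooling — the function, toy label maps, and the empty-class convention below are illustrative assumptions), per-class Dice-Sørensen coefficients over the four annotation classes could be computed as:

```python
import numpy as np

# The four annotation classes named in the abstract, encoded as labels 0..3.
CLASSES = ["invasive epithelium", "non-invasive epithelium", "necrosis", "other"]

def dice_per_class(pred, target, n_classes):
    """Per-class Dice = 2|A ∩ B| / (|A| + |B|) over integer label maps."""
    scores = []
    for c in range(n_classes):
        p = pred == c
        t = target == c
        denom = p.sum() + t.sum()
        # Convention (an assumption): a class absent from both maps scores 1.0.
        scores.append(1.0 if denom == 0 else 2.0 * np.logical_and(p, t).sum() / denom)
    return scores

# Toy 4x4 label maps standing in for patch-level segmentation output.
pred = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [2, 2, 3, 3],
                 [2, 2, 3, 3]])
target = np.array([[0, 0, 1, 1],
                   [0, 1, 1, 1],
                   [2, 2, 3, 3],
                   [2, 2, 2, 3]])
scores = dice_per_class(pred, target, len(CLASSES))
```

How empty classes are handled (score 1.0 here, versus excluding them from the average) is a design choice that differs between benchmarks, so it should be fixed explicitly when comparing models.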
Related papers
- A General Model for Retinal Segmentation and Quantification [40.957724455503346]
We present RetSAM, a general retinal segmentation and quantification framework for fundus imaging. It delivers robust multi-target segmentation and standardized biomarker extraction, supporting downstream ophthalmologic studies and oculomics correlation analyses. The resulting biomarkers enable systematic correlation analyses across major ophthalmic diseases, including diabetic retinopathy, age-related macular degeneration, glaucoma, and myopia.
arXiv Detail & Related papers (2026-01-31T10:24:02Z)
- MoXGATE: Modality-aware cross-attention for multi-omic gastrointestinal cancer sub-type classification [7.7134821078470965]
MoXGATE is a novel deep-learning framework that captures inter-modality dependencies, ensuring robust and interpretable integration. We demonstrate that MoXGATE outperforms existing methods, achieving 95% classification accuracy. Key contributions include (1) a cross-attention-based multi-omic integration framework, (2) modality-weighted fusion for enhanced interpretability, and (3) application of focal loss to mitigate data imbalance.
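The focal loss in contribution (3) down-weights easy, well-classified examples so that training focuses on hard ones. The sketch below is a generic binary focal loss following Lin et al.'s formulation; MoXGATE's exact variant and hyperparameters are not stated in the abstract, so `gamma` and `alpha` here are illustrative assumptions:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p = np.clip(p, eps, 1 - eps)          # guard against log(0)
    p_t = np.where(y == 1, p, 1 - p)      # probability assigned to the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# Two positives: one predicted confidently right, one confidently wrong.
p = np.array([0.9, 0.1])
y = np.array([1, 1])
losses = focal_loss(p, y)
```

Because the `(1 - p_t)^gamma` factor is tiny for the confident-correct example, its loss is orders of magnitude smaller than the confident-wrong one — which is exactly how the loss counteracts class imbalance dominated by easy majority-class samples.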
arXiv Detail & Related papers (2025-06-08T03:42:23Z)
- Multimodal Deep Learning for Subtype Classification in Breast Cancer Using Histopathological Images and Gene Expression Data [0.28675177318965045]
We propose a deep multimodal learning framework to classify breast cancer into BRCA.Luminal and BRCA.Basal/Her2 subtypes. Our approach employs a ResNet-50 model for image feature extraction and fully connected layers for gene expression processing. Our findings highlight the potential of deep learning for robust and interpretable breast cancer subtype classification.
arXiv Detail & Related papers (2025-03-04T18:24:33Z)
- FECT: Classification of Breast Cancer Pathological Images Based on Fusion Features [1.9356426053533178]
We propose a novel breast cancer tissue classification model that fuses features of Edges, Cells, and Tissues (FECT). Our model surpasses current advanced methods in terms of classification accuracy and F1 scores. Our model exhibits interpretability and holds promise for significant roles in future clinical applications.
arXiv Detail & Related papers (2025-01-17T11:32:33Z)
- LASSO-MOGAT: A Multi-Omics Graph Attention Framework for Cancer Classification [41.94295877935867]
This paper introduces LASSO-MOGAT, a graph-based deep learning framework that integrates messenger RNA, microRNA, and DNA methylation data to classify 31 cancer types.
arXiv Detail & Related papers (2024-08-30T16:26:04Z)
- Screen Them All: High-Throughput Pan-Cancer Genetic and Phenotypic Biomarker Screening from H&E Whole Slide Images [10.358246499005062]
OmniScreen is an AI-based system leveraging Virchow2 embeddings extracted from 60,529 cancer patients. It employs a unified model to predict a broad range of clinically relevant biomarkers across cancers. It reliably identifies therapeutic targets and shared phenotypic features across common and rare tumors.
arXiv Detail & Related papers (2024-08-18T17:44:00Z)
- Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation [48.107348956719775]
We introduce Mask-Enhanced SAM (M-SAM), an innovative architecture tailored for 3D tumor lesion segmentation.
We propose a novel Mask-Enhanced Adapter (MEA) within M-SAM that enriches the semantic information of medical images with positional data from coarse segmentation masks.
Our M-SAM achieves high segmentation accuracy and also exhibits robust generalization.
arXiv Detail & Related papers (2024-03-09T13:37:02Z)
- Cancer Gene Profiling through Unsupervised Discovery [49.28556294619424]
We introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers.
Our method is based on the LP-Stability algorithm, a high dimensional center-based unsupervised clustering algorithm.
Our signature reports promising results on distinguishing immune inflammatory and immune desert tumors.
arXiv Detail & Related papers (2021-02-11T09:04:45Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Topological Data Analysis of copy number alterations in cancer [70.85487611525896]
We explore the potential to capture information contained in cancer genomic information using a novel topology-based approach.
We find that this technique has the potential to extract meaningful low-dimensional representations in cancer somatic genetic data.
arXiv Detail & Related papers (2020-11-22T17:31:23Z)
- Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation [48.504790189796836]
We present a novel segmentation strategy, co-heterogeneous and adaptive segmentation (CHASe).
We propose a versatile framework that fuses appearance based semi-supervision, mask based adversarial domain adaptation, and pseudo-labeling.
CHASe can further improve pathological liver mask Dice-Sørensen coefficients by $4.2\% \sim 9.4\%$.
arXiv Detail & Related papers (2020-05-27T06:58:39Z)
- The scalable Birth-Death MCMC Algorithm for Mixed Graphical Model Learning with Application to Genomic Data Integration [0.0]
We propose a novel mixed graphical model approach to analyze multi-omic data of different types.
We find that our method is superior in terms of both computational efficiency and the accuracy of the model selection results.
arXiv Detail & Related papers (2020-05-08T16:34:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.