A tissue and cell-level annotated H&E and PD-L1 histopathology image dataset in non-small cell lung cancer
- URL: http://arxiv.org/abs/2507.16855v1
- Date: Mon, 21 Jul 2025 12:16:22 GMT
- Title: A tissue and cell-level annotated H&E and PD-L1 histopathology image dataset in non-small cell lung cancer
- Authors: Joey Spronck, Leander van Eekelen, Dominique van Midden, Joep Bogaerts, Leslie Tessier, Valerie Dechering, Muradije Demirel-Andishmand, Gabriel Silva de Souza, Roland Nemeth, Enrico Munari, Giuseppe Bogina, Ilaria Girolami, Albino Eccher, Balazs Acs, Ceren Boyaci, Natalie Klubickova, Monika Looijen-Salamon, Shoko Vos, Francesco Ciompi
- Abstract summary: The IGNITE dataset is a multi-stain, multi-centric, and multi-scanner dataset of annotated NSCLC whole-slide images. This dataset includes 887 fully annotated regions of interest from 155 unique patients across three complementary tasks. To the best of our knowledge, this is the first public NSCLC dataset with manual annotations of H&E in metastatic sites and PD-L1 IHC.
- Score: 0.7400138614614626
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The tumor immune microenvironment (TIME) in non-small cell lung cancer (NSCLC) histopathology contains morphological and molecular characteristics predictive of immunotherapy response. Computational quantification of TIME characteristics, such as cell detection and tissue segmentation, can support biomarker development. However, currently available digital pathology datasets of NSCLC for the development of cell detection or tissue segmentation algorithms are limited in scope, lack annotations of clinically prevalent metastatic sites, and forgo molecular information such as PD-L1 immunohistochemistry (IHC). To fill this gap, we introduce the IGNITE data toolkit, a multi-stain, multi-centric, and multi-scanner dataset of annotated NSCLC whole-slide images. We publicly release 887 fully annotated regions of interest from 155 unique patients across three complementary tasks: (i) multi-class semantic segmentation of tissue compartments in H&E-stained slides, with 16 classes spanning primary and metastatic NSCLC, (ii) nuclei detection, and (iii) PD-L1 positive tumor cell detection in PD-L1 IHC slides. To the best of our knowledge, this is the first public NSCLC dataset with manual annotations of H&E in metastatic sites and PD-L1 IHC.
Related papers
- MIPHEI-ViT: Multiplex Immunofluorescence Prediction from H&E Images using ViT Foundation Models [2.2802322828460864]
We introduce MIPHEI, a U-Net-inspired architecture that integrates state-of-the-art ViT foundation models as encoders to predict mIF signals from H&E images. We train our model using the publicly available ORION dataset of restained H&E and mIF images from colorectal cancer tissue. MIPHEI achieves accurate cell-type classification from H&E alone, with F1 scores of 0.88 for Pan-CK, 0.57 for CD3e, 0.56 for SMA, 0.36 for CD68, and 0.30 for CD20.
arXiv Detail & Related papers (2025-05-15T13:42:48Z)
- Histo-Miner: Deep Learning based Tissue Features Extraction Pipeline from H&E Whole Slide Images of Cutaneous Squamous Cell Carcinoma [31.25944547782148]
Histo-Miner is a deep learning pipeline for analysis of Whole-Slide Images (WSIs) of skin tissue. We develop our pipeline for the analysis of patient samples of cutaneous squamous cell carcinoma (cSCC). Histo-Miner employs convolutional neural networks and vision transformers for nucleus segmentation and classification as well as tumor region segmentation.
arXiv Detail & Related papers (2025-05-07T09:34:03Z)
- Comprehensive Pathological Image Segmentation via Teacher Aggregation for Tumor Microenvironment Analysis [0.15206737182982252]
PAGET (Pathological image segmentation via AGgrEgated Teachers) is a new knowledge distillation approach that integrates multiple segmentation models. We demonstrate PAGET's ability to perform rapid, comprehensive TME segmentation across various tissue types and medical institutions.
arXiv Detail & Related papers (2025-01-06T10:33:14Z)
- Multi-modal Medical Image Fusion For Non-Small Cell Lung Cancer Classification [7.002657345547741]
Non-small cell lung cancer (NSCLC) is a predominant cause of cancer mortality worldwide.
In this paper, we introduce an innovative integration of multi-modal data, synthesizing fused medical imaging (CT and PET scans) with clinical health records and genomic data.
Our research surpasses existing approaches, as evidenced by a substantial enhancement in NSCLC detection and classification precision.
arXiv Detail & Related papers (2024-09-27T12:59:29Z)
- CIMIL-CRC: a clinically-informed multiple instance learning framework for patient-level colorectal cancer molecular subtypes classification from H&E stained images [42.771819949806655]
We introduce CIMIL-CRC, a framework that solves the MSI/MSS MIL problem by efficiently combining a pre-trained feature extraction model with principal component analysis (PCA) to aggregate information from all patches.
We assessed our CIMIL-CRC method using the average area under the curve (AUC) from a 5-fold cross-validation experimental setup for model development on the TCGA-CRC-DX cohort.
arXiv Detail & Related papers (2024-01-29T12:56:11Z)
- Classification of lung cancer subtypes on CT images with synthetic pathological priors [41.75054301525535]
Cross-scale associations exist in the image patterns between the same case's CT images and its pathological images.
We propose self-generating hybrid feature network (SGHF-Net) for accurately classifying lung cancer subtypes on CT images.
arXiv Detail & Related papers (2023-08-09T02:04:05Z)
- Meta-information-aware Dual-path Transformer for Differential Diagnosis of Multi-type Pancreatic Lesions in Multi-phase CT [41.199716328468895]
We develop a dual-path transformer to exploit the feasibility of classification and segmentation of pancreatic lesions.
The proposed method consists of a CNN-based segmentation path (S-path) and a transformer-based classification path (C-path).
Our results show that our method can enable accurate classification and segmentation of the full taxonomy of pancreatic lesions.
arXiv Detail & Related papers (2023-03-02T03:34:28Z)
- Lung Cancer Lesion Detection in Histopathology Images Using Graph-Based Sparse PCA Network [93.22587316229954]
We propose a graph-based sparse principal component analysis (GS-PCA) network for automated detection of cancerous lesions on histological lung slides stained by hematoxylin and eosin (H&E).
We evaluate the performance of the proposed algorithm on H&E slides obtained from an SVM K-rasG12D lung cancer mouse model using precision/recall rates, F-score, Tanimoto coefficient, and area under the curve (AUC) of the receiver operator characteristic (ROC).
arXiv Detail & Related papers (2021-10-27T19:28:36Z)
- M3Lung-Sys: A Deep Learning System for Multi-Class Lung Pneumonia Screening from CT Imaging [85.00066186644466]
We propose a Multi-task Multi-slice Deep Learning System (M3Lung-Sys) for multi-class lung pneumonia screening from CT imaging.
In addition to distinguishing COVID-19 from Healthy, H1N1, and CAP cases, our M3Lung-Sys is also able to locate the areas of relevant lesions.
arXiv Detail & Related papers (2020-10-07T06:22:24Z)
- DLBCL-Morph: Morphological features computed using deep learning for an annotated digital DLBCL image set [3.5947673199446935]
Diffuse Large B-Cell Lymphoma (DLBCL) is the most common non-Hodgkin lymphoma.
No morphologic features have been consistently demonstrated to correlate with prognosis.
We present a morphologic analysis of histology sections from 209 DLBCL cases with associated clinical and cytogenetic data.
arXiv Detail & Related papers (2020-09-17T07:43:42Z)
- Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation [48.504790189796836]
We present a novel segmentation strategy, co-heterogeneous and adaptive segmentation (CHASe).
We propose a versatile framework that fuses appearance based semi-supervision, mask based adversarial domain adaptation, and pseudo-labeling.
CHASe can further improve pathological liver mask Dice-Sorensen coefficients by ranges of $4.2\% \sim 9.4\%$.
arXiv Detail & Related papers (2020-05-27T06:58:39Z)