Stress-testing cross-cancer generalizability of 3D nnU-Net for PET-CT tumor segmentation: multi-cohort evaluation with novel oesophageal and lung cancer datasets
- URL: http://arxiv.org/abs/2508.18612v1
- Date: Tue, 26 Aug 2025 02:25:53 GMT
- Title: Stress-testing cross-cancer generalizability of 3D nnU-Net for PET-CT tumor segmentation: multi-cohort evaluation with novel oesophageal and lung cancer datasets
- Authors: Soumen Ghosh, Christine Jestin Hannan, Rajat Vashistha, Parveen Kundu, Sandra Brosda, Lauren G. Aoude, James Lonie, Andrew Nathanson, Jessica Ng, Andrew P. Barbour, Viktor Vegh,
- Abstract summary: This study presents the first cross cancer evaluation of nnU-Net on PET-CT.<n>We trained and tested 3D nnUNet models under three paradigms.<n>For the tested sets, the oesophageal only model achieved the best in-domain accuracy (mean DSC, 57.8) but failed on external Indian lung cohort.<n>The public only model generalized more broadly (mean DSC, 63.5 on AutoPET, 51.6 on Indian lung cohort) but underperformed in oesophageal Australian cohort.
- Score: 0.16999906736255554
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Robust generalization is essential for deploying deep learning based tumor segmentation in clinical PET-CT workflows, where anatomical sites, scanners, and patient populations vary widely. This study presents the first cross cancer evaluation of nnU-Net on PET-CT, introducing two novel, expert-annotated whole-body datasets. 279 patients with oesophageal cancer (Australian cohort) and 54 with lung cancer (Indian cohort). These cohorts complement the public AutoPET dataset and enable systematic stress-testing of cross domain performance. We trained and tested 3D nnUNet models under three paradigms. Target only (oesophageal), public only (AutoPET), and combined training. For the tested sets, the oesophageal only model achieved the best in-domain accuracy (mean DSC, 57.8) but failed on external Indian lung cohort (mean DSC less than 3.4), indicating severe overfitting. The public only model generalized more broadly (mean DSC, 63.5 on AutoPET, 51.6 on Indian lung cohort) but underperformed in oesophageal Australian cohort (mean DSC, 26.7). The combined approach provided the most balanced results (mean DSC, lung (52.9), oesophageal (40.7), AutoPET (60.9)), reducing boundary errors and improving robustness across all cohorts. These findings demonstrate that dataset diversity, particularly multi demographic, multi center and multi cancer integration, outweighs architectural novelty as the key driver of robust generalization. This work presents the demography based cross cancer deep learning segmentation evaluation and highlights dataset diversity, rather than model complexity, as the foundation for clinically robust segmentation.
Related papers
- Lung nodule classification on CT scan patches using 3D convolutional neural networks [0.0]
Lung cancer remains one of the most common and deadliest forms of cancer worldwide.<n>Early detection of lung cancer represents a critical medical challenge.<n>This study introduces three methodological improvements for lung nodule classification.<n>The integration of these techniques enables the development of a robust classification subsystem within a comprehensive Clinical Decision Support System for lung cancer detection.
arXiv Detail & Related papers (2026-02-13T09:26:32Z) - LungEvaty: A Scalable, Open-Source Transformer-based Deep Learning Model for Lung Cancer Risk Prediction in LDCT Screening [37.29507297342265]
LungEvaty is a transformer-based framework for predicting 1-6 year lung cancer risk from a single LDCT scan.<n>It learns directly from large-scale screening data to capture comprehensive anatomical and pathological cues relevant for malignancy risk.<n>LungEvaty was trained on more than 90,000 CT scans, including over 28,000 for fine-tuning and 6,000 for evaluation.
arXiv Detail & Related papers (2025-11-25T09:38:10Z) - Transplant-Ready? Evaluating AI Lung Segmentation Models in Candidates with Severe Lung Disease [11.854505363942941]
This study evaluates publicly available deep-learning based lung segmentation models in transplant-eligible patients.<n>Unet-R231 provided the most accurate automated lung segmentation among evaluated models.
arXiv Detail & Related papers (2025-09-18T15:42:43Z) - Cancer-Net PCa-Seg: Benchmarking Deep Learning Models for Prostate Cancer Segmentation Using Synthetic Correlated Diffusion Imaging [65.83291923029985]
Prostate cancer (PCa) is the most prevalent cancer among men in the United States, accounting for nearly 300,000 cases, 29% of all diagnoses and 35,000 total deaths in 2024.<n>Traditional screening methods such as prostate-specific antigen (PSA) testing and magnetic resonance imaging (MRI) have been pivotal in diagnosis, but have faced limitations in specificity and generalizability.<n>We employ several state-of-the-art deep learning models, including U-Net, SegResNet, Swin UNETR, Attention U-Net, and LightM-UNet, to segment prostate glands from a 200 CDI$
arXiv Detail & Related papers (2025-01-15T22:23:41Z) - From FDG to PSMA: A Hitchhiker's Guide to Multitracer, Multicenter Lesion Segmentation in PET/CT Imaging [0.9384264274298444]
We present our solution for the autoPET III challenge, targeting multitracer, multicenter generalization using the nnU-Net framework with the ResEncL architecture.
Key techniques include misalignment data augmentation and multi-modal pretraining across CT, MR, and PET datasets.
Compared to the default nnU-Net, which achieved a Dice score of 57.61, our model significantly improved performance with a Dice score of 68.40, alongside a reduction in false positive (FPvol: 7.82) and false negative (FNvol: 10.35) volumes.
arXiv Detail & Related papers (2024-09-14T16:39:17Z) - Deep Learning-Based Segmentation of Tumors in PET/CT Volumes: Benchmark of Different Architectures and Training Strategies [0.12301374769426145]
This study examines various neural network architectures and training strategies for automatically segmentation of cancer lesions.
V-Net and nnU-Net models were the most effective for their respective datasets.
Eliminating cancer-free cases from the AutoPET dataset was found to improve the performance of most models.
arXiv Detail & Related papers (2024-04-15T13:03:42Z) - Detection of subclinical atherosclerosis by image-based deep learning on chest x-ray [86.38767955626179]
Deep-learning algorithm to predict coronary artery calcium (CAC) score was developed on 460 chest x-ray.
The diagnostic accuracy of the AICAC model assessed by the area under the curve (AUC) was the primary outcome.
arXiv Detail & Related papers (2024-03-27T16:56:14Z) - Quantifying uncertainty in lung cancer segmentation with foundation models applied to mixed-domain datasets [6.712251433139412]
Medical image foundation models have shown the ability to segment organs and tumors with minimal fine-tuning.<n>These models are typically evaluated on task-specific in-distribution (ID) datasets.<n>We introduce a comprehensive set of computationally fast metrics to evaluate the performance of multiple foundation models trained with self-supervised learning (SSL)<n>SMIT produced the highest F1-score (LRAD: 0.60, 5Rater: 0.64) and lowest entropy (LRAD: 0.06, 5Rater: 0.12), indicating higher tumor detection rate and confident segmentations.
arXiv Detail & Related papers (2024-03-19T19:36:48Z) - Joint nnU-Net and Radiomics Approaches for Segmentation and Prognosis of
Head and Neck Cancers with PET/CT images [6.361835964390572]
3D nnU-Net architecture was adopted to automatic segmentation of primary tumor and lymph nodes synchronously.
Three prognostic models were constructed containing conventional and radiomics features alone.
Dice score and C-index were used as evaluation metrics for segmentation and prognosis task.
arXiv Detail & Related papers (2022-11-18T10:31:26Z) - CIRCA: comprehensible online system in support of chest X-rays-based
COVID-19 diagnosis [37.41181188499616]
Deep learning techniques can help in the faster detection of COVID-19 cases and monitoring of disease progression.
Five different datasets were used to construct a representative dataset of 23 799 CXRs for model training.
A U-Net-based model was developed to identify a clinically relevant region of the CXR.
arXiv Detail & Related papers (2022-10-11T13:30:34Z) - Federated Learning Enables Big Data for Rare Cancer Boundary Detection [98.5549882883963]
We present findings from the largest Federated ML study to-date, involving data from 71 healthcare institutions across 6 continents.
We generate an automatic tumor boundary detector for the rare disease of glioblastoma.
We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent.
arXiv Detail & Related papers (2022-04-22T17:27:00Z) - Deep learning-based COVID-19 pneumonia classification using chest CT
images: model generalizability [54.86482395312936]
Deep learning (DL) classification models were trained to identify COVID-19-positive patients on 3D computed tomography (CT) datasets from different countries.
We trained nine identical DL-based classification models by using combinations of the datasets with a 72% train, 8% validation, and 20% test data split.
The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better.
arXiv Detail & Related papers (2021-02-18T21:14:52Z) - CT-based COVID-19 Triage: Deep Multitask Learning Improves Joint
Identification and Severity Quantification [45.86448200141968]
We describe two basic setups: Identification of COVID-19 to prioritize studies of potentially infected patients to isolate them as early as possible; Severity quantification to highlight studies of severe patients and direct them to a hospital or provide emergency medical care.
We propose a multitask approach to consolidate both triage approaches and propose a convolutional neural network to combine all available labels within a single model.
We train our model on approximately 2000 publicly available CT studies and test it with a carefully designed set consisting of 32 COVID-19 studies, 30 cases with bacterial pneumonia, 31 healthy patients, and 30 patients with other lung pathologies to emulate a typical patient flow in
arXiv Detail & Related papers (2020-06-02T08:05:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.