PETWB-REP: A Multi-Cancer Whole-Body FDG PET/CT and Radiology Report Dataset for Medical Imaging Research
- URL: http://arxiv.org/abs/2511.03194v1
- Date: Wed, 05 Nov 2025 05:13:57 GMT
- Title: PETWB-REP: A Multi-Cancer Whole-Body FDG PET/CT and Radiology Report Dataset for Medical Imaging Research
- Authors: Le Xue, Gang Feng, Wenbo Zhang, Yichi Zhang, Lanlan Li, Shuqi Wang, Liling Peng, Sisi Peng, Xin Gao,
- Abstract summary: This dataset includes whole-body 18F-Fluorodeoxyglucose (FDG) PET/CT scans and corresponding radiology reports from 490 patients diagnosed with various malignancies. The dataset primarily includes common cancers such as lung cancer, liver cancer, breast cancer, prostate cancer, and ovarian cancer. It is designed to support research in medical imaging, radiomics, artificial intelligence, and multi-modal learning.
- Score: 18.20745555769851
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Publicly available, large-scale medical imaging datasets are crucial for developing and validating artificial intelligence models and conducting retrospective clinical research. However, datasets that combine functional and anatomical imaging with detailed clinical reports across multiple cancer types remain scarce. Here, we present PETWB-REP, a curated dataset comprising whole-body 18F-Fluorodeoxyglucose (FDG) Positron Emission Tomography/Computed Tomography (PET/CT) scans and corresponding radiology reports from 490 patients diagnosed with various malignancies. The dataset primarily includes common cancers such as lung cancer, liver cancer, breast cancer, prostate cancer, and ovarian cancer. This dataset includes paired PET and CT images, de-identified textual reports, and structured clinical metadata. It is designed to support research in medical imaging, radiomics, artificial intelligence, and multi-modal learning.
Related papers
- Imaging Modalities-Based Classification for Lung Cancer Detection [0.0]
Lung cancer continues to be the predominant cause of cancer-related mortality globally. This review analyzes various approaches, including advanced image processing methods, focusing on their efficacy in interpreting CT scans, chest radiographs, and biological markers.
arXiv Detail & Related papers (2025-09-17T19:18:05Z)
- A Multimodal and Multi-centric Head and Neck Cancer Dataset for Segmentation, Diagnosis and Outcome Prediction [5.4735577512942655]
We present a publicly available multimodal dataset for head and neck cancer research. All studies contain co-registered PET/CT scans with varying acquisition protocols. We benchmark three key clinical tasks: automated tumor segmentation, recurrence-free survival prediction, and HPV status classification.
arXiv Detail & Related papers (2025-08-30T05:38:48Z)
- Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images [29.523577037519985]
Deep learning models are expected to address problems such as poor image quality, motion artifacts, and complex tumor morphology. We introduce a large-scale PET-CT lung tumor segmentation dataset, termed PCLT20K, which comprises 21,930 pairs of PET-CT images from 605 patients. We propose a cross-modal interactive perception network with Mamba (CIPA) for lung tumor segmentation in PET-CT images.
arXiv Detail & Related papers (2025-03-21T16:04:11Z)
- Cancer-Net PCa-Data: An Open-Source Benchmark Dataset for Prostate Cancer Clinical Decision Support using Synthetic Correlated Diffusion Imaging Data [75.77035221531261]
Cancer-Net PCa-Data is an open-source benchmark dataset of volumetric CDI$^s$ imaging data of PCa patients.
Cancer-Net PCa-Data is the first-ever public dataset of CDI$^s$ imaging data for PCa.
arXiv Detail & Related papers (2023-11-20T10:28:52Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis [59.35504779947686]
This paper evaluates GPT-4V, OpenAI's multimodal model, on multimodal medical diagnosis.
Our evaluation encompasses 17 human body systems.
GPT-4V demonstrates proficiency in distinguishing between medical image modalities and anatomy.
It faces significant challenges in disease diagnosis and generating comprehensive reports.
arXiv Detail & Related papers (2023-10-15T18:32:27Z)
- ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a total of 332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
- ECPC-IDS: A benchmark endometrial cancer PET/CT image dataset for evaluation of semantic segmentation and detection of hypermetabolic regions [7.420919215687338]
Endometrial cancer is one of the most common tumors in the female reproductive system.
This dataset is the first publicly available endometrial cancer dataset with a large number of images.
arXiv Detail & Related papers (2023-08-16T12:18:27Z)
- A Multi-Institutional Open-Source Benchmark Dataset for Breast Cancer Clinical Decision Support using Synthetic Correlated Diffusion Imaging Data [82.74877848011798]
Cancer-Net BCa is a multi-institutional open-source benchmark dataset of volumetric CDI$^s$ imaging data of breast cancer patients.
Cancer-Net BCa is publicly available as a part of a global open-source initiative dedicated to accelerating advancement in machine learning to aid clinicians in the fight against cancer.
arXiv Detail & Related papers (2023-04-12T05:41:44Z)
- Predicting Distant Metastases in Soft-Tissue Sarcomas from PET-CT scans using Constrained Hierarchical Multi-Modality Feature Learning [14.60163613315816]
Distant metastases (DM) are the leading cause of death in patients with soft-tissue sarcomas (STSs).
It is difficult to determine from imaging studies which STS patients will develop metastases.
We outline a new 3D CNN to help predict DM in patients from PET-CT data.
arXiv Detail & Related papers (2021-04-23T05:12:02Z)
- Topological Data Analysis of copy number alterations in cancer [70.85487611525896]
We explore the potential to capture information contained in cancer genomic data using a novel topology-based approach.
We find that this technique has the potential to extract meaningful low-dimensional representations in cancer somatic genetic data.
arXiv Detail & Related papers (2020-11-22T17:31:23Z)