Vision-Language Models for Automated 3D PET/CT Report Generation
- URL: http://arxiv.org/abs/2511.20145v1
- Date: Tue, 25 Nov 2025 10:07:57 GMT
- Title: Vision-Language Models for Automated 3D PET/CT Report Generation
- Authors: Wenpei Jiao, Kun Shang, Hui Li, Ke Yan, Jiajin Zhang, Guangjie Yang, Lijuan Guo, Yan Wan, Xing Yang, Dakai Jin, Zhaoheng Xie
- Abstract summary: Automated PET/CT report generation is increasingly important for reducing clinical workload. PETRG-3D is an end-to-end 3D dual-branch framework that separately encodes PET and CT volumes and incorporates style-adaptive prompts. PETRG-Score is a lymphoma-specific evaluation protocol that measures metabolic and structural findings across curated anatomical regions.
- Score: 13.781844347232079
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Positron emission tomography/computed tomography (PET/CT) is essential in oncology, yet the rapid expansion of scanners has outpaced the availability of trained specialists, making automated PET/CT report generation (PETRG) increasingly important for reducing clinical workload. Compared with structural imaging (e.g., X-ray, CT, and MRI), functional PET poses distinct challenges: metabolic patterns vary with tracer physiology, and whole-body 3D contextual information is required rather than local-region interpretation. To advance PETRG, we propose PETRG-3D, an end-to-end 3D dual-branch framework that separately encodes PET and CT volumes and incorporates style-adaptive prompts to mitigate inter-hospital variability in reporting practices. We construct PETRG-Lym, a multi-center lymphoma dataset collected from four hospitals (824 reports with 245,509 paired PET/CT slices), and AutoPET-RG-Lym, a publicly accessible PETRG benchmark derived from open imaging data and equipped with new expert-written, clinically validated reports (135 cases). To assess clinical utility, we introduce PETRG-Score, a lymphoma-specific evaluation protocol that jointly measures metabolic and structural findings across curated anatomical regions. Experiments show that PETRG-3D substantially outperforms existing methods on both natural language metrics (e.g., +31.49% ROUGE-L) and clinical efficacy metrics (e.g., +8.18% PET-All), highlighting the benefits of volumetric dual-modality modeling and style-aware prompting. Overall, this work establishes a foundation for future PET/CT-specific models emphasizing disease-aware reasoning and clinically reliable evaluation. Code, models, and AutoPET-RG-Lym will be released.
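To make the dual-branch idea concrete, here is a minimal PyTorch sketch of one plausible realization: separate 3D encoders for PET and CT, plus a learned per-hospital style-prompt embedding fused into a single conditioning vector for a report decoder. All module names, shapes, and the fusion scheme are illustrative assumptions, not the released PETRG-3D code.

```python
import torch
import torch.nn as nn

class DualBranchPETCT(nn.Module):
    """Hypothetical sketch: two 3D encoders + style-prompt fusion."""

    def __init__(self, feat_dim: int = 256, num_styles: int = 4):
        super().__init__()

        def encoder() -> nn.Sequential:
            # One lightweight 3D conv encoder per modality.
            return nn.Sequential(
                nn.Conv3d(1, 32, kernel_size=3, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv3d(32, 64, kernel_size=3, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool3d(1),
                nn.Flatten(),
                nn.Linear(64, feat_dim),
            )

        self.pet_encoder = encoder()
        self.ct_encoder = encoder()
        # One learned prompt vector per hospital/reporting style.
        self.style_prompts = nn.Embedding(num_styles, feat_dim)
        self.fuse = nn.Linear(3 * feat_dim, feat_dim)

    def forward(self, pet, ct, style_id):
        # pet, ct: (B, 1, D, H, W) volumes; style_id: (B,) hospital index.
        z = torch.cat(
            [self.pet_encoder(pet), self.ct_encoder(ct),
             self.style_prompts(style_id)], dim=-1)
        return self.fuse(z)  # conditioning vector for a report decoder

model = DualBranchPETCT()
pet = torch.randn(2, 1, 32, 64, 64)
ct = torch.randn(2, 1, 32, 64, 64)
out = model(pet, ct, torch.tensor([0, 1]))
print(out.shape)  # torch.Size([2, 256])
```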
Related papers
- GazeXPErT: An Expert Eye-tracking Dataset for Interpretable and Explainable AI in Oncologic FDG-PET/CT Scans [32.135469601806754]
We present GazeXPErT, a 4D eye-tracking dataset capturing expert search patterns during tumor detection and measurement. From 3,948 minutes of raw 60 Hz eye-tracking data, 9,030 unique gaze-to-lesion trajectories were extracted, synchronized with PET/CT image slices, and rendered in COCO-style format for machine learning applications.
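For readers unfamiliar with the format, here is a hypothetical sketch of what one COCO-style record for a gaze-to-lesion trajectory could look like. Only "images", "annotations", "bbox", and "categories" are standard COCO keys; the "gaze_trajectory" field and all values are assumptions, not the dataset's actual schema.

```python
record = {
    "images": [{"id": 1, "file_name": "petct_slice_0420.png",
                "height": 512, "width": 512}],
    "annotations": [{
        "id": 1,
        "image_id": 1,
        "category_id": 1,                    # e.g., lesion
        "bbox": [112.0, 203.0, 34.0, 28.0],  # lesion box (x, y, w, h)
        # Assumed extension: the synchronized gaze trajectory as
        # (x, y, timestamp_ms) triples sampled at 60 Hz.
        "gaze_trajectory": [[130.2, 210.7, 0.0],
                            [128.9, 211.3, 16.7],
                            [127.5, 212.0, 33.3]],
    }],
    "categories": [{"id": 1, "name": "lesion"}],
}
```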
arXiv Detail & Related papers (2026-02-26T04:39:15Z)
- Unveiling and Bridging the Functional Perception Gap in MLLMs: Atomic Visual Alignment and Hierarchical Evaluation via PET-Bench [48.60251555171943]
Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in tasks such as abnormality detection and report generation for anatomical modalities. In this work, we quantify a fundamental functional perception gap: the inability of current vision encoders to decode functional tracer biodistribution independent of morphological priors. We introduce PET-Bench, the first large-scale functional imaging benchmark, comprising 52,308 hierarchical QA pairs from 9,732 multi-site, multi-tracer PET studies. Our results demonstrate that AVA effectively bridges the perception gap, transforming chain-of-thought (CoT) reasoning from a source of hallucination into a robust inference tool and improving diagnostic accuracy.
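As a rough illustration only, one hierarchical QA pair might be serialized as below; the schema and field names are assumptions, not PET-Bench's published format.

```python
qa_pair = {
    "study_id": "pet_000123",
    "tracer": "FDG",
    "level": "organ",  # assumed hierarchy: whole-body -> organ -> lesion
    "question": "Which organ shows the highest tracer uptake?",
    "answer": "liver",
}
```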
arXiv Detail & Related papers (2026-01-06T05:58:50Z)
- PETAR: Localized Findings Generation with Mask-Aware Vision-Language Modeling for PET Automated Reporting [10.800541081358132]
We introduce a large-scale dataset comprising over 11,000 lesion-level descriptions paired with 3D segmentations from more than 5,000 PET/CT exams. Building upon this dataset, we propose PETAR-4B, a 3D mask-aware vision-language model that integrates PET, CT, and lesion contours for spatially grounded report generation.
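One simple way to realize "mask-aware" conditioning is to stack the lesion contours as an extra input channel alongside PET and CT, as in the hypothetical sketch below; this illustrates the idea, not necessarily PETAR-4B's actual mechanism.

```python
import torch

pet = torch.randn(1, 1, 64, 128, 128)   # SUV volume
ct = torch.randn(1, 1, 64, 128, 128)    # HU volume
lesion_mask = (torch.rand(1, 1, 64, 128, 128) > 0.99).float()

x = torch.cat([pet, ct, lesion_mask], dim=1)  # (1, 3, D, H, W)
# x now feeds a 3D vision tower; the mask channel tells the model *which*
# lesion the generated finding should describe.
```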
arXiv Detail & Related papers (2025-10-31T17:49:01Z)
- TauGenNet: Plasma-Driven Tau PET Image Synthesis via Text-Guided 3D Diffusion Models [6.674230585698143]
We propose a text-guided 3D diffusion model for tau PET image synthesis, leveraging multimodal conditions from both structural MRI and plasma measurements. Experimental results demonstrate that our approach can generate realistic, clinically meaningful 3D tau PET images across a range of disease stages.
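In standard conditional-diffusion notation, the setup described here corresponds to a reverse (denoising) process conditioned on c = (structural MRI, plasma measurement); this is the generic formulation, not the paper's exact objective:

```latex
p_\theta(x_{0:T} \mid c) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t, c),
\qquad
p_\theta(x_{t-1} \mid x_t, c) =
    \mathcal{N}\!\left(x_{t-1};\, \mu_\theta(x_t, t, c),\, \sigma_t^2 I\right)
```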
arXiv Detail & Related papers (2025-09-04T14:45:50Z)
- Supervised Diffusion-Model-Based PET Image Reconstruction [44.89560992517543]
Diffusion models (DMs) have been introduced as a regularizing prior for PET image reconstruction. We propose a supervised DM-based algorithm for PET reconstruction. Our method enforces the non-negativity of PET's Poisson likelihood model and accommodates the wide intensity range of PET images.
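The Poisson likelihood model referred to here is the standard PET formulation: for measured counts y, system matrix A, and additive scatter/randoms estimate s, reconstruction minimizes the negative log-likelihood subject to non-negativity of the image x:

```latex
\min_{x \ge 0} \; \sum_i \left[ (Ax + s)_i - y_i \log (Ax + s)_i \right]
```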
arXiv Detail & Related papers (2025-06-30T16:39:50Z)
- Personalized MR-Informed Diffusion Models for 3D PET Image Reconstruction [40.722159771726375]
We propose a simple method for generating subject-specific PET images from a dataset of PET-MR scans. The synthesized images retain information from the subject's MR scan, leading to higher resolution and the retention of anatomical features. With simulated and real [18F]FDG datasets, we show that pre-training a personalized diffusion model with subject-specific "pseudo-PET" images improves reconstruction accuracy with low-count data.
arXiv Detail & Related papers (2025-06-04T10:24:14Z)
- Developing a PET/CT Foundation Model for Cross-Modal Anatomical and Functional Imaging [39.59895695500171]
We introduce the Cross-Fraternal Twin Masked Autoencoder (FratMAE), a novel framework that effectively integrates whole-body anatomical and functional information. FratMAE captures intricate cross-modal relationships and global uptake patterns, achieving superior performance on downstream tasks.
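A heavily simplified sketch of the paired masked-autoencoding idea follows: random patches are masked in both the PET and CT volumes, and an encoder-decoder (omitted here) must reconstruct them, forcing the encoder to relate anatomy to uptake. The patch size and masking ratio are illustrative, not FratMAE's recipe.

```python
import torch

def random_patch_mask(vol, patch=8, ratio=0.75):
    """Zero out a random fraction of non-overlapping cubic patches."""
    B, C, D, H, W = vol.shape
    gd, gh, gw = D // patch, H // patch, W // patch
    keep = (torch.rand(B, 1, gd, gh, gw) > ratio).float()
    # Upsample the patch-level keep grid back to voxel resolution.
    keep = keep.repeat_interleave(patch, 2) \
               .repeat_interleave(patch, 3) \
               .repeat_interleave(patch, 4)
    return vol * keep, keep

pet = torch.randn(2, 1, 32, 64, 64)
ct = torch.randn(2, 1, 32, 64, 64)
pet_masked, pet_keep = random_patch_mask(pet)
ct_masked, ct_keep = random_patch_mask(ct)
# An encoder-decoder would now reconstruct the masked voxels of each
# modality, optionally cross-attending between the two branches.
```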
arXiv Detail & Related papers (2025-03-04T17:49:07Z)
- End-to-end Triple-domain PET Enhancement: A Hybrid Denoising-and-reconstruction Framework for Reconstructing Standard-dose PET Images from Low-dose PET Sinograms [43.13562515963306]
We propose an end-to-end TriPle-domain LPET EnhancemenT (TriPLET) framework to reconstruct standard-dose PET (SPET) images from low-dose PET sinograms. Our proposed TriPLET reconstructs SPET images with the highest similarity and signal-to-noise ratio relative to real data among state-of-the-art methods.
arXiv Detail & Related papers (2024-12-04T14:47:27Z)
- 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans.
Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z)
- Autopet III challenge: Incorporating anatomical knowledge into nnUNet for lesion segmentation in PET/CT [4.376648893167674]
The autoPET III Challenge focuses on advancing automated segmentation of tumor lesions in PET/CT images.
We developed a classifier that identifies the tracer of a given PET/CT study based on the maximum intensity projection (MIP) of the PET scan.
Our final submission achieves cross-validation Dice scores of 76.90% and 61.33% on the publicly available FDG and PSMA datasets, respectively.
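The MIP itself is a simple maximum over one spatial axis of the PET volume, as in the sketch below; the downstream 2D classifier is omitted, and the array shapes are assumed for illustration.

```python
import numpy as np

pet_volume = np.random.rand(256, 200, 200)   # (axial, coronal, sagittal)
coronal_mip = pet_volume.max(axis=1)         # 2D coronal projection
sagittal_mip = pet_volume.max(axis=2)        # 2D sagittal projection
# Tracers differ in physiological uptake patterns (e.g., brain vs. salivary
# glands), which is visible in these projections, so a small 2D CNN on the
# MIP suffices to tell them apart.
```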
arXiv Detail & Related papers (2024-09-18T17:16:57Z)
- From FDG to PSMA: A Hitchhiker's Guide to Multitracer, Multicenter Lesion Segmentation in PET/CT Imaging [0.9384264274298444]
We present our solution for the autoPET III challenge, targeting multitracer, multicenter generalization using the nnU-Net framework with the ResEncL architecture.
Key techniques include misalignment data augmentation and multi-modal pretraining across CT, MR, and PET datasets.
Compared to the default nnU-Net, which achieved a Dice score of 57.61, our model significantly improved performance with a Dice score of 68.40, alongside reductions in false-positive (FPvol: 7.82) and false-negative (FNvol: 10.35) volumes.
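For reference, the metrics quoted above can be computed on binary masks as follows; treating FPvol/FNvol as the volumes of false-positive and false-negative voxels is our reading of the challenge metrics, and the voxel volume is an assumed example.

```python
import numpy as np

def segmentation_metrics(pred, gt, voxel_volume_ml=0.001):
    """Dice score plus false-positive/false-negative volumes (in ml)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    dice = 2 * tp / (pred.sum() + gt.sum() + 1e-8)
    fp_vol = np.logical_and(pred, ~gt).sum() * voxel_volume_ml
    fn_vol = np.logical_and(~pred, gt).sum() * voxel_volume_ml
    return dice, fp_vol, fn_vol

pred = np.random.rand(64, 64, 64) > 0.5
gt = np.random.rand(64, 64, 64) > 0.5
print(segmentation_metrics(pred, gt))
```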
arXiv Detail & Related papers (2024-09-14T16:39:17Z)
- Contrastive Diffusion Model with Auxiliary Guidance for Coarse-to-Fine PET Reconstruction [62.29541106695824]
This paper presents a coarse-to-fine PET reconstruction framework that consists of a coarse prediction module (CPM) and an iterative refinement module (IRM).
By delegating most of the computational overhead to the CPM, the overall sampling speed of our method can be significantly improved.
Two additional strategies, an auxiliary guidance strategy and a contrastive diffusion strategy, are proposed and integrated into the reconstruction process.
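A heavily simplified, hypothetical sketch of the coarse-to-fine split is shown below: a cheap one-pass coarse prediction followed by only a few refinement iterations, which is where the sampling-speed saving comes from. The stand-in convolution modules are placeholders, and the auxiliary guidance and contrastive strategies are omitted.

```python
import torch
import torch.nn as nn

cpm = nn.Conv3d(1, 1, kernel_size=3, padding=1)  # stand-in coarse module
irm = nn.Conv3d(1, 1, kernel_size=3, padding=1)  # stand-in refinement step

low_quality = torch.randn(1, 1, 16, 32, 32)
x = cpm(low_quality)        # single coarse prediction
for _ in range(5):          # a few refinement steps instead of a
    x = x + irm(x)          # full diffusion trajectory
```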
arXiv Detail & Related papers (2023-08-20T04:10:36Z)
- Exploring Vanilla U-Net for Lesion Segmentation from Whole-body FDG-PET/CT Scans [16.93163630413171]
Since FDG-PET scans only provide metabolic information, healthy tissue or benign disease with irregular glucose consumption may be mistaken for cancer.
In this paper, we explore the potential of U-Net for lesion segmentation in whole-body FDG-PET/CT scans from three aspects, including network architecture, data preprocessing, and data augmentation.
Our method achieved first place on both the preliminary and final leaderboards of the autoPET 2022 challenge.
arXiv Detail & Related papers (2022-10-14T03:37:18Z)
- A unified 3D framework for Organs at Risk Localization and Segmentation for Radiation Therapy Planning [56.52933974838905]
Current medical workflows require manual delineation of organs-at-risk (OARs).
In this work, we aim to introduce a unified 3D pipeline for OAR localization-segmentation.
Our proposed framework fully enables the exploitation of 3D context information inherent in medical imaging.
arXiv Detail & Related papers (2022-03-01T17:08:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.