Vision-Language Modeling in PET/CT for Visual Grounding of Positive Findings
- URL: http://arxiv.org/abs/2502.00528v1
- Date: Sat, 01 Feb 2025 18:59:31 GMT
- Title: Vision-Language Modeling in PET/CT for Visual Grounding of Positive Findings
- Authors: Zachary Huemann, Samuel Church, Joshua D. Warner, Daniel Tran, Xin Tie, Alan B. McMillan, Junjie Hu, Steve Y. Cho, Meghan Lubner, Tyler J. Bradshaw
- Abstract summary: Vision-language models can connect the text description of an object to its specific location in an image through visual grounding.
These models require large annotated image-text datasets, which are lacking for PET/CT.
We developed an automated pipeline to generate weak labels linking PET/CT report descriptions to their image locations and used it to train a 3D vision-language visual grounding model.
- Abstract: Vision-language models can connect the text description of an object to its specific location in an image through visual grounding. This has potential applications in enhanced radiology reporting. However, these models require large annotated image-text datasets, which are lacking for PET/CT. We developed an automated pipeline to generate weak labels linking PET/CT report descriptions to their image locations and used it to train a 3D vision-language visual grounding model. Our pipeline finds positive findings in PET/CT reports by identifying mentions of SUVmax and axial slice numbers. From 25,578 PET/CT exams, we extracted 11,356 sentence-label pairs. Using this data, we trained ConTEXTual Net 3D, which integrates text embeddings from a large language model with a 3D nnU-Net via token-level cross-attention. The model's performance was compared against LLMSeg, a 2.5D version of ConTEXTual Net, and two nuclear medicine physicians. The weak-labeling pipeline accurately identified lesion locations in 98% of cases (246/251), with 7.5% requiring boundary adjustments. ConTEXTual Net 3D achieved an F1 score of 0.80, outperforming LLMSeg (F1=0.22) and the 2.5D model (F1=0.53), though it underperformed both physicians (F1=0.94 and 0.91). The model achieved better performance on FDG (F1=0.78) and DCFPyL (F1=0.75) exams, while performance dropped on DOTATATE (F1=0.58) and Fluciclovine (F1=0.66). The model performed consistently across lesion sizes but showed reduced accuracy on lesions with low uptake. Our novel weak-labeling pipeline accurately produced an annotated dataset of PET/CT image-text pairs, facilitating the development of 3D visual grounding models. ConTEXTual Net 3D significantly outperformed other models but fell short of the performance of nuclear medicine physicians. Our study suggests that even larger datasets may be needed to close this performance gap.
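The abstract describes the weak-labeling rule only at a high level: flag report sentences that mention both an SUVmax value and an axial slice number. A minimal sketch of that idea follows; the regular expressions and example sentences are hypothetical assumptions, not the authors' published code:

```python
import re

# Hypothetical patterns; the paper's actual extraction rules are not given in the abstract.
SUVMAX_RE = re.compile(r"SUV\s*max[^0-9]{0,10}(\d+(?:\.\d+)?)", re.IGNORECASE)
SLICE_RE = re.compile(r"(?:axial\s+)?(?:slice|image)\s*#?\s*(\d{1,4})", re.IGNORECASE)

def extract_weak_labels(report_sentences):
    """Return (sentence, suvmax, slice_number) triples for sentences that
    mention both an SUVmax value and an axial slice number."""
    triples = []
    for sent in report_sentences:
        suv = SUVMAX_RE.search(sent)
        slc = SLICE_RE.search(sent)
        if suv and slc:
            triples.append((sent, float(suv.group(1)), int(slc.group(1))))
    return triples

sentences = [
    "Hypermetabolic right hilar node, SUVmax 8.2, axial slice 143.",
    "No suspicious osseous lesions.",
]
print(extract_weak_labels(sentences))  # matches only the first sentence
```

The extracted slice number localizes the finding along the axial direction, where the reported SUVmax can then be matched to hot voxels to seed a weak 3D label. Likewise, the token-level cross-attention that fuses language-model text embeddings with 3D nnU-Net features could plausibly be wired as below; the shapes and module structure are illustrative guesses, not ConTEXTual Net 3D's actual implementation:

```python
import torch
import torch.nn as nn

class TokenCrossAttention(nn.Module):
    """Hypothetical token-level cross-attention block: 3D image features
    (queries) attend to text-token embeddings (keys/values)."""
    def __init__(self, img_channels, text_dim, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(img_channels, n_heads,
                                          kdim=text_dim, vdim=text_dim,
                                          batch_first=True)

    def forward(self, img_feats, text_tokens):
        # img_feats: (B, C, D, H, W); text_tokens: (B, T, text_dim)
        B, C, D, H, W = img_feats.shape
        q = img_feats.flatten(2).transpose(1, 2)         # (B, D*H*W, C)
        out, _ = self.attn(q, text_tokens, text_tokens)  # voxels attend to tokens
        return img_feats + out.transpose(1, 2).reshape(B, C, D, H, W)

xa = TokenCrossAttention(img_channels=64, text_dim=768)
fused = xa(torch.randn(2, 64, 4, 16, 16), torch.randn(2, 12, 768))
print(fused.shape)  # torch.Size([2, 64, 4, 16, 16])
```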
Related papers
- Efficient 2D CT Foundation Model for Contrast Phase Classification [4.650290073034678]
A 2D foundation model was trained to generate embeddings from 2D CT slices for downstream contrast phase classification.
The model was validated on the VinDr Multiphase dataset and externally validated on the WAW-TACE dataset.
Compared to 3D supervised models, the approach trained faster, performed as well or better, and showed greater robustness to domain shifts.
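As a hedged sketch of the downstream setup this summary describes (frozen 2D-slice embeddings feeding a lightweight contrast-phase classifier), the following uses random placeholder embeddings; the 768-dimensional size and the four phase labels are assumptions, not details from the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical: embeddings (N, 768) from a frozen 2D CT foundation model;
# labels are contrast phases (0=non-contrast, 1=arterial, 2=venous, 3=delayed).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 768)).astype(np.float32)
y = rng.integers(0, 4, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # linear probe on embeddings
print("contrast-phase accuracy:", clf.score(X_te, y_te))
```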
arXiv Detail & Related papers (2025-01-23T20:01:33Z) - Data Diet: Can Trimming PET/CT Datasets Enhance Lesion Segmentation? [70.38903555729081]
We describe our approach for the autoPET3 datacentric track.
We find that a model trained on the entire autoPETIII dataset exhibits undesirable characteristics.
We counteract this by removing the easiest samples from the training dataset as measured by the model loss before retraining from scratch.
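A minimal sketch of this loss-based trimming step, assuming per-sample losses have already been computed with the trained model (the trimming fraction and loss values here are illustrative, not the authors'):

```python
import numpy as np

def trim_easiest(sample_losses, frac=0.1):
    """Drop the fraction of training samples with the lowest loss
    (the 'easiest') before retraining from scratch."""
    losses = np.asarray(sample_losses)
    keep = losses.argsort()[int(frac * len(losses)):]  # indices of harder samples
    return np.sort(keep)

per_sample_loss = [0.02, 0.91, 0.15, 0.47, 0.01, 0.33]
print(trim_easiest(per_sample_loss, frac=1/3))  # drops the two easiest samples
```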
arXiv Detail & Related papers (2024-09-20T14:47:58Z) - AutoPET Challenge III: Testing the Robustness of Generalized Dice Focal Loss trained 3D Residual UNet for FDG and PSMA Lesion Segmentation from Whole-Body PET/CT Images [0.0]
In this study, we utilize a 3D Residual UNet model and employ the Generalized Dice Loss function to train the model on the AutoPET Challenge 2024 dataset.
In the preliminary test phase for Task-1, the average ensemble achieved a mean Dice Similarity Coefficient (DSC) of 0.6687, a mean false negative volume (FNV) of 10.9522 ml, and a mean false positive volume (FPV) of 2.9684 ml.
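For reference, a minimal soft Dice loss in the spirit of the Dice term used here; this is a generic sketch, not the challenge entry's exact Generalized Dice Focal Loss:

```python
import torch

def soft_dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for volumetric segmentation.
    pred: probabilities in [0, 1], target: binary mask, both (B, C, D, H, W)."""
    dims = (2, 3, 4)                         # sum over the spatial axes
    inter = (pred * target).sum(dims)
    denom = pred.sum(dims) + target.sum(dims)
    dice = (2 * inter + eps) / (denom + eps)
    return 1 - dice.mean()

pred = torch.rand(2, 1, 8, 64, 64)
mask = (torch.rand(2, 1, 8, 64, 64) > 0.5).float()
print(soft_dice_loss(pred, mask))
```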
arXiv Detail & Related papers (2024-09-16T10:27:30Z) - CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios [53.94122089629544]
We introduce CT-GLIP (Grounded Language-Image Pretraining with CT scans), a novel method that constructs organ-level image-text pairs to enhance multimodal contrastive learning.
Our method, trained on a multimodal CT dataset comprising 44,011 organ-level vision-text pairs from 17,702 patients across 104 organs, can identify organs and abnormalities in a zero-shot manner using natural language.
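A sketch of the kind of symmetric contrastive objective such organ-level image-text pretraining typically uses; this is generic CLIP-style code under assumed embedding shapes, not CT-GLIP's actual implementation:

```python
import torch
import torch.nn.functional as F

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched organ-level
    image/text embedding pairs (matched pairs sit on the diagonal)."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature   # (B, B) similarity matrix
    labels = torch.arange(img.size(0))
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

print(clip_style_loss(torch.randn(8, 512), torch.randn(8, 512)))
```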
arXiv Detail & Related papers (2024-04-23T17:59:01Z) - BeSt-LeS: Benchmarking Stroke Lesion Segmentation using Deep Supervision [0.0]
We benchmark various end-to-end supervised U-Net-style models on the publicly available ATLAS v2.0 dataset.
Specifically, we have benchmarked models on both 2D and 3D brain images and evaluated them using standard metrics.
arXiv Detail & Related papers (2023-10-10T22:54:01Z) - Video Pretraining Advances 3D Deep Learning on Chest CT Tasks [63.879848037679224]
Pretraining on large natural image classification datasets has aided model development on data-scarce 2D medical tasks.
These 2D models have been surpassed by 3D models on 3D computer vision benchmarks.
We show video pretraining for 3D models can enable higher performance on smaller datasets for 3D medical tasks.
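A minimal sketch of this transfer recipe using torchvision's Kinetics-pretrained 3D ResNet; the target task, input size, and single-channel handling are assumptions for illustration, not the paper's exact protocol:

```python
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18, R3D_18_Weights

# Start from a video-pretrained 3D backbone and retarget the head
# to a (hypothetical) binary chest-CT classification task.
model = r3d_18(weights=R3D_18_Weights.KINETICS400_V1)
model.fc = nn.Linear(model.fc.in_features, 2)

# CT volumes are single-channel; repeat to the 3 channels the video model expects.
ct = torch.randn(1, 1, 32, 112, 112)        # (B, C, depth, H, W)
logits = model(ct.repeat(1, 3, 1, 1, 1))
print(logits.shape)                          # torch.Size([1, 2])
```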
arXiv Detail & Related papers (2023-04-02T14:46:58Z) - 3D-GMIC: an efficient deep neural network to find small objects in large 3D images [41.334361182700164]
3D imaging enables a more accurate diagnosis by providing spatial information about organ anatomy.
Using 3D images to train AI models is computationally challenging because they consist of tens or hundreds of times more pixels than their 2D counterparts.
We propose a novel neural network architecture that enables computationally efficient classification of 3D medical images in their full resolution.
arXiv Detail & Related papers (2022-10-16T21:58:54Z) - 3D RegNet: Deep Learning Model for COVID-19 Diagnosis on Chest CT Image [9.407002591446286]
A 3D-RegNet-based neural network is proposed for diagnosing patients with coronavirus (COVID-19) infection.
On the test set, the 3D model achieved an F1 score of 0.8379 and an AUC of 0.8807.
arXiv Detail & Related papers (2021-07-08T18:10:07Z) - Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans [72.04652116817238]
We propose a differentiable neural architecture search (DNAS) framework to automatically search for 3D DL models for chest CT scan classification.
We also exploit the Class Activation Mapping (CAM) technique on our models to provide the interpretability of the results.
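For context, a minimal sketch of the classic CAM computation (weighting the last convolutional feature maps by the final linear layer's class weights); shapes are illustrative and this is not the paper's exact model:

```python
import torch

def class_activation_map(feat_maps, fc_weight, class_idx):
    """CAM (Zhou et al., 2016) for a 3D classifier that applies global
    average pooling before its final linear layer.
    feat_maps: (C, D, H, W) last-conv features; fc_weight: (num_classes, C)."""
    w = fc_weight[class_idx]                          # (C,)
    cam = torch.einsum("c,cdhw->dhw", w, feat_maps)   # weighted sum over channels
    cam = torch.relu(cam)
    return cam / (cam.max() + 1e-8)                   # normalize to [0, 1]

cam = class_activation_map(torch.randn(64, 8, 16, 16), torch.randn(2, 64), 1)
print(cam.shape)  # torch.Size([8, 16, 16])
```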
arXiv Detail & Related papers (2021-01-14T03:45:01Z) - Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D context enhanced 2D features for universal lesion detection in CT slices.
With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset.
The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
arXiv Detail & Related papers (2020-12-16T07:11:16Z) - PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.