Lesion Segmentation in FDG-PET/CT Using Swin Transformer U-Net 3D: A Robust Deep Learning Framework
- URL: http://arxiv.org/abs/2601.02864v1
- Date: Tue, 06 Jan 2026 09:52:00 GMT
- Title: Lesion Segmentation in FDG-PET/CT Using Swin Transformer U-Net 3D: A Robust Deep Learning Framework
- Authors: Shovini Guha, Dwaipayan Nandi
- Abstract summary: This paper presents a Swin Transformer UNet 3D (SwinUNet3D) framework for lesion segmentation in PET/CT scans. By combining shifted window self-attention with U-Net style skip connections, the model captures both global context and fine anatomical detail. Results show that SwinUNet3D achieves a Dice score of 0.88 and IoU of 0.78, surpassing 3D U-Net (Dice 0.48, IoU 0.32) while also delivering faster inference times.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate and automated lesion segmentation in Positron Emission Tomography / Computed Tomography (PET/CT) imaging is essential for cancer diagnosis and therapy planning. This paper presents a Swin Transformer UNet 3D (SwinUNet3D) framework for lesion segmentation in Fluorodeoxyglucose Positron Emission Tomography / Computed Tomography (FDG-PET/CT) scans. By combining shifted window self-attention with U-Net style skip connections, the model captures both global context and fine anatomical detail. We evaluate SwinUNet3D on the AutoPET III FDG dataset and compare it against a baseline 3D U-Net. Results show that SwinUNet3D achieves a Dice score of 0.88 and IoU of 0.78, surpassing 3D U-Net (Dice 0.48, IoU 0.32) while also delivering faster inference times. Qualitative analysis demonstrates improved detection of small and irregular lesions, reduced false positives, and more accurate PET/CT fusion. While the framework is currently limited to FDG scans and trained under modest GPU resources, it establishes a strong foundation for future multi-tracer, multi-center evaluations and benchmarking against other transformer-based architectures. Overall, SwinUNet3D represents an efficient and robust approach to PET/CT lesion segmentation, advancing the integration of transformer-based models into oncology imaging workflows.
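For concreteness, here is a minimal sketch of such a setup. MONAI's SwinUNETR is used as an illustrative stand-in for the paper's SwinUNet3D; the two-channel PET/CT input layout, the (96, 96, 96) patch size, and the feature width are assumptions for illustration, not the authors' reported configuration. The Dice/IoU helper mirrors the metrics reported above.

```python
import torch
from monai.networks.nets import SwinUNETR

# Illustrative stand-in for the paper's SwinUNet3D (assumed configuration).
model = SwinUNETR(
    img_size=(96, 96, 96),  # required by older MONAI releases; deprecated
                            # in newer ones, drop it if your version complains
    in_channels=2,          # assumed layout: channel 0 = CT, channel 1 = PET
    out_channels=2,         # background vs. lesion
    feature_size=48,
)

def dice_and_iou(pred, target, eps=1e-6):
    """Foreground Dice and IoU for binary masks of shape (B, D, H, W)."""
    pred, target = pred.bool(), target.bool()
    inter = (pred & target).sum().float()
    dice = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / ((pred | target).sum() + eps)
    return dice.item(), iou.item()

# Smoke test on one random fused PET/CT patch.
x = torch.randn(1, 2, 96, 96, 96)
pred = model(x).argmax(dim=1)  # (1, 96, 96, 96) hard lesion mask
```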
Related papers
- Personalized MR-Informed Diffusion Models for 3D PET Image Reconstruction [40.722159771726375]
We propose a simple method for generating subject-specific PET images from a dataset of PET-MR scans. The images we synthesize retain information from the subject's MR scan, leading to higher resolution and the retention of anatomical features. With simulated and real [18F]FDG datasets, we show that pre-training a personalized diffusion model with subject-specific "pseudo-PET" images improves reconstruction accuracy with low-count data.
arXiv Detail & Related papers (2025-06-04T10:24:14Z)
- 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans.
Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z)
- AutoPET III Challenge: Tumor Lesion Segmentation using ResEnc-Model Ensemble [1.3467243219009812]
We trained a 3D Residual-encoder U-Net within the nnU-Net ("no new U-Net") framework to improve the generalization of automatic lesion segmentation.
We leveraged test-time augmentation and other post-processing techniques to enhance tumor lesion segmentation (flip-based TTA of this kind is sketched after this entry).
Our team currently holds the top position in the AutoPET III challenge and outperformed the challenge baseline model on the preliminary test set with a Dice score of 0.9627.
arXiv Detail & Related papers (2024-09-19T20:18:39Z)
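A minimal sketch of flip-based test-time augmentation, assuming a generic PyTorch segmentation model; the team's exact augmentation set and post-processing are not specified in the abstract.

```python
import itertools
import torch

@torch.no_grad()
def tta_predict(model, volume):
    """Average softmax outputs over all 8 axis-flip combinations.

    volume: (B, C, D, H, W) input; returns (B, n_classes, D, H, W).
    """
    flip_sets = [f for r in range(4)
                 for f in itertools.combinations((2, 3, 4), r)]
    probs = None
    for flips in flip_sets:
        inp = torch.flip(volume, flips) if flips else volume
        out = torch.softmax(model(inp), dim=1)
        if flips:
            out = torch.flip(out, flips)  # undo the flip on the prediction
        probs = out if probs is None else probs + out
    return probs / len(flip_sets)
```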
- From FDG to PSMA: A Hitchhiker's Guide to Multitracer, Multicenter Lesion Segmentation in PET/CT Imaging [0.9384264274298444]
We present our solution for the autoPET III challenge, targeting multitracer, multicenter generalization using the nnU-Net framework with the ResEncL architecture.
Key techniques include misalignment data augmentation (sketched after this entry) and multi-modal pretraining across CT, MR, and PET datasets.
Compared to the default nnU-Net, which achieved a Dice score of 57.61, our model significantly improved performance with a Dice score of 68.40, alongside a reduction in false positive (FPvol: 7.82) and false negative (FNvol: 10.35) volumes.
arXiv Detail & Related papers (2024-09-14T16:39:17Z)
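A minimal sketch of what misalignment augmentation can look like: the PET channel is randomly translated a few voxels relative to CT to simulate imperfect registration. The shift range and channel layout are assumptions for illustration, not values from the paper.

```python
import random
import torch

def misalign_pet(sample, max_shift=4):
    """sample: (2, D, H, W), assumed channel 0 = CT, channel 1 = PET."""
    out = sample.clone()
    for axis in (0, 1, 2):  # D, H, W axes of the PET channel
        shift = random.randint(-max_shift, max_shift)
        # torch.roll wraps voxels around the border; a production version
        # would pad with a background value instead.
        out[1] = torch.roll(out[1], shifts=shift, dims=axis)
    return out
```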
- Diff3Dformer: Leveraging Slice Sequence Diffusion for Enhanced 3D CT Classification with Transformer Networks [5.806035963947936]
We propose a Diffusion-based 3D Vision Transformer (Diff3Dformer) to aggregate repetitive information within 3D CT scans.
Our method exhibits improved performance on two small 3D lung CT datasets of different sizes.
arXiv Detail & Related papers (2024-06-24T23:23:18Z)
- 2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction [17.897681480967087]
Positron Emission Tomography (PET) is an important clinical imaging tool but inevitably introduces radiation hazards to patients and healthcare providers.
It is desirable to develop 3D methods to translate the non-attenuation-corrected low-dose PET into attenuation-corrected standard-dose PET.
Recent diffusion models have emerged as a new state-of-the-art deep learning method for image-to-image translation, better than traditional CNN-based methods.
We developed a novel 2.5D Multi-view Averaging Diffusion Model (MADM) for 3D image-to-image translation, applied to non-attenuation-corrected (NAC) low-count PET (the multi-view averaging step is sketched after this entry).
arXiv Detail & Related papers (2024-06-12T16:22:41Z)
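A minimal sketch of the multi-view averaging idea: run a 2D slice-wise model along each anatomical axis and average the three re-assembled volumes. `slice_model` is a hypothetical 2D network, and the diffusion sampling loop itself is omitted.

```python
import torch

@torch.no_grad()
def multi_view_average(slice_model, vol):
    """vol: (D, H, W) input volume; returns the averaged (D, H, W) output."""
    outs = []
    for axis in (0, 1, 2):  # axial, coronal, sagittal (assumed ordering)
        v = vol.movedim(axis, 0)                       # slices along dim 0
        pred = slice_model(v.unsqueeze(1)).squeeze(1)  # per-slice 2D outputs
        outs.append(pred.movedim(0, axis))             # back to (D, H, W)
    return torch.stack(outs).mean(dim=0)
```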
- View-Disentangled Transformer for Brain Lesion Detection [50.4918615815066]
We propose a novel view-disentangled transformer to enhance the extraction of MRI features for more accurate tumour detection.
First, the proposed transformer captures long-range correlations among different positions in a 3D brain scan.
Second, the transformer models a stack of slice features as multiple 2D views and enhances these features view by view.
Third, we deploy the proposed transformer module in a transformer backbone, which can effectively detect the 2D regions surrounding brain lesions.
arXiv Detail & Related papers (2022-09-20T11:58:23Z)
- Advancing 3D Medical Image Analysis with Variable Dimension Transform based Supervised 3D Pre-training [45.90045513731704]
This paper revisits an innovative yet simple fully-supervised 3D network pre-training framework.
With a redesigned 3D network architecture, reformulated natural images are used to address the problem of data scarcity.
Comprehensive experiments on four benchmark datasets demonstrate that the proposed pre-trained models can effectively accelerate convergence.
arXiv Detail & Related papers (2022-01-05T03:11:21Z)
- Hepatic vessel segmentation based on 3D Swin-transformer with inductive biased multi-head self-attention [46.46365941681487]
We propose a robust end-to-end vessel segmentation network called Inductive-Biased Multi-Head Attention Vessel Net.
We introduce voxel-wise embedding rather than patch-wise embedding to precisely locate liver vessel voxels.
On the other hand, we propose an inductive-biased multi-head self-attention that learns an inductive-biased relative positional embedding from the absolute position embedding (a generic relative-position-bias sketch follows after this entry).
arXiv Detail & Related papers (2021-11-05T10:17:08Z)
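A generic, single-head sketch of a learned relative positional bias added to attention logits (the Swin-style mechanism this work builds on); the paper's specific scheme of deriving the bias from absolute position embeddings is not reproduced here.

```python
import torch
import torch.nn as nn

class RelPosAttention1D(nn.Module):
    """Single-head self-attention over a fixed-length window with a
    learned bias indexed by relative position."""

    def __init__(self, dim, window):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.scale = dim ** -0.5
        # one learnable bias per relative offset in [-(window-1), window-1]
        self.bias = nn.Parameter(torch.zeros(2 * window - 1))
        idx = torch.arange(window)
        self.register_buffer("rel", idx[None, :] - idx[:, None] + window - 1)

    def forward(self, x):  # x: (B, N, dim) with N == window
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn + self.bias[self.rel]  # add relative-position bias
        return torch.softmax(attn, dim=-1) @ v
```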
- Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans [72.04652116817238]
We propose a differentiable neural architecture search (DNAS) framework to automatically search for 3D deep-learning models for chest CT scan classification.
We also apply the Class Activation Mapping (CAM) technique to our models to make the results interpretable (a minimal CAM sketch follows after this entry).
arXiv Detail & Related papers (2021-01-14T03:45:01Z)
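A minimal CAM sketch for a 3D classifier that ends in global average pooling followed by a linear layer; the feature tensor and weight matrix are assumed inputs, not taken from the paper's searched architectures.

```python
import torch

def class_activation_map(features, fc_weight, class_idx):
    """features: (C, D, H, W) last-conv activations for one scan;
    fc_weight: (n_classes, C) weights of the final linear layer."""
    # weight each feature map by the class's classifier weight and sum
    cam = torch.einsum("c,cdhw->dhw", fc_weight[class_idx], features)
    cam = torch.relu(cam)
    return cam / (cam.max() + 1e-6)  # normalize to [0, 1] for overlay
```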
- Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D-context-enhanced 2D features for universal lesion detection in CT slices (a generic pseudo-3D block is sketched after this entry).
With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset.
The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
arXiv Detail & Related papers (2020-12-16T07:11:16Z)
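Pseudo-3D feature extraction of the kind MP3D FPN builds on can be sketched as an in-plane convolution followed by a cross-slice convolution, injecting 3D context at near-2D cost. This is a generic P3D-style block, not the paper's exact module.

```python
import torch.nn as nn

def pseudo3d_block(cin, cout):
    """In-plane 1x3x3 conv followed by a cross-slice 3x1x1 conv."""
    return nn.Sequential(
        nn.Conv3d(cin, cout, kernel_size=(1, 3, 3), padding=(0, 1, 1)),
        nn.BatchNorm3d(cout),
        nn.ReLU(inplace=True),
        nn.Conv3d(cout, cout, kernel_size=(3, 1, 1), padding=(1, 0, 0)),
        nn.BatchNorm3d(cout),
        nn.ReLU(inplace=True),
    )
```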