Related papers: SAM-Swin: SAM-Driven Dual-Swin Transformers with Adaptive Lesion Enhancement for Laryngo-Pharyngeal Tumor Detection

URL: http://arxiv.org/abs/2410.21813v1
Date: Tue, 29 Oct 2024 07:32:57 GMT
Title: SAM-Swin: SAM-Driven Dual-Swin Transformers with Adaptive Lesion Enhancement for Laryngo-Pharyngeal Tumor Detection
Authors: Jia Wei, Yun Li, Xiaomao Fan, Wenjun Ma, Meiyu Qiu, Hongyu Chen, Wenbin Lei
Abstract summary: Laryngo-pharyngeal cancer (LPC) is a highly lethal malignancy in the head and neck region. Recent advancements in tumor detection have significantly improved diagnostic accuracy by integrating global and local feature extraction. We propose SAM-Swin, an innovative SAM-driven Dual-Swin Transformer for laryngo-pharyngeal tumor detection.
Score: 12.86763797167925
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Laryngo-pharyngeal cancer (LPC) is a highly lethal malignancy in the head and neck region. Recent advancements in tumor detection, particularly through dual-branch network architectures, have significantly improved diagnostic accuracy by integrating global and local feature extraction. However, challenges remain in accurately localizing lesions and fully capitalizing on the complementary nature of features within these branches. To address these issues, we propose SAM-Swin, an innovative SAM-driven Dual-Swin Transformer for laryngo-pharyngeal tumor detection. This model leverages the robust segmentation capabilities of the Segment Anything Model 2 (SAM2) to achieve precise lesion segmentation. Meanwhile, we present a multi-scale lesion-aware enhancement module (MS-LAEM) designed to adaptively enhance the learning of nuanced complementary features across various scales, improving the quality of feature extraction and representation. Furthermore, we implement a multi-scale class-aware guidance (CAG) loss that delivers multi-scale targeted supervision, thereby enhancing the model's capacity to extract class-specific features. To validate our approach, we compiled three LPC datasets from the First Affiliated Hospital (FAHSYSU), the Sixth Affiliated Hospital (SAHSYSU) of Sun Yat-sen University, and Nanfang Hospital of Southern Medical University (NHSMU). The FAHSYSU dataset is utilized for internal training, while the SAHSYSU and NHSMU datasets serve for external evaluation. Extensive experiments demonstrate that SAM-Swin outperforms state-of-the-art methods, showcasing its potential for advancing LPC detection and improving patient outcomes. The source code of SAM-Swin is available at https://github.com/VVJia/SAM-Swin.

Related papers

VesSAM: Efficient Multi-Prompting for Segmenting Complex Vessel [68.24765319399286]
We present VesSAM, a powerful and efficient framework tailored for 2D vessel segmentation. VesSAM integrates (1) a convolutional adapter to enhance local texture features, (2) a multi-prompt encoder that fuses anatomical prompts, and (3) a lightweight mask decoder to reduce jagged artifacts. VesSAM consistently outperforms state-of-the-art PEFT-based SAM variants by over 10% Dice and 13% IoU.
arXiv Detail & Related papers (2025-11-02T15:47:05Z)
SSS: Semi-Supervised SAM-2 with Efficient Prompting for Medical Imaging Segmentation [18.41555492374031]
SSS (Semi-Supervised SAM-2) is a novel approach that leverages SAM-2's robust feature extraction capabilities to uncover latent knowledge in unlabeled medical images. In experiments, SSS achieves an average Dice score of 53.15 on BHSD, surpassing the previous state-of-the-art method by +3.65 Dice.
arXiv Detail & Related papers (2025-06-10T16:09:40Z)
SAMba-UNet: Synergizing SAM2 and Mamba in UNet with Heterogeneous Aggregation for Cardiac MRI Segmentation [6.451534509235736]
This study proposes an innovative dual-encoder architecture named SAMba-UNet. The framework achieves cross-modal feature collaborative learning by integrating the vision foundation model SAM2, the state-space model Mamba, and the classical UNet. Experiments on the ACDC cardiac MRI dataset demonstrate that the proposed model achieves a Dice coefficient of 0.9103 and an HD95 boundary error of 1.0859 mm.
arXiv Detail & Related papers (2025-05-22T06:57:03Z)
BiSeg-SAM: Weakly-Supervised Post-Processing Framework for Boosting Binary Segmentation in Segment Anything Models [6.74659948545092]
BiSeg-SAM is a weakly supervised prompting and boundary refinement network for the segmentation of polyps and skin lesions. Our method demonstrates significant superiority over state-of-the-art (SOTA) methods when tested on five polyp datasets and one skin cancer dataset.
arXiv Detail & Related papers (2025-04-02T08:04:37Z)
Learnable Prompting SAM-induced Knowledge Distillation for Semi-supervised Medical Image Segmentation [47.789013598970925]
We propose a learnable prompting SAM-induced Knowledge distillation framework (KnowSAM) for semi-supervised medical image segmentation. Our model outperforms the state-of-the-art semi-supervised segmentation approaches.
arXiv Detail & Related papers (2024-12-18T11:19:23Z)
SAMPa: Sharpness-aware Minimization Parallelized [51.668052890249726]
Sharpness-aware minimization (SAM) has been shown to improve the generalization of neural networks. Each SAM update requires sequentially computing two gradients, effectively doubling the per-iteration cost. We propose a simple modification of SAM, termed SAMPa, which allows us to fully parallelize the two gradient computations.
arXiv Detail & Related papers (2024-10-14T16:21:23Z)
Prototype Learning Guided Hybrid Network for Breast Tumor Segmentation in DCE-MRI [58.809276442508256]
We propose a hybrid network that combines convolutional neural network (CNN) and transformer layers. The experimental results on private and public DCE-MRI datasets demonstrate that the proposed hybrid network achieves superior performance compared with state-of-the-art methods.
arXiv Detail & Related papers (2024-08-11T15:46:00Z)
SAM-FNet: SAM-Guided Fusion Network for Laryngo-Pharyngeal Tumor Detection [11.90977635214196]
We propose a novel SAM-guided fusion network (SAM-FNet) for laryngo-pharyngeal tumor detection. By leveraging the powerful object segmentation capabilities of the Segment Anything Model (SAM), we introduce the SAM into the SAM-FNet to accurately segment the lesion region. Furthermore, we propose a GAN-like feature optimization (GFO) module to capture the discriminative features between the global and local branches.
arXiv Detail & Related papers (2024-08-10T04:14:14Z)
De-LightSAM: Modality-Decoupled Lightweight SAM for Generalizable Medical Segmentation [28.16884929151585]
We propose a modality-decoupled lightweight SAM for domain-generalized medical image segmentation. Specifically, we first devise a lightweight domain-controllable image encoder (DC-Encoder) that produces discriminative visual features for diverse modalities. Finally, we design the query-decoupled modality decoder (QM-Decoder) that leverages a one-to-one strategy to provide an independent decoding channel.
arXiv Detail & Related papers (2024-07-19T09:32:30Z)
ASPS: Augmented Segment Anything Model for Polyp Segmentation [77.25557224490075]
The Segment Anything Model (SAM) has introduced unprecedented potential for polyp segmentation. SAM's Transformer-based structure prioritizes global and low-frequency information. CFA integrates a trainable CNN encoder branch with a frozen ViT encoder, enabling the integration of domain-specific knowledge.
arXiv Detail & Related papers (2024-06-30T14:55:32Z)
Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding [15.401507589312702]
This paper introduces H-SAM, a prompt-free adaptation of the Segment Anything Model (SAM) for efficient fine-tuning of medical images. In the initial stage, H-SAM employs SAM's original decoder to generate a prior probabilistic mask, guiding a more intricate decoding process. Our H-SAM demonstrates a 4.78% improvement in average Dice compared to existing prompt-free SAM variants.
arXiv Detail & Related papers (2024-03-27T05:55:16Z)
SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification. The proposed framework has been validated through comprehensive experiments on two clinical datasets. To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z)
Cheap Lunch for Medical Image Segmentation by Fine-tuning SAM on Few Exemplars [19.725817146049707]
The Segment Anything Model (SAM) has demonstrated remarkable capabilities of scaled-up segmentation models. However, the adoption of foundational models in the medical domain presents a challenge due to the difficulty and expense of labeling sufficient data. This paper introduces an efficient and practical approach for fine-tuning SAM using a limited number of exemplars.
arXiv Detail & Related papers (2023-08-27T15:21:25Z)
SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation [65.52097667738884]
We introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to integrate surgical-specific information with SAM's pre-trained knowledge for improved generalisation. Specifically, we propose a lightweight prototype-based class prompt encoder for tuning, which directly generates prompt embeddings from class prototypes. In addition, to address the low inter-class variance among surgical instrument categories, we propose contrastive prototype learning.
arXiv Detail & Related papers (2023-08-17T02:51:01Z)
3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable tumor segmentation [52.699139151447945]
We propose a novel adaptation method for transferring the segment anything model (SAM) from 2D to 3D for promptable medical image segmentation. Our model outperforms domain state-of-the-art medical image segmentation models on 3 out of 4 tasks, specifically by 8.25%, 29.87%, and 10.11% for kidney tumor, pancreas tumor, and colon cancer segmentation, respectively, and achieves similar performance for liver tumor segmentation.
arXiv Detail & Related papers (2023-06-23T12:09:52Z)
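
Note on metrics: several entries above report Dice and IoU scores. As a rough reference only, the sketch below shows how these two overlap measures are commonly computed for a pair of binary masks. It is not taken from any of the listed papers; the function name dice_and_iou, the flat 0/1 list representation, and the empty-mask convention are assumptions made for illustration.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as flat 0/1 sequences.

    Hypothetical helper for illustration; real pipelines usually operate
    on image arrays and may average per class or per case.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection

    if union == 0:
        # Both masks are empty: treated as perfect agreement (assumed convention).
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy example: two 6-pixel masks that overlap on 2 pixels.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For a single pair of masks the two measures are related by Dice = 2 * IoU / (1 + IoU), so Dice is never lower than IoU. This relation does not carry over exactly to scores averaged over many cases.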

This list is automatically generated from the titles and abstracts of the papers on this site.