Text Embedded Swin-UMamba for DeepLesion Segmentation
- URL: http://arxiv.org/abs/2508.06453v1
- Date: Fri, 08 Aug 2025 16:54:06 GMT
- Title: Text Embedded Swin-UMamba for DeepLesion Segmentation
- Authors: Ruida Cheng, Tejas Sudharshan Mathai, Pritam Mukherjee, Benjamin Hou, Qingqing Zhu, Zhiyong Lu, Matthew McAuliffe, Ronald M. Summers
- Abstract summary: Integrating large language models (LLMs) into the lesion segmentation workflow offers the potential to combine imaging features with descriptions of lesion characteristics from radiology reports. In this study, we investigate the feasibility of integrating text into the Swin-UMamba architecture for the task of lesion segmentation.
- Score: 6.654483111362868
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Segmentation of lesions on CT enables automatic measurement for clinical assessment of chronic diseases (e.g., lymphoma). Integrating large language models (LLMs) into the lesion segmentation workflow offers the potential to combine imaging features with descriptions of lesion characteristics from the radiology reports. In this study, we investigate the feasibility of integrating text into the Swin-UMamba architecture for the task of lesion segmentation. The publicly available ULS23 DeepLesion dataset was used along with short-form descriptions of the findings from the reports. On the test dataset, a high Dice score of 82% and a low Hausdorff distance of 6.58 pixels were obtained for lesion segmentation. The proposed Text-Swin-UMamba model outperformed prior approaches: it achieved a 37% improvement over the LLM-driven LanGuideMedSeg model (p < 0.001), and surpassed the purely image-based xLSTM-UNet and nnUNet models by 1.74% and 0.22%, respectively. The dataset and code can be accessed at https://github.com/ruida/LLM-Swin-UMamba
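The abstract reports two evaluation metrics, the Dice score and the Hausdorff distance in pixels. The following is a minimal sketch of how these two quantities are commonly defined for a pair of binary 2D masks. It is not the authors' evaluation code: the function names and toy masks are hypothetical, and it assumes the Hausdorff distance is taken over all foreground pixels, whereas an evaluation pipeline may instead use boundary pixels or a percentile variant.

```python
from math import hypot


def foreground(mask):
    """Return the set of (row, col) coordinates where a binary mask is nonzero."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def dice_score(pred, truth):
    """Dice = 2 * |P intersect T| / (|P| + |T|) for two binary masks."""
    p, t = foreground(pred), foreground(truth)
    if not p and not t:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(p & t) / (len(p) + len(t))


def hausdorff_distance(pred, truth):
    """Symmetric Hausdorff distance (in pixels) between two foreground sets."""
    p, t = foreground(pred), foreground(truth)
    if not p or not t:
        raise ValueError("Hausdorff distance is undefined for an empty mask")

    def directed(a, b):
        # Largest distance from any point in a to its nearest point in b.
        return max(min(hypot(ar - br, ac - bc) for br, bc in b) for ar, ac in a)

    return max(directed(p, t), directed(t, p))


# Toy example (hypothetical data): the prediction overshoots by one column.
truth = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

print(dice_score(pred, truth))          # 2 * 4 / (6 + 4) = 0.8
print(hausdorff_distance(pred, truth))  # 1.0
```

The brute-force pairwise search is quadratic in the number of foreground pixels, which is acceptable for small lesion crops but would typically be replaced by a distance-transform approach on full CT slices.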
Related papers
- Taylor-Series Expanded Kolmogorov-Arnold Network for Medical Imaging Classification [0.0]
This study introduces Kolmogorov-Arnold Networks (KANs) for accurate medical image classification with limited, diverse datasets. The models include SBTAYLOR-KAN, integrating B-splines with Taylor series; SBRBF-KAN, combining B-splines with Radial Basis Functions; and SBWAVELET-KAN, embedding B-splines in Morlet wavelet transforms. The models were evaluated on brain MRI, chest X-rays, tuberculosis X-rays, and skin lesion images without preprocessing.
arXiv Detail & Related papers (2025-09-17T04:33:54Z) - Automatic Fine-grained Segmentation-assisted Report Generation [3.6341072547314037]
We present ASaRG, an extension of the popular LLaVA architecture for report generation. Our approach achieves a +0.89% performance gain in CE F1 score compared to the LLaVA baseline. Our code will be made publicly available at a later date.
arXiv Detail & Related papers (2025-07-22T14:16:20Z) - GANet-Seg: Adversarial Learning for Brain Tumor Segmentation with Hybrid Generative Models [1.0456203870202954]
This work introduces a novel framework for brain tumor segmentation leveraging pre-trained GANs and Unet architectures. By combining a global anomaly detection module with a refined mask generation network, the proposed model accurately identifies tumor-sensitive regions. Multi-modal MRI data and synthetic image augmentation are employed to improve robustness and address the challenge of limited annotated datasets.
arXiv Detail & Related papers (2025-06-26T13:28:09Z) - MRGen: Segmentation Data Engine for Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically important imaging modalities is challenging due to the scarcity of annotated data. This paper investigates leveraging generative models to synthesize data for training segmentation models for underrepresented modalities. We present MRGen, a data engine for controllable medical image synthesis conditioned on text prompts and segmentation masks.
arXiv Detail & Related papers (2024-12-04T16:34:22Z) - SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms [60.35639972035727]
The lack of publicly available annotated datasets has impeded the development of robust, machine learning-driven segmentation algorithms.
The SMILE-UHURA challenge addresses the gap in publicly available annotated datasets by providing an annotated dataset of Time-of-Flight angiography acquired with 7T MRI.
Dice scores reached up to 0.838 $\pm$ 0.066 and 0.716 $\pm$ 0.125 on the respective datasets, with an average performance of up to 0.804 $\pm$ 0.15.
arXiv Detail & Related papers (2024-11-14T17:06:00Z) - Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development [59.74920439478643]
In this paper, we collected and annotated the first benchmark dataset that covers diverse ERUS scenarios.
Our ERUS-10K dataset comprises 77 videos and 10,000 high-resolution annotated frames.
We introduce a benchmark model for colorectal cancer segmentation, named the Adaptive Sparse-context TRansformer (ASTR).
arXiv Detail & Related papers (2024-08-19T15:04:42Z) - TotalSegmentator MRI: Robust Sequence-independent Segmentation of Multiple Anatomic Structures in MRI [59.86827659781022]
A nnU-Net model (TotalSegmentator) was trained on MRI to segment 80 anatomic structures. Dice scores were calculated between the predicted segmentations and expert reference standard segmentations to evaluate model performance. The open-source, easy-to-use model allows for automatic, robust segmentation of 80 structures.
arXiv Detail & Related papers (2024-05-29T20:15:54Z) - Quantifying uncertainty in lung cancer segmentation with foundation models applied to mixed-domain datasets [6.712251433139412]
Medical image foundation models have shown the ability to segment organs and tumors with minimal fine-tuning. These models are typically evaluated on task-specific in-distribution (ID) datasets. We introduce a comprehensive set of computationally fast metrics to evaluate the performance of multiple foundation models trained with self-supervised learning (SSL). SMIT produced the highest F1-score (LRAD: 0.60, 5Rater: 0.64) and lowest entropy (LRAD: 0.06, 5Rater: 0.12), indicating higher tumor detection rate and confident segmentations.
arXiv Detail & Related papers (2024-03-19T19:36:48Z) - Reliable Joint Segmentation of Retinal Edema Lesions in OCT Images [55.83984261827332]
In this paper, we propose a novel reliable multi-scale wavelet-enhanced transformer network.
We develop a novel segmentation backbone that integrates a wavelet-enhanced feature extractor network and a multi-scale transformer module.
Our proposed method achieves better segmentation accuracy with a high degree of reliability as compared to other state-of-the-art segmentation approaches.
arXiv Detail & Related papers (2022-12-01T07:32:56Z) - Mediastinal Lymph Node Detection and Segmentation Using Deep Learning [1.7188280334580195]
In clinical practice, computed tomography (CT) and positron emission tomography (PET) imaging detect abnormal lymph nodes (LNs).
Deep convolutional neural networks frequently segment items in medical photographs.
A well-established deep learning technique, UNet, was modified using a bilinear and total generalized variation (TGV) based upsampling strategy to segment and detect mediastinal lymph nodes.
The modified UNet maintains texture discontinuities, selects noisy areas, searches appropriate balance points through backpropagation, and recreates image resolution.
arXiv Detail & Related papers (2022-11-24T02:55:20Z) - MAPPING: Model Average with Post-processing for Stroke Lesion Segmentation [57.336056469276585]
We present our stroke lesion segmentation model based on nnU-Net framework, and apply it to the Anatomical Tracings of Lesions After Stroke dataset.
Our method took the first place in the 2022 MICCAI ATLAS Challenge with an average Dice score of 0.6667, Lesion-wise F1 score of 0.5643, Simple Lesion Count score of 4.5367, and Volume Difference score of 8804.9102.
arXiv Detail & Related papers (2022-11-11T14:17:04Z) - Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z) - Detect-and-Segment: a Deep Learning Approach to Automate Wound Image Segmentation [8.354517822940783]
We present a deep learning approach to produce wound segmentation maps with high generalization capabilities.
In our approach, dedicated deep neural networks detected the wound position, isolated the wound from the uninformative background, and computed the wound segmentation map.
arXiv Detail & Related papers (2021-11-02T13:39:13Z) - Modelling brain lesion volume in patches with CNN-based Poisson Regression [0.0]
In this work, an efficient, computationally inexpensive CNN is implemented to estimate the number of lesion voxels in a predefined patch size from magnetic resonance (MR) images.
The ISLES2015 (SISS) data is used to train and evaluate the model, which, by estimating lesion volume from raw features, accurately identified the lesion image with the larger lesion volume for 86% of paired sample patches.
arXiv Detail & Related papers (2020-11-26T21:11:15Z)