A Multi-Modal Fusion Framework for Brain Tumor Segmentation Based on 3D Spatial-Language-Vision Integration and Bidirectional Interactive Attention Mechanism
- URL: http://arxiv.org/abs/2507.08574v1
- Date: Fri, 11 Jul 2025 13:21:56 GMT
- Title: A Multi-Modal Fusion Framework for Brain Tumor Segmentation Based on 3D Spatial-Language-Vision Integration and Bidirectional Interactive Attention Mechanism
- Authors: Mingda Zhang, Kaiwen Pan
- Abstract summary: The framework was evaluated on the BraTS 2020 dataset comprising 369 multi-institutional MRI scans. The proposed method achieved an average Dice coefficient of 0.8505 and a 95% Hausdorff distance of 2.8256 mm across enhancing tumor, tumor core, and whole tumor regions.
- Score: 6.589206192038366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study aims to develop a novel multi-modal fusion framework for brain tumor segmentation that integrates spatial-language-vision information through bidirectional interactive attention mechanisms to improve segmentation accuracy and boundary delineation. Methods: We propose two core components: a Multi-modal Semantic Fusion Adapter (MSFA) integrating 3D MRI data with clinical text descriptions through hierarchical semantic decoupling, and Bidirectional Interactive Visual-semantic Attention (BIVA) enabling iterative information exchange between modalities. The framework was evaluated on the BraTS 2020 dataset comprising 369 multi-institutional MRI scans. Results: The proposed method achieved an average Dice coefficient of 0.8505 and a 95% Hausdorff distance of 2.8256 mm across enhancing tumor, tumor core, and whole tumor regions, outperforming state-of-the-art methods including SCAU-Net, CA-Net, and 3D U-Net. Ablation studies confirmed the critical contributions of the semantic and spatial modules to boundary precision. Conclusion: Multi-modal semantic fusion combined with bidirectional interactive attention significantly enhances brain tumor segmentation performance, establishing new paradigms for integrating clinical knowledge into medical image analysis.
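Below is a minimal PyTorch sketch of the bidirectional visual-semantic exchange that BIVA describes: flattened 3D feature tokens and clinical-text embeddings attend to each other for a few iterations. All dimensions, layer choices, and the iteration count are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    """Iterative vision<->text cross-attention (illustrative BIVA-style sketch)."""

    def __init__(self, dim: int = 256, heads: int = 8, iters: int = 2):
        super().__init__()
        self.iters = iters
        self.vis_from_txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.txt_from_vis = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_v = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, vis: torch.Tensor, txt: torch.Tensor):
        # vis: (B, N_voxels, dim) flattened 3D MRI feature tokens
        # txt: (B, N_tokens, dim) clinical-text embeddings
        for _ in range(self.iters):
            v_upd, _ = self.vis_from_txt(vis, txt, txt)  # text informs vision
            t_upd, _ = self.txt_from_vis(txt, vis, vis)  # vision informs text
            vis = self.norm_v(vis + v_upd)               # residual + norm
            txt = self.norm_t(txt + t_upd)
        return vis, txt

# Toy usage: an 8x8x8 feature grid flattened to 512 tokens, 16 text tokens.
vis_out, txt_out = BidirectionalCrossAttention()(torch.randn(1, 512, 256),
                                                 torch.randn(1, 16, 256))
```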
Related papers
- NEARL-CLIP: Interacted Query Adaptation with Orthogonal Regularization for Medical Vision-Language Understanding [51.63264715941068]
NEARL-CLIP (iNteracted quEry Adaptation with oRthogonaL Regularization) is a novel cross-modality interaction VLM-based framework.
arXiv Detail & Related papers (2025-08-06T05:44:01Z) - RL-U$^2$Net: A Dual-Branch UNet with Reinforcement Learning-Assisted Multimodal Feature Fusion for Accurate 3D Whole-Heart Segmentation [0.624829068285122]
We propose a dual-branch U-Net architecture enhanced by reinforcement learning for feature alignment.
The model employs a dual-branch U-shaped network to process CT and MRI patches in parallel, and introduces a novel RL-XAlign module.
Experimental results on the publicly available MM-WHS 2017 dataset demonstrate that the proposed RL-U$^2$Net outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2025-08-04T16:12:06Z) - Graph-based Multi-Modal Interaction Lightweight Network for Brain Tumor Segmentation (GMLN-BTS) in Edge Iterative MRI Lesion Localization System (EdgeIMLocSys) [6.451534509235736]
We propose the Edge Iterative MRI Lesion Localization System (EdgeIMLocSys), which integrates Continuous Learning from Human Feedback.
Central to this system is the Graph-based Multi-Modal Interaction Lightweight Network for Brain Tumor Segmentation (GMLN-BTS).
Our proposed GMLN-BTS model achieves a Dice score of 85.1% on the BraTS 2017 dataset with only 4.58 million parameters, representing a 98% reduction compared to mainstream 3D Transformer models.
arXiv Detail & Related papers (2025-07-14T07:29:49Z) - A Brain Tumor Segmentation Method Based on CLIP and 3D U-Net with Cross-Modal Semantic Guidance and Multi-Level Feature Fusion [7.784274233237623]
This research presents a multi-level fusion architecture that integrates pixel-level, feature-level, and semantic-level information.
The proposed model achieves an overall Dice coefficient of 0.8567, representing a 4.8% improvement compared to traditional 3D U-Net.
arXiv Detail & Related papers (2025-07-14T06:32:59Z) - Ensemble Learning and 3D Pix2Pix for Comprehensive Brain Tumor Analysis in Multimodal MRI [2.104687387907779]
This study presents an integrated approach leveraging the strengths of ensemble learning with hybrid transformer models and convolutional neural networks (CNNs).
Our methodology combines robust tumor segmentation capabilities, utilizing axial attention and transformer encoders for enhanced spatial relationship modeling.
The results demonstrate outstanding performance, evidenced by quantitative evaluations such as the Dice Similarity Coefficient (DSC) and Hausdorff Distance (HD95) for segmentation, and the Structural Similarity Index Measure (SSIM), Peak Signal-to-Noise Ratio (PSNR), and Mean-Square Error (MSE) for inpainting.
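For reference, here is a hedged sketch of the two segmentation metrics reported throughout these papers, the Dice Similarity Coefficient and the 95th-percentile Hausdorff distance, computed between foreground voxel sets with SciPy distance transforms. BraTS evaluation tooling uses a surface-based HD95 variant, so values may differ slightly; the voxel spacing is an illustrative parameter.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum() + 1e-8)

def hd95(pred: np.ndarray, gt: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """95th-percentile symmetric Hausdorff distance (assumes non-empty masks)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    # Distance from every voxel to the nearest foreground voxel of the other mask.
    d_to_gt = distance_transform_edt(~gt, sampling=spacing)
    d_to_pred = distance_transform_edt(~pred, sampling=spacing)
    return float(np.percentile(np.concatenate([d_to_gt[pred], d_to_pred[gt]]), 95))
```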
arXiv Detail & Related papers (2024-12-16T15:10:53Z) - multiPI-TransBTS: A Multi-Path Learning Framework for Brain Tumor Image Segmentation Based on Multi-Physical Information [1.7359724605901228]
Brain tumor segmentation (BraTS) plays a critical role in clinical diagnosis, treatment planning, and monitoring the progression of brain tumors.
Due to the variability in tumor appearance, size, and intensity across different MRI modalities, automated segmentation remains a challenging task.
We propose a novel Transformer-based framework, multiPI-TransBTS, which integrates multi-physical information to enhance segmentation accuracy.
arXiv Detail & Related papers (2024-09-18T17:35:19Z) - Hybrid Multihead Attentive Unet-3D for Brain Tumor Segmentation [0.0]
Brain tumor segmentation is a critical task in medical image analysis, aiding in the diagnosis and treatment planning of brain tumor patients.
Various deep learning-based techniques have made significant progress in this field; however, they still face limitations in accuracy due to the complex and variable nature of brain tumor morphology.
We propose a novel Hybrid Multihead Attentive U-Net architecture to address the challenges in accurate brain tumor segmentation.
arXiv Detail & Related papers (2024-05-22T02:46:26Z) - An Interpretable Cross-Attentive Multi-modal MRI Fusion Framework for Schizophrenia Diagnosis [46.58592655409785]
We propose a novel Cross-Attentive Multi-modal Fusion framework (CAMF) to capture both intra-modal and inter-modal relationships between fMRI and sMRI.
Our approach significantly improves classification accuracy, as demonstrated by our evaluations on two extensive multi-modal brain imaging datasets.
The gradient-guided Score-CAM is applied to interpret critical functional networks and brain regions involved in schizophrenia.
arXiv Detail & Related papers (2024-03-29T20:32:30Z) - SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification.
The proposed framework has been validated through comprehensive experiments on two clinical datasets.
To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z) - Cross-modality Guidance-aided Multi-modal Learning with Dual Attention for MRI Brain Tumor Grading [47.50733518140625]
Brain tumors are among the most fatal cancers worldwide and are especially common in children and the elderly.
We propose a novel cross-modality guidance-aided multi-modal learning with dual attention for addressing the task of MRI brain tumor grading.
arXiv Detail & Related papers (2024-01-17T07:54:49Z) - Automated Ensemble-Based Segmentation of Adult Brain Tumors: A Novel Approach Using the BraTS AFRICA Challenge Data [0.0]
We introduce an ensemble method that comprises eleven unique variations based on three core architectures.
Our findings reveal that the ensemble approach, combining different architectures, outperforms single models.
These results underline the potential of tailored deep learning techniques in precisely segmenting brain tumors.
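As a rough illustration of the ensembling idea, the sketch below averages softmax probability maps from several independently trained segmentation models and takes the voxel-wise argmax; the model list and tensor shapes are placeholders, not the challenge entry.

```python
import torch

@torch.no_grad()
def ensemble_segment(models, volume: torch.Tensor) -> torch.Tensor:
    """Average class probabilities over an ensemble, then take the argmax.

    volume: (B, C, D, H, W) multimodal MRI input; each model is assumed to
    return logits of shape (B, K, D, H, W) for K tumor classes.
    """
    probs = torch.stack([m(volume).softmax(dim=1) for m in models])
    return probs.mean(dim=0).argmax(dim=1)  # fused label map, (B, D, H, W)
```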
arXiv Detail & Related papers (2023-08-14T15:34:22Z) - Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training [55.56609500764344]
We propose a unified framework based on Multi-task Paired Masking with Alignment (MPMA) to integrate the cross-modal alignment task into the joint image-text reconstruction framework.
We also introduce a Memory-Augmented Cross-Modal Fusion (MA-CMF) module to fully integrate visual information to assist report reconstruction.
arXiv Detail & Related papers (2023-05-13T13:53:48Z) - Multiclass MRI Brain Tumor Segmentation using 3D Attention-based U-Net [0.0]
This paper proposes a 3D attention-based U-Net architecture for multi-region segmentation of brain tumors.
The attention mechanism helps to improve segmentation accuracy by de-emphasizing healthy tissues and accentuating malignant tissues.
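A hedged sketch of how such an attention mechanism can be realized, modeled on the additive attention gate of Attention U-Net (Oktay et al.): a decoder gating signal reweights skip-connection features voxel-wise. Channel sizes and the shared-resolution assumption are illustrative.

```python
import torch
import torch.nn as nn

class AttentionGate3D(nn.Module):
    """Additive attention gate reweighting skip features (illustrative sketch)."""

    def __init__(self, skip_ch: int, gate_ch: int, inter_ch: int):
        super().__init__()
        self.w_skip = nn.Conv3d(skip_ch, inter_ch, kernel_size=1)
        self.w_gate = nn.Conv3d(gate_ch, inter_ch, kernel_size=1)
        self.psi = nn.Conv3d(inter_ch, 1, kernel_size=1)

    def forward(self, skip: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
        # skip and gate are assumed here to share spatial size for brevity;
        # the original design resamples one branch to align resolutions.
        att = torch.sigmoid(self.psi(torch.relu(self.w_skip(skip) + self.w_gate(gate))))
        return skip * att  # soft voxel-wise mask in [0, 1] suppresses background
```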
arXiv Detail & Related papers (2023-05-10T14:35:07Z) - A unified 3D framework for Organs at Risk Localization and Segmentation for Radiation Therapy Planning [56.52933974838905]
Current medical workflows require manual delineation of organs-at-risk (OAR).
In this work, we aim to introduce a unified 3D pipeline for OAR localization-segmentation.
Our proposed framework fully enables the exploitation of 3D context information inherent in medical imaging.
arXiv Detail & Related papers (2022-03-01T17:08:41Z) - Cross-Modality Deep Feature Learning for Brain Tumor Segmentation [158.8192041981564]
This paper proposes a novel cross-modality deep feature learning framework to segment brain tumors from the multi-modality MRI data.
The core idea is to mine rich patterns across the multi-modality data to make up for the insufficient data scale.
Comprehensive experiments are conducted on the BraTS benchmarks, which show that the proposed cross-modality deep feature learning framework can effectively improve the brain tumor segmentation performance.
arXiv Detail & Related papers (2022-01-07T07:46:01Z) - Cross-Modality Brain Tumor Segmentation via Bidirectional Global-to-Local Unsupervised Domain Adaptation [61.01704175938995]
In this paper, we propose a novel Bidirectional Global-to-Local (BiGL) adaptation framework under a UDA scheme.
Specifically, a bidirectional image synthesis and segmentation module is proposed to segment the brain tumor.
The proposed method outperforms several state-of-the-art unsupervised domain adaptation methods by a large margin.
arXiv Detail & Related papers (2021-05-17T10:11:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.