Multimodal Fusion at Three Tiers: Physics-Driven Data Generation and Vision-Language Guidance for Brain Tumor Segmentation
- URL: http://arxiv.org/abs/2507.09966v3
- Date: Sun, 19 Oct 2025 06:02:17 GMT
- Title: Multimodal Fusion at Three Tiers: Physics-Driven Data Generation and Vision-Language Guidance for Brain Tumor Segmentation
- Authors: Mingda Zhang
- Abstract summary: This paper proposes a three-tier fusion architecture that achieves precise brain tumor segmentation. The method processes information progressively at the pixel, feature, and semantic levels. We validated the method on the Brain Tumor Segmentation (BraTS) 2020, 2021, and 2023 datasets.
- Score: 8.695435245976482
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate brain tumor segmentation is crucial for neuro-oncology diagnosis and treatment planning. Deep learning methods have made significant progress, but automatic segmentation still faces challenges, including tumor morphological heterogeneity and complex three-dimensional spatial relationships. This paper proposes a three-tier fusion architecture that achieves precise brain tumor segmentation. The method processes information progressively at the pixel, feature, and semantic levels. At the pixel level, physical modeling extends magnetic resonance imaging (MRI) to multimodal data, including simulated ultrasound and synthetic computed tomography (CT). At the feature level, the method performs Transformer-based cross-modal feature fusion through multi-teacher collaborative distillation, integrating three expert teachers (MRI, US, CT). At the semantic level, clinical textual knowledge generated by GPT-4V is transformed into spatial guidance signals using CLIP contrastive learning and Feature-wise Linear Modulation (FiLM). These three tiers together form a complete processing chain from data augmentation to feature extraction to semantic guidance. We validated the method on the Brain Tumor Segmentation (BraTS) 2020, 2021, and 2023 datasets. The model achieves average Dice coefficients of 0.8665, 0.9014, and 0.8912 on the three datasets, respectively, and reduces the 95% Hausdorff Distance (HD95) by an average of 6.57 millimeters compared with the baseline. This method provides a new paradigm for precise tumor segmentation and boundary localization.
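Of the three tiers, the semantic one is the most self-contained to illustrate: FiLM conditions visual features with a per-channel scale and shift predicted from a text embedding. Below is a minimal sketch assuming a CLIP-style text embedding and 3D feature maps; all module names and dimensions are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class FiLM3D(nn.Module):
    """Feature-wise Linear Modulation for 3D feature maps.

    A text embedding (e.g., a CLIP sentence embedding) predicts a
    per-channel scale (gamma) and shift (beta) that modulate the
    visual features. Shapes here are illustrative, not the paper's.
    """
    def __init__(self, text_dim: int, num_channels: int):
        super().__init__()
        # One linear layer emits both gamma and beta.
        self.to_gamma_beta = nn.Linear(text_dim, 2 * num_channels)

    def forward(self, feats: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, D, H, W); text_emb: (B, text_dim)
        gamma, beta = self.to_gamma_beta(text_emb).chunk(2, dim=-1)
        gamma = gamma.view(*gamma.shape, 1, 1, 1)  # (B, C, 1, 1, 1)
        beta = beta.view(*beta.shape, 1, 1, 1)
        return (1 + gamma) * feats + beta  # identity when gamma = beta = 0

feats = torch.randn(2, 64, 8, 16, 16)   # toy 3D feature volume
text_emb = torch.randn(2, 512)          # e.g., CLIP text embedding
out = FiLM3D(text_dim=512, num_channels=64)(feats, text_emb)
print(out.shape)  # torch.Size([2, 64, 8, 16, 16])
```

The `(1 + gamma)` parameterization keeps the modulation near identity at initialization, a common FiLM stabilization trick.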
Related papers
- Multimodal Visual Surrogate Compression for Alzheimer's Disease Classification [69.87877580725768]
Multimodal Visual Surrogate Compression (MVSC) learns to compress and adapt large 3D sMRI volumes into compact 2D features. MVSC has two key components: a Volume Context module that captures global cross-slice context under textual guidance, and an Adaptive Slice Fusion module that aggregates slice-level information in a text-enhanced, patch-wise manner.
arXiv Detail & Related papers (2026-01-29T13:05:46Z) - DRBD-Mamba for Robust and Efficient Brain Tumor Segmentation with Analytical Insights [54.87947751720332]
Accurate brain tumor segmentation is critical for clinical diagnosis and treatment. Mamba-based State Space Models have demonstrated promising performance on this task. We propose a dual-resolution bi-directional Mamba that captures multi-scale long-range dependencies with minimal computational overhead.
arXiv Detail & Related papers (2025-10-16T07:31:21Z) - Unified Multimodal Coherent Field: Synchronous Semantic-Spatial-Vision Fusion for Brain Tumor Segmentation [13.108816349958659]
We propose the Unified Multimodal Coherent Field (UMCF) method for brain tumor segmentation. The method achieves synchronous interactive fusion of visual, semantic, and spatial information within a unified 3D space. On the Brain Tumor Segmentation (BraTS) datasets, UMCF+nnU-Net achieves average Dice scores of 0.8579 and 0.8977, respectively, with an average 4.18% improvement across mainstream architectures.
arXiv Detail & Related papers (2025-09-22T08:45:39Z) - Graph-based Multi-Modal Interaction Lightweight Network for Brain Tumor Segmentation (GMLN-BTS) in Edge Iterative MRI Lesion Localization System (EdgeIMLocSys) [6.451534509235736]
We propose the Edge Iterative MRI Lesion Localization System (EdgeIMLocSys), which integrates Continuous Learning from Human Feedback. Central to this system is the Graph-based Multi-Modal Interaction Lightweight Network for Brain Tumor Segmentation (GMLN-BTS). Our proposed GMLN-BTS model achieves a Dice score of 85.1% on the BraTS 2017 dataset with only 4.58 million parameters, representing a 98% reduction compared to mainstream 3D Transformer models.
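The abstract gives only the high-level idea of graph-based multi-modal interaction; as a generic illustration, the sketch below runs one symmetric-normalized message-passing step (a standard GCN-style update, not GMLN-BTS's actual layer) over a toy graph whose nodes could stand for modality features.

```python
import torch
import torch.nn.functional as F

def gcn_step(x: torch.Tensor, adj: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """One GCN-style step: X' = ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + torch.eye(adj.size(0))   # add self-loops
    deg = a_hat.sum(dim=1)
    d_inv_sqrt = deg.pow(-0.5)
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return F.relu(a_norm @ x @ weight)

# Toy example: 4 nodes (e.g., one per MRI modality), 8-dim features.
x = torch.randn(4, 8)
adj = torch.tensor([[0., 1., 1., 0.],
                    [1., 0., 1., 1.],
                    [1., 1., 0., 1.],
                    [0., 1., 1., 0.]])
w = torch.randn(8, 8)
print(gcn_step(x, adj, w).shape)  # torch.Size([4, 8])
```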
arXiv Detail & Related papers (2025-07-14T07:29:49Z) - A Multi-Modal Fusion Framework for Brain Tumor Segmentation Based on 3D Spatial-Language-Vision Integration and Bidirectional Interactive Attention Mechanism [6.589206192038366]
The framework was evaluated on the BraTS 2020 dataset, comprising 369 multi-institutional MRI scans. The proposed method achieved an average Dice coefficient of 0.8505 and a 95% Hausdorff distance of 2.8256 mm across the enhancing tumor, tumor core, and whole tumor regions.
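Since nearly every entry in this list reports Dice and HD95, a reference implementation of the Dice coefficient on binary masks may help; HD95 additionally requires computing surface distances and is omitted here.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks of any shape."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

# Two shifted cubes in a toy 3D volume.
pred = np.zeros((16, 16, 16), dtype=bool); pred[4:12, 4:12, 4:12] = True
target = np.zeros_like(pred);              target[5:13, 5:13, 5:13] = True
print(round(dice_coefficient(pred, target), 4))  # ~0.6699
```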
arXiv Detail & Related papers (2025-07-11T13:21:56Z) - Ensemble Learning and 3D Pix2Pix for Comprehensive Brain Tumor Analysis in Multimodal MRI [2.104687387907779]
This study presents an integrated approach leveraging the strengths of ensemble learning with hybrid transformer models and convolutional neural networks (CNNs). Our methodology combines robust tumor segmentation capabilities, utilizing axial attention and transformer encoders for enhanced spatial relationship modeling. The results demonstrate outstanding performance, evidenced by quantitative evaluations: the Dice Similarity Coefficient (DSC) and 95% Hausdorff Distance (HD95) for segmentation, and the Structural Similarity Index Measure (SSIM), Peak Signal-to-Noise Ratio (PSNR), and Mean-Square Error (MSE) for inpainting.
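The inpainting metrics quoted above are standard and easy to reproduce; a minimal sketch of MSE and PSNR, assuming intensities normalized to [0, 1]:

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def psnr(a: np.ndarray, b: np.ndarray, data_range: float = 1.0) -> float:
    """PSNR = 10 * log10(MAX^2 / MSE); higher is better."""
    err = mse(a, b)
    return float("inf") if err == 0 else 10.0 * np.log10(data_range ** 2 / err)

rng = np.random.default_rng(0)
clean = rng.random((64, 64))
noisy = np.clip(clean + 0.05 * rng.standard_normal(clean.shape), 0, 1)
print(f"MSE={mse(clean, noisy):.5f}  PSNR={psnr(clean, noisy):.2f} dB")
```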
arXiv Detail & Related papers (2024-12-16T15:10:53Z) - 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans.
Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z) - Learning Brain Tumor Representation in 3D High-Resolution MR Images via Interpretable State Space Models [42.55786269051626]
We propose a novel state-space-model (SSM)-based masked autoencoder which scales ViT-like models to handle high-resolution data effectively.
We propose a latent-to-spatial mapping technique that enables direct visualization of how latent features correspond to specific regions in the input volumes.
Our results highlight the potential of SSM-based self-supervised learning to transform radiomics analysis by combining efficiency and interpretability.
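The masked-autoencoder pre-training mentioned here presumably follows the usual MAE recipe of hiding most patch tokens; the sketch below shows generic random masking of a token sequence (the paper's SSM encoder and latent-to-spatial mapping are not reproduced).

```python
import torch

def random_masking(tokens: torch.Tensor, mask_ratio: float = 0.75):
    """Keep a random subset of patch tokens, MAE-style.

    tokens: (B, N, D) patch embeddings. Returns the kept tokens and the
    random permutation needed to restore the original order at decode time.
    """
    b, n, d = tokens.shape
    n_keep = int(n * (1 - mask_ratio))
    noise = torch.rand(b, n)              # random score per token
    ids_shuffle = noise.argsort(dim=1)    # random permutation of token indices
    ids_keep = ids_shuffle[:, :n_keep]
    kept = torch.gather(tokens, 1, ids_keep[:, :, None].expand(-1, -1, d))
    return kept, ids_shuffle

tokens = torch.randn(2, 216, 96)   # e.g., 6x6x6 patches of a 3D volume
kept, _ = random_masking(tokens)
print(kept.shape)                  # torch.Size([2, 54, 96])
```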
arXiv Detail & Related papers (2024-09-12T04:36:50Z) - Brain Tumor Segmentation in MRI Images with 3D U-Net and Contextual Transformer [0.5033155053523042]
This research presents an enhanced approach for precise segmentation of brain tumor masses in magnetic resonance imaging (MRI) using an advanced 3D U-Net model combined with a Context Transformer (CoT).
The proposed model synchronizes tumor mass characteristics from the CoT branch, mutually reinforcing feature extraction and facilitating the precise capture of detailed tumor mass structures.
Experimental results demonstrate the strong segmentation performance of the proposed method in comparison to current state-of-the-art approaches.
arXiv Detail & Related papers (2024-07-11T13:04:20Z) - SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification.
The proposed framework has been validated through comprehensive experiments on two clinical datasets.
To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z) - An Optimization Framework for Processing and Transfer Learning for the Brain Tumor Segmentation [2.0886519175557368]
We have constructed an optimization framework based on a 3D U-Net model for brain tumor segmentation.
This framework incorporates a range of techniques, including various pre-processing and post-processing steps and transfer learning.
On the validation datasets, this multi-modality brain tumor segmentation framework achieves average lesion-wise Dice scores of 0.79, 0.72, and 0.74 on Challenges 1, 2, and 3, respectively.
arXiv Detail & Related papers (2024-02-10T18:03:15Z) - Cross-modality Guidance-aided Multi-modal Learning with Dual Attention for MRI Brain Tumor Grading [47.50733518140625]
Brain tumors are among the most fatal cancers worldwide and are especially common in children and the elderly.
We propose a novel cross-modality guidance-aided multi-modal learning with dual attention for addressing the task of MRI brain tumor grading.
arXiv Detail & Related papers (2024-01-17T07:54:49Z) - Automated Ensemble-Based Segmentation of Adult Brain Tumors: A Novel Approach Using the BraTS AFRICA Challenge Data [0.0]
We introduce an ensemble method that comprises eleven unique variations based on three core architectures.
Our findings reveal that the ensemble approach, combining different architectures, outperforms single models.
These results underline the potential of tailored deep learning techniques in precisely segmenting brain tumors.
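Segmentation ensembles are most commonly combined by averaging per-voxel class probabilities before the argmax; a minimal sketch, with toy models standing in for the paper's eleven trained variations:

```python
import torch

def ensemble_predict(models, volume: torch.Tensor) -> torch.Tensor:
    """Average softmax probabilities over models, then take the argmax."""
    with torch.no_grad():
        probs = torch.stack([m(volume).softmax(dim=1) for m in models]).mean(dim=0)
    return probs.argmax(dim=1)  # (B, D, H, W) label map

# Toy stand-ins for trained networks: each emits (B, C=4, D, H, W) logits.
models = [torch.nn.Conv3d(4, 4, kernel_size=1) for _ in range(3)]
volume = torch.randn(1, 4, 8, 16, 16)   # 4 MRI modalities as input channels
print(ensemble_predict(models, volume).shape)  # torch.Size([1, 8, 16, 16])
```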
arXiv Detail & Related papers (2023-08-14T15:34:22Z) - Breast Ultrasound Tumor Classification Using a Hybrid Multitask CNN-Transformer Network [63.845552349914186]
Capturing global contextual information plays a critical role in breast ultrasound (BUS) image classification.
Vision Transformers have an improved capability of capturing global contextual information but may distort the local image patterns due to the tokenization operations.
In this study, we propose a hybrid multitask deep neural network called Hybrid-MT-ESTAN, designed to perform BUS tumor classification and segmentation.
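Multitask networks like this are typically trained with a weighted sum of a classification loss and a segmentation loss; a minimal sketch with illustrative weights (not Hybrid-MT-ESTAN's actual heads or loss balance):

```python
import torch
import torch.nn as nn

class MultitaskLoss(nn.Module):
    """L = w_cls * CE(class logits) + w_seg * BCE(mask logits)."""
    def __init__(self, w_cls: float = 1.0, w_seg: float = 1.0):
        super().__init__()
        self.w_cls, self.w_seg = w_cls, w_seg
        self.ce = nn.CrossEntropyLoss()
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, cls_logits, cls_target, seg_logits, seg_target):
        return (self.w_cls * self.ce(cls_logits, cls_target)
                + self.w_seg * self.bce(seg_logits, seg_target))

loss_fn = MultitaskLoss(w_cls=1.0, w_seg=0.5)   # illustrative weights
cls_logits = torch.randn(4, 2)                  # benign vs malignant
cls_target = torch.randint(0, 2, (4,))
seg_logits = torch.randn(4, 1, 64, 64)          # per-pixel mask logits
seg_target = torch.randint(0, 2, (4, 1, 64, 64)).float()
print(loss_fn(cls_logits, cls_target, seg_logits, seg_target).item())
```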
arXiv Detail & Related papers (2023-08-04T01:19:32Z) - CKD-TransBTS: Clinical Knowledge-Driven Hybrid Transformer with Modality-Correlated Cross-Attention for Brain Tumor Segmentation [37.39921484146194]
Brain tumor segmentation in magnetic resonance imaging (MRI) is crucial for brain tumor diagnosis, cancer management, and research purposes.
With the great success of ten years of BraTS challenges, many outstanding BTS models have been proposed to tackle the difficulties of BTS in different technical aspects.
We propose a clinical knowledge-driven brain tumor segmentation model, called CKD-TransBTS.
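Modality-correlated cross-attention can be read as letting one modality's tokens query another's; a generic sketch with PyTorch's built-in multi-head attention (not CKD-TransBTS's exact block):

```python
import torch
import torch.nn as nn

# Queries come from one modality, keys/values from another, so each
# T1 token attends over all T2 tokens. batch_first=True -> (B, N, D).
dim, heads = 64, 4
cross_attn = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)

t1_tokens = torch.randn(2, 128, dim)   # e.g., flattened T1 patch features
t2_tokens = torch.randn(2, 128, dim)   # e.g., flattened T2 patch features
fused, attn_weights = cross_attn(query=t1_tokens, key=t2_tokens, value=t2_tokens)
print(fused.shape, attn_weights.shape)  # (2, 128, 64) (2, 128, 128)
```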
arXiv Detail & Related papers (2022-07-15T09:35:29Z) - EMT-NET: Efficient multitask network for computer-aided diagnosis of breast cancer [58.720142291102135]
We propose an efficient and lightweight learning architecture to classify and segment breast tumors simultaneously.
We incorporate a segmentation task into a tumor classification network, which makes the backbone network learn representations focused on tumor regions.
The accuracy, sensitivity, and specificity of tumor classification are 88.6%, 94.1%, and 85.3%, respectively.
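For reference, the three quoted classification metrics fall straight out of the confusion matrix:

```python
import numpy as np

def classification_metrics(pred: np.ndarray, target: np.ndarray):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    tp = np.sum((pred == 1) & (target == 1))
    tn = np.sum((pred == 0) & (target == 0))
    fp = np.sum((pred == 1) & (target == 0))
    fn = np.sum((pred == 0) & (target == 1))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # fraction of malignant cases caught
        "specificity": tn / (tn + fp),   # fraction of benign cases cleared
    }

pred = np.array([1, 1, 0, 0, 1, 0, 1, 0])
target = np.array([1, 0, 0, 0, 1, 1, 1, 0])
print(classification_metrics(pred, target))
```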
arXiv Detail & Related papers (2022-01-13T05:24:40Z) - Cross-Modality Deep Feature Learning for Brain Tumor Segmentation [158.8192041981564]
This paper proposes a novel cross-modality deep feature learning framework to segment brain tumors from the multi-modality MRI data.
The core idea is to mine rich patterns across the multi-modality data to compensate for the limited scale of the training data.
Comprehensive experiments are conducted on the BraTS benchmarks, which show that the proposed cross-modality deep feature learning framework can effectively improve the brain tumor segmentation performance.
arXiv Detail & Related papers (2022-01-07T07:46:01Z) - Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D-context-enhanced 2D features for universal lesion detection in CT slices.
With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset.
The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
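A pseudo-3D convolution factorizes a full 3x3x3 kernel into an in-plane 1x3x3 convolution followed by a through-plane 3x1x1 one, which is what allows 2D pre-trained weights to seed the in-plane part; the sketch below shows the generic P3D-style decomposition (MP3D FPN's exact design may differ).

```python
import torch
import torch.nn as nn

class Pseudo3DConv(nn.Module):
    """Factorize 3x3x3 into a spatial 1x3x3 conv plus a depth 3x1x1 conv."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # In-plane conv: can be initialized from 2D pre-trained weights.
        self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1))
        # Through-plane conv mixes context across neighboring slices.
        self.depth = nn.Conv3d(out_ch, out_ch, kernel_size=(3, 1, 1),
                               padding=(1, 0, 0))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.depth(self.act(self.spatial(x))))

x = torch.randn(1, 16, 9, 64, 64)     # (B, C, slices, H, W)
print(Pseudo3DConv(16, 32)(x).shape)  # torch.Size([1, 32, 9, 64, 64])
```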
arXiv Detail & Related papers (2020-12-16T07:11:16Z) - Inter-slice Context Residual Learning for 3D Medical Image Segmentation [38.43650000401734]
We propose the 3D context residual network (ConResNet) for the accurate segmentation of 3D medical images.
This model consists of an encoder, a segmentation decoder, and a context residual decoder.
We show that the proposed ConResNet is more accurate than six top-ranking methods in brain tumor segmentation and seven top-ranking methods in pancreas segmentation.
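One way to read the inter-slice context residual is as the change in segmentation probability between adjacent slices, which an auxiliary decoder can be trained to predict; the sketch below forms such a residual target (an interpretation of the abstract, not ConResNet's code).

```python
import torch

def inter_slice_context_residual(seg_probs: torch.Tensor) -> torch.Tensor:
    """Absolute change of segmentation probability between adjacent slices.

    seg_probs: (B, C, D, H, W) per-voxel probabilities. The residual is
    large exactly where the structure's extent changes from slice to slice,
    i.e., along boundaries in the through-plane direction.
    """
    return (seg_probs[:, :, 1:] - seg_probs[:, :, :-1]).abs()  # (B, C, D-1, H, W)

seg_probs = torch.rand(1, 2, 16, 32, 32)
print(inter_slice_context_residual(seg_probs).shape)  # torch.Size([1, 2, 15, 32, 32])
```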
arXiv Detail & Related papers (2020-11-28T16:03:39Z) - Automatic Brain Tumor Segmentation with Scale Attention Network [1.7767466724342065]
The Multimodal Brain Tumor Segmentation Challenge 2020 (BraTS 2020) provides a common platform for comparing different automatic algorithms on multi-parametric Magnetic Resonance Imaging (mpMRI).
We propose a dynamic scale attention mechanism that incorporates low-level details with high-level semantics from feature maps at different scales.
Our framework was trained using the 369 challenge training cases provided by BraTS 2020, and achieved average Dice Similarity Coefficients (DSC) of 0.8828, 0.8433, and 0.8177, as well as 95% Hausdorff distances (in millimeters) of 5.2176, 17.9697, and 13.4298 on 166 testing cases for the whole tumor, tumor core, and enhancing tumor regions, respectively.
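Scale attention of this kind can be realized as learned weights over pyramid levels resized to a common resolution; the sketch below is one plausible reading of "dynamic scale attention", not the paper's exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAttention(nn.Module):
    """Softmax-weighted fusion of multi-scale 3D feature maps."""
    def __init__(self, num_scales: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_scales))  # learned per-scale weights

    def forward(self, feats):
        # feats: list of (B, C, D_i, H_i, W_i); upsample all to the finest scale.
        target = feats[0].shape[2:]
        resized = [f if f.shape[2:] == target
                   else F.interpolate(f, size=target, mode="trilinear", align_corners=False)
                   for f in feats]
        w = self.logits.softmax(dim=0)
        return sum(wi * fi for wi, fi in zip(w, resized))

feats = [torch.randn(1, 8, 16, 32, 32), torch.randn(1, 8, 8, 16, 16)]
print(ScaleAttention(num_scales=2)(feats).shape)  # torch.Size([1, 8, 16, 32, 32])
```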
arXiv Detail & Related papers (2020-11-06T04:45:49Z)