TextBraTS: Text-Guided Volumetric Brain Tumor Segmentation with Innovative Dataset Development and Fusion Module Exploration
- URL: http://arxiv.org/abs/2506.16784v2
- Date: Mon, 23 Jun 2025 17:15:40 GMT
- Title: TextBraTS: Text-Guided Volumetric Brain Tumor Segmentation with Innovative Dataset Development and Fusion Module Exploration
- Authors: Xiaoyu Shi, Rahul Kumar Jain, Yinhao Li, Ruibo Hou, Jingliang Cheng, Jie Bai, Guohua Zhao, Lanfen Lin, Rui Xu, Yen-wei Chen,
- Abstract summary: We introduce the TextBraTS dataset, the first publicly available volume-level multimodal dataset that contains paired MRI volumes and rich textual annotations.<n>Our approach demonstrates significant improvements in brain tumor segmentation accuracy, offering valuable insights into effective multimodal integration techniques.
- Score: 17.408476262703086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning has demonstrated remarkable success in medical image segmentation and computer-aided diagnosis. In particular, numerous advanced methods have achieved state-of-the-art performance in brain tumor segmentation from MRI scans. While recent studies in other medical imaging domains have revealed that integrating textual reports with visual data can enhance segmentation accuracy, the field of brain tumor analysis lacks a comprehensive dataset that combines radiological images with corresponding textual annotations. This limitation has hindered the exploration of multimodal approaches that leverage both imaging and textual data. To bridge this critical gap, we introduce the TextBraTS dataset, the first publicly available volume-level multimodal dataset that contains paired MRI volumes and rich textual annotations, derived from the widely adopted BraTS2020 benchmark. Building upon this novel dataset, we propose a novel baseline framework and sequential cross-attention method for text-guided volumetric medical image segmentation. Through extensive experiments with various text-image fusion strategies and templated text formulations, our approach demonstrates significant improvements in brain tumor segmentation accuracy, offering valuable insights into effective multimodal integration techniques. Our dataset, implementation code, and pre-trained models are publicly available at https://github.com/Jupitern52/TextBraTS.
Related papers
- TextDiffSeg: Text-guided Latent Diffusion Model for 3d Medical Images Segmentation [0.0]
Text-guided diffusion model framework, TextDiffSeg, integrates 3D volumetric data with natural language descriptions.<n>By enhancing the model's ability to recognize complex anatomical structures, TextDiffSeg incorporates innovative label embedding techniques.<n> Experimental results demonstrate that TextDiffSeg consistently outperforms existing methods in segmentation tasks involving kidney and pancreas tumors.
arXiv Detail & Related papers (2025-04-16T07:17:36Z) - Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging [4.341503087761129]
Conducting multimodal learning involves visual and text modalities shown as a solution, but collecting paired vision-language datasets is expensive and time-consuming.<n>Inspired by the superior ability in numerous cross-modal tasks for Large Language Models (LLMs), we proposed a novel Vision-LLM union framework to address the issues.
arXiv Detail & Related papers (2025-04-09T23:33:35Z) - MRGen: Segmentation Data Engine For Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically significant imaging modalities is challenging due to the scarcity of annotated data.<n>This paper investigates leveraging generative models to synthesize training data, to train segmentation models for underrepresented modalities.
arXiv Detail & Related papers (2024-12-04T16:34:22Z) - MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation [2.2585213273821716]
We introduce MedCLIP-SAMv2, a novel framework that integrates the CLIP and SAM models to perform segmentation on clinical scans.<n>Our approach includes fine-tuning the BiomedCLIP model with a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss.<n>We also investigate using zero-shot segmentation labels within a weakly supervised paradigm to enhance segmentation quality further.
arXiv Detail & Related papers (2024-09-28T23:10:37Z) - RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis [56.57177181778517]
RadGenome-Chest CT is a large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE.
We leverage the latest powerful universal segmentation and large language models to extend the original datasets.
arXiv Detail & Related papers (2024-04-25T17:11:37Z) - Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z) - XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [72.8965643836841]
We introduce XrayGPT, a novel conversational medical vision-language model.<n>It can analyze and answer open-ended questions about chest radiographs.<n>We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z) - Cross-Modality Deep Feature Learning for Brain Tumor Segmentation [158.8192041981564]
This paper proposes a novel cross-modality deep feature learning framework to segment brain tumors from the multi-modality MRI data.
The core idea is to mine rich patterns across the multi-modality data to make up for the insufficient data scale.
Comprehensive experiments are conducted on the BraTS benchmarks, which show that the proposed cross-modality deep feature learning framework can effectively improve the brain tumor segmentation performance.
arXiv Detail & Related papers (2022-01-07T07:46:01Z) - Robust Multimodal Brain Tumor Segmentation via Feature Disentanglement
and Gated Fusion [71.87627318863612]
We propose a novel multimodal segmentation framework which is robust to the absence of imaging modalities.
Our network uses feature disentanglement to decompose the input modalities into the modality-specific appearance code.
We validate our method on the important yet challenging multimodal brain tumor segmentation task with the BRATS challenge dataset.
arXiv Detail & Related papers (2020-02-22T14:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.