MM-NeuroOnco: A Multimodal Benchmark and Instruction Dataset for MRI-Based Brain Tumor Diagnosis
- URL: http://arxiv.org/abs/2602.22955v1
- Date: Thu, 26 Feb 2026 12:50:32 GMT
- Title: MM-NeuroOnco: A Multimodal Benchmark and Instruction Dataset for MRI-Based Brain Tumor Diagnosis
- Authors: Feng Guo, Jiaxiang Liu, Yang Li, Qianqian Shi, Mingkun Xu,
- Abstract summary: MM-NeuroOnco is a large-scale benchmark and instruction-tuning dataset for brain tumor MRI understanding.<n>We construct MM-NeuroOnco-Bench, a manually annotated evaluation benchmark with a rejection-aware setting.<n>We propose NeuroOnco-GPT, which achieves a 27% absolute accuracy improvement on diagnostic questions following fine-tuning.
- Score: 8.869012393636135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate brain tumor diagnosis requires models to not only detect lesions but also generate clinically interpretable reasoning grounded in imaging manifestations, yet existing public datasets remain limited in annotation richness and diagnostic semantics. To bridge this gap, we introduce MM-NeuroOnco, a large-scale multimodal benchmark and instruction-tuning dataset for brain tumor MRI understanding, consisting of 24,726 MRI slices from 20 data sources paired with approximately 200,000 semantically enriched multimodal instructions spanning diverse tumor subtypes and imaging modalities. To mitigate the scarcity and high cost of diagnostic semantic annotations, we develop a multi-model collaborative pipeline for automated medical information completion and quality control, enabling the generation of diagnosis-related semantics beyond mask-only annotations. Building upon this dataset, we further construct MM-NeuroOnco-Bench, a manually annotated evaluation benchmark with a rejection-aware setting to reduce biases inherent in closed-ended question formats. Evaluation across ten representative models shows that even the strongest baseline, Gemini 3 Flash, achieves only 41.88% accuracy on diagnosis-related questions, highlighting the substantial challenges of multimodal brain tumor diagnostic understanding. Leveraging MM-NeuroOnco, we further propose NeuroOnco-GPT, which achieves a 27% absolute accuracy improvement on diagnostic questions following fine-tuning. This result demonstrates the effectiveness of our dataset and benchmark in advancing clinically grounded multimodal diagnostic reasoning. Code and dataset are publicly available at: https://github.com/gfnnnb/MM-NeuroOnco
Related papers
- Sim4Seg: Boosting Multimodal Multi-disease Medical Diagnosis Segmentation with Region-Aware Vision-Language Similarity Masks [54.00822479127598]
We introduce a medical vision-language task named Medical Diagnosis (MDS)<n>MDS aims to understand clinical queries for medical images and generate the corresponding segmentation masks as well as diagnostic results.<n>We propose Sim4Seg, a novel framework that improves the performance of diagnosis segmentation.
arXiv Detail & Related papers (2025-11-10T03:22:42Z) - Towards Label-Free Brain Tumor Segmentation: Unsupervised Learning with Multimodal MRI [7.144319861722029]
Unsupervised anomaly detection (UAD) presents a complementary alternative to supervised learning for brain tumor segmentation in MRI.<n>We propose a novel Multimodal Vision Transformer Autoencoder (MViT-AE) trained exclusively on healthy brain MRIs to detect and localize tumors.<n>Our method achieves clinically meaningful tumor localization, with lesion-wise Dice Similarity Coefficient of 0.437 (Whole Tumor), 0.316 (Tumor Core), and 0.350 (Enhancing Tumor) on the test set, and an anomaly Detection Rate of 89.4% on the validation set.
arXiv Detail & Related papers (2025-10-17T14:26:30Z) - DRBD-Mamba for Robust and Efficient Brain Tumor Segmentation with Analytical Insights [54.87947751720332]
Accurate brain tumor segmentation is significant for clinical diagnosis and treatment.<n>Mamba-based State Space Models have demonstrated promising performance.<n>We propose a dual-resolution bi-directional Mamba that captures multi-scale long-range dependencies with minimal computational overhead.
arXiv Detail & Related papers (2025-10-16T07:31:21Z) - RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis [56.373297358647655]
Retrieval-Augmented Diagnosis (RAD) is a novel framework that injects external knowledge into multimodal models directly on downstream tasks.<n>RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss transformer, and a dual decoder.
arXiv Detail & Related papers (2025-09-24T10:36:14Z) - NeuroMoE: A Transformer-Based Mixture-of-Experts Framework for Multi-Modal Neurological Disorder Classification [3.5313393560458826]
Deep Learning has recently emerged as a powerful tool for extracting meaningful patterns from medical data to aid in diagnosis.<n>We propose a novel transformer-based Mixture-of-Experts (MoE) framework for classifying neurological disorders.<n>Our framework achieves a validation accuracy of 82.47%, outperforming baseline methods by over 10%.
arXiv Detail & Related papers (2025-06-17T20:40:06Z) - REMEMBER: Retrieval-based Explainable Multimodal Evidence-guided Modeling for Brain Evaluation and Reasoning in Zero- and Few-shot Neurodegenerative Diagnosis [6.446611581074913]
We introduce REMEMBER -- Retrieval-based Explainable Multimodalively-guided Modeling for Brain Evaluation and Reasoning.<n>REMEMBER is a new machine learning framework that facilitates zero- and few-shot Alzheimer's diagnosis using brain MRI scans.<n> Experimental results demonstrate that REMEMBER achieves robust zero- and few-shot performance.
arXiv Detail & Related papers (2025-04-12T22:06:15Z) - MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot [47.77948063906033]
Retrieval-augmented generation (RAG) is a well-suited technique for retrieving privacy-sensitive Electronic Health Records.<n>This paper proposes MedRAG, a RAG model enhanced by knowledge graph (KG)-elicited reasoning for the medical domain.<n>Tests show MedRAG provides more specific diagnostic insights and outperforms state-of-the-art models in reducing misdiagnosis rates.
arXiv Detail & Related papers (2025-02-06T12:27:35Z) - MADE-for-ASD: A Multi-Atlas Deep Ensemble Network for Diagnosing Autism Spectrum Disorder [4.7377709803078325]
This paper bridges the gap between traditional, time-consuming diagnostic methods and potential automated solutions.
We propose a multi-atlas deep ensemble network, MADE-for-ASD, that integrates multiple atlases of the brain's functional magnetic resonance imaging (fMRI) data.
Our approach integrates demographic information into the prediction workflow, which enhances ASD diagnosis performance.
arXiv Detail & Related papers (2024-07-09T17:49:23Z) - Cross-modality Guidance-aided Multi-modal Learning with Dual Attention
for MRI Brain Tumor Grading [47.50733518140625]
Brain tumor represents one of the most fatal cancers around the world, and is very common in children and the elderly.
We propose a novel cross-modality guidance-aided multi-modal learning with dual attention for addressing the task of MRI brain tumor grading.
arXiv Detail & Related papers (2024-01-17T07:54:49Z) - A Bi-Pyramid Multimodal Fusion Method for the Diagnosis of Bipolar
Disorders [11.622160966334745]
We utilize both MRIs and fMRI data and propose a multimodal diagnosis model for bipolar disorder.
Our proposed method outperforms others in balanced accuracy from 0.657 to 0.732 on the OpenfMRI dataset.
arXiv Detail & Related papers (2024-01-15T10:11:19Z) - StRegA: Unsupervised Anomaly Detection in Brain MRIs using a Compact
Context-encoding Variational Autoencoder [48.2010192865749]
Unsupervised anomaly detection (UAD) can learn a data distribution from an unlabelled dataset of healthy subjects and then be applied to detect out of distribution samples.
This research proposes a compact version of the "context-encoding" VAE (ceVAE) model, combined with pre and post-processing steps, creating a UAD pipeline (StRegA)
The proposed pipeline achieved a Dice score of 0.642$pm$0.101 while detecting tumours in T2w images of the BraTS dataset and 0.859$pm$0.112 while detecting artificially induced anomalies.
arXiv Detail & Related papers (2022-01-31T14:27:35Z) - G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for
Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.