RTGMFF: Enhanced fMRI-based Brain Disorder Diagnosis via ROI-driven Text Generation and Multimodal Feature Fusion
- URL: http://arxiv.org/abs/2509.03214v1
- Date: Wed, 03 Sep 2025 11:05:57 GMT
- Title: RTGMFF: Enhanced fMRI-based Brain Disorder Diagnosis via ROI-driven Text Generation and Multimodal Feature Fusion
- Authors: Junhao Jia, Yifei Sun, Yunyou Liu, Cheng Yang, Changmiao Wang, Feiwei Qin, Yong Peng, Wenwen Min
- Abstract summary: We introduce RTGMFF, a framework that unifies automatic ROI-level text generation with multimodal feature fusion for brain-disorder diagnosis. Tests show that RTGMFF surpasses current methods in diagnostic accuracy, achieving notable gains in sensitivity, specificity, and area under the ROC curve.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Functional magnetic resonance imaging (fMRI) is a powerful tool for probing brain function, yet reliable clinical diagnosis is hampered by low signal-to-noise ratios, inter-subject variability, and the limited frequency awareness of prevailing CNN- and Transformer-based models. Moreover, most fMRI datasets lack textual annotations that could contextualize regional activation and connectivity patterns. We introduce RTGMFF, a framework that unifies automatic ROI-level text generation with multimodal feature fusion for brain-disorder diagnosis. RTGMFF consists of three components: (i) ROI-driven fMRI text generation deterministically condenses each subject's activation, connectivity, age, and sex into reproducible text tokens; (ii) Hybrid frequency-spatial encoder fuses a hierarchical wavelet-mamba branch with a cross-scale Transformer encoder to capture frequency-domain structure alongside long-range spatial dependencies; and (iii) Adaptive semantic alignment module embeds the ROI token sequence and visual features in a shared space, using a regularized cosine-similarity loss to narrow the modality gap. Extensive experiments on the ADHD-200 and ABIDE benchmarks show that RTGMFF surpasses current methods in diagnostic accuracy, achieving notable gains in sensitivity, specificity, and area under the ROC curve. Code is available at https://github.com/BeistMedAI/RTGMFF.
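The adaptive semantic alignment module is described as embedding ROI text tokens and visual features in a shared space and narrowing the modality gap with a regularized cosine-similarity loss. The abstract does not give the exact form of the regularizer, so the sketch below assumes a simple L2 penalty on the raw embeddings; the function name and weighting are illustrative, not the authors' implementation.

```python
import numpy as np

def alignment_loss(text_emb, vis_emb, reg_weight=0.01):
    """Regularized cosine-similarity alignment loss (illustrative form).

    text_emb, vis_emb: (batch, dim) embeddings already projected into a
    shared space. The regularizer is assumed here to be an L2 penalty on
    the raw embeddings; the paper does not specify it.
    """
    # Normalize each embedding to unit length.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = vis_emb / np.linalg.norm(vis_emb, axis=1, keepdims=True)
    # Cosine similarity of matched (text, visual) pairs.
    cos_sim = np.sum(t * v, axis=1)
    # Push matched pairs toward similarity 1.
    align_term = np.mean(1.0 - cos_sim)
    # Assumed L2 regularizer keeping embedding magnitudes bounded.
    reg_term = reg_weight * (np.mean(text_emb ** 2) + np.mean(vis_emb ** 2))
    return align_term + reg_term
```

With identical text and visual embeddings the alignment term vanishes and only the regularizer contributes, which is the intended behavior for a perfectly closed modality gap.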
Related papers
- AI-Enhanced Virtual Biopsies for Brain Tumor Diagnosis in Low Resource Settings [0.0]
This paper presents a prototype virtual biopsy pipeline for four-class classification of 2D brain MRI images using a lightweight convolutional neural network (CNN) and radiomics-style handcrafted features. The system is framed as decision support and not a substitute for clinical diagnosis or histopathology.
arXiv Detail & Related papers (2025-12-19T19:53:56Z)
- Ada-FCN: Adaptive Frequency-Coupled Network for fMRI-Based Brain Disorder Classification [9.889378402402754]
We propose a framework featuring Adaptive Cascade Decomposition to learn task-relevant frequency sub-bands for each brain region. We also propose Frequency-Coupled Connectivity Learning to capture both intra-band and nuanced cross-band interactions in a unified functional network.
arXiv Detail & Related papers (2025-11-06T08:57:07Z)
- Adapting HFMCA to Graph Data: Self-Supervised Learning for Generalizable fMRI Representations [57.054499278843856]
Functional magnetic resonance imaging (fMRI) analysis faces significant challenges due to limited dataset sizes and domain variability between studies. Traditional self-supervised learning methods inspired by computer vision often rely on positive and negative sample pairs. We propose adapting a recently developed Hierarchical Functional Maximal Correlation Algorithm (HFMCA) to graph-structured fMRI data.
arXiv Detail & Related papers (2025-10-05T12:35:01Z)
- Interactive Gadolinium-Free MRI Synthesis: A Transformer with Localization Prompt Learning [6.716077690014641]
We propose a novel Transformer with Localization Prompts framework for synthesizing CE-MRI from non-contrast MR images. Our architecture introduces three key innovations: a hierarchical backbone that uses an efficient Transformer to process multi-scale features, and a multi-stage fusion system that hierarchically integrates complementary information via spatial attention and cross-attention mechanisms. The framework enables interactive clinical integration by allowing radiologists to input diagnostic prompts during inference, synergizing artificial intelligence with medical expertise.
arXiv Detail & Related papers (2025-03-03T07:44:28Z)
- MHSA: A Multi-scale Hypergraph Network for Mild Cognitive Impairment Detection via Synchronous and Attentive Fusion [4.526574526136158]
A Multi-scale Hypergraph Network for MCI Detection via Synchronous and Attentive Fusion is presented. Our approach employs the Phase-Locking Value (PLV) to calculate phase-synchronization relationships in the spectral domain of regions of interest. We design a dynamic adjustment strategy for the PLV coefficients, and the dynamic hypergraph is modelled based on a comprehensive temporal-spectrum fusion matrix.
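The Phase-Locking Value mentioned above has a standard definition: the magnitude of the time-averaged complex phase difference between two signals. A minimal sketch, assuming the instantaneous phases have already been extracted (e.g., via a Hilbert or wavelet transform of each ROI's band-limited signal):

```python
import numpy as np

def phase_locking_value(phase_a, phase_b):
    """Phase-Locking Value between two instantaneous-phase series.

    PLV = | (1/T) * sum_t exp(i * (phi_a(t) - phi_b(t))) |

    Ranges from 0 (no phase synchrony) to 1 (perfect phase locking).
    """
    diff = np.asarray(phase_a) - np.asarray(phase_b)
    return np.abs(np.mean(np.exp(1j * diff)))
```

A constant phase offset yields a PLV of exactly 1, while phases that drift uniformly around the circle average out toward 0.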
arXiv Detail & Related papers (2024-12-11T02:59:57Z)
- Accelerated Multi-Contrast MRI Reconstruction via Frequency and Spatial Mutual Learning [50.74383395813782]
We propose a novel Frequency and Spatial Mutual Learning Network (FSMNet) to explore global dependencies across different modalities.
The proposed FSMNet achieves state-of-the-art performance for the Multi-Contrast MR Reconstruction task with different acceleration factors.
arXiv Detail & Related papers (2024-09-21T12:02:47Z)
- MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding [50.55024115943266]
We introduce MindFormer, a novel method for semantic alignment of multi-subject fMRI signals.
This model is specifically designed to generate fMRI-conditioned feature vectors that can be used to condition a Stable Diffusion model for fMRI-to-image generation or a large language model (LLM) for fMRI-to-text generation.
Our experimental results demonstrate that MindFormer generates semantically consistent images and text across different subjects.
arXiv Detail & Related papers (2024-05-28T00:36:25Z)
- NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation [55.51412454263856]
This paper proposes to directly modulate the generation process of diffusion models using fMRI signals.
By training with about 67,000 fMRI-image pairs from various individuals, our model enjoys superior fMRI-to-image decoding capacity.
arXiv Detail & Related papers (2024-03-27T02:42:52Z)
- Unlocking Fine-Grained Details with Wavelet-based High-Frequency Enhancement in Transformers [4.208461204572879]
Medical image segmentation is a critical task that plays a vital role in diagnosis, treatment planning, and disease monitoring.
We address the local feature deficiency of the Transformer model by carefully re-designing the self-attention map.
We propose a multi-scale context enhancement block within skip connections to adaptively model inter-scale dependencies.
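Both this paper and the RTGMFF encoder above rely on wavelet decomposition to separate frequency sub-bands. As an illustration of the underlying idea (not either paper's implementation), one level of a 1-D Haar wavelet transform splits a signal into a low-frequency approximation band and a high-frequency detail band:

```python
import numpy as np

def haar_dwt_1level(x):
    """One level of a 1-D Haar wavelet transform (illustrative).

    Splits a signal into a low-frequency approximation and a
    high-frequency detail band -- the kind of sub-band separation that
    wavelet-based enhancement branches operate on. Assumes even length.
    """
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # low-pass (pairwise sums)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # high-pass (pairwise differences)
    return approx, detail
```

A locally constant signal produces zero detail coefficients; edges and fine texture concentrate in the detail band, which high-frequency enhancement modules then amplify or attend to.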
arXiv Detail & Related papers (2023-08-25T15:42:19Z)
- Source-Free Collaborative Domain Adaptation via Multi-Perspective Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state functional MRI (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis.
Many methods have been proposed to reduce fMRI heterogeneity between source and target domains.
But acquiring source data is challenging due to privacy concerns and/or data storage burdens in multi-site studies.
We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
arXiv Detail & Related papers (2023-08-24T01:30:18Z)
- Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation [48.723504098917324]
We propose an Unify, Align and then Refine (UAR) approach to learn multi-level cross-modal alignments.
We introduce three novel modules: Latent Space Unifier, Cross-modal Representation Aligner and Text-to-Image Refiner.
Experiments and analyses on the IU-Xray and MIMIC-CXR benchmark datasets demonstrate the superiority of our UAR over various state-of-the-art methods.
arXiv Detail & Related papers (2023-03-28T12:42:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.