Scaling Vision Transformers for Functional MRI with Flat Maps
- URL: http://arxiv.org/abs/2510.13768v1
- Date: Wed, 15 Oct 2025 17:15:00 GMT
- Title: Scaling Vision Transformers for Functional MRI with Flat Maps
- Authors: Connor Lane, Daniel Z. Kaplan, Tanishq Mathew Abraham, Paul S. Scotti
- Abstract summary: We transform 4D fMRI data into videos of 2D fMRI activity flat maps. We train Vision Transformers on 2.3K hours of fMRI flat map videos. This work is part of an ongoing open science project to build foundation models for fMRI data.
- Score: 5.8791412590811305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key question for adapting modern deep learning architectures to functional MRI (fMRI) is how to represent the data for model input. To bridge the modality gap between fMRI and natural images, we transform the 4D volumetric fMRI data into videos of 2D fMRI activity flat maps. We train Vision Transformers on 2.3K hours of fMRI flat map videos from the Human Connectome Project using the spatiotemporal masked autoencoder (MAE) framework. We observe that masked fMRI modeling performance improves with dataset size according to a strict power scaling law. Downstream classification benchmarks show that our model learns rich representations supporting both fine-grained state decoding across subjects, as well as subject-specific trait decoding across changes in brain state. This work is part of an ongoing open science project to build foundation models for fMRI data. Our code and datasets are available at https://github.com/MedARC-AI/fmri-fm.
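The core preprocessing step described in the abstract, turning fMRI flat-map videos into spatiotemporal patches and randomly masking most of them for MAE pretraining, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tubelet size (2 frames x 16 x 16 pixels), the 75% mask ratio, and the toy video dimensions are all assumptions chosen for clarity.

```python
import numpy as np

def patchify_flatmap_video(video, t_patch=2, p=16):
    """Split a flat-map video of shape (T, H, W) into spatiotemporal tubelets."""
    T, H, W = video.shape
    assert T % t_patch == 0 and H % p == 0 and W % p == 0
    v = video.reshape(T // t_patch, t_patch, H // p, p, W // p, p)
    # Group the three block axes together, then flatten each tubelet into a token.
    v = v.transpose(0, 2, 4, 1, 3, 5).reshape(-1, t_patch * p * p)
    return v

def random_mask(num_tokens, mask_ratio=0.75, rng=None):
    """Return sorted indices of visible and masked tokens, MAE-style."""
    rng = rng or np.random.default_rng(0)
    n_keep = int(num_tokens * (1 - mask_ratio))
    perm = rng.permutation(num_tokens)
    return np.sort(perm[:n_keep]), np.sort(perm[n_keep:])

# Toy example: a 16-frame flat-map "video" of size 64x64
video = np.random.rand(16, 64, 64).astype(np.float32)
tokens = patchify_flatmap_video(video)      # 8 * 4 * 4 = 128 tokens of length 512
visible, masked = random_mask(len(tokens))  # 32 visible, 96 masked
```

In the MAE framework, only the visible tokens are fed to the ViT encoder; a lightweight decoder then reconstructs the masked tubelets from the encoded representation plus mask tokens.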
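The abstract's claim that masked-modeling performance follows a power scaling law in dataset size is typically verified by a linear fit in log-log space. The sketch below uses synthetic losses with a hypothetical exponent of 0.3, purely to illustrate the fitting procedure, and is not drawn from the paper's reported numbers.

```python
import numpy as np

# A power law L = a * N^(-b) is linear in log space: log L = log a - b * log N.
N = np.array([10.0, 100.0, 1000.0, 10000.0])  # hypothetical dataset sizes (hours)
L = 2.0 * N ** -0.3                            # synthetic losses following the law

# Fit a line to (log N, log L); the slope recovers -b, the intercept log a.
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
b = -slope
a = np.exp(intercept)
```

A clean straight line in the log-log plot (equivalently, near-zero fit residuals) is the standard evidence for power-law scaling.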
Related papers
- ContextMRI: Enhancing Compressed Sensing MRI through Metadata Conditioning [51.26601171361753]
We propose ContextMRI, a text-conditioned diffusion model for MRI that integrates granular metadata into the reconstruction process. We show that increasing the fidelity of metadata, ranging from slice location and contrast to patient age, sex, and pathology, systematically boosts reconstruction performance.
arXiv Detail & Related papers (2025-01-08T05:15:43Z) - Looking through the mind's eye via multimodal encoder-decoder networks [7.949204393111349]
We explore the decoding of mental imagery from subjects using their fMRI measurements.
We create a mapping between a subject's fMRI signals elicited by the videos the subjects watched and visual imagery.
We enhance an existing fMRI dataset, initially consisting of data from five subjects, by including recordings from three more subjects gathered by our team.
arXiv Detail & Related papers (2024-09-27T20:48:03Z) - MinD-3D++: Advancing fMRI-Based 3D Reconstruction with High-Quality Textured Mesh Generation and a Comprehensive Dataset [50.534007259536715]
Reconstructing 3D visuals from functional Magnetic Resonance Imaging (fMRI) data is of significant interest to cognitive neuroscience and computer vision. We present the fMRI-3D dataset, which includes data from 15 participants and showcases a total of 4,768 3D objects. We propose MinD-3D++, a novel framework for decoding textured 3D visual information from fMRI signals.
arXiv Detail & Related papers (2024-09-17T16:13:59Z) - MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding [50.55024115943266]
We introduce a novel semantic alignment method for multi-subject fMRI signals using a model called MindFormer.
This model is specifically designed to generate fMRI-conditioned feature vectors that can be used to condition a Stable Diffusion model for fMRI-to-image generation or a large language model (LLM) for fMRI-to-text generation.
Our experimental results demonstrate that MindFormer generates semantically consistent images and text across different subjects.
arXiv Detail & Related papers (2024-05-28T00:36:25Z) - Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity [13.04953215936574]
We propose a two-stage model named Mind-Animator to reconstruct human dynamic vision from brain activity. During the fMRI-to-feature stage, we decouple semantic, structure, and motion features from fMRI. In the feature-to-video stage, these features are integrated into videos using an inflated Stable Diffusion.
arXiv Detail & Related papers (2024-05-06T08:56:41Z) - Synthetic Brain Images: Bridging the Gap in Brain Mapping With Generative Adversarial Model [0.0]
This work investigates the use of Deep Convolutional Generative Adversarial Networks (DCGAN) for producing high-fidelity and realistic MRI image slices.
The discriminator network distinguishes generated slices from real ones, while the generator network learns to synthesise realistic MRI image slices.
The generator refines its capacity to generate slices that closely mimic real MRI data through an adversarial training approach.
arXiv Detail & Related papers (2024-04-11T05:06:51Z) - NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation [55.51412454263856]
This paper proposes to directly modulate the generation process of diffusion models using fMRI signals.
By training with about 67,000 fMRI-image pairs from various individuals, our model enjoys superior fMRI-to-image decoding capacity.
arXiv Detail & Related papers (2024-03-27T02:42:52Z) - MinD-3D: Reconstruct High-quality 3D objects in Human Brain [50.534007259536715]
Recon3DMind is an innovative task aimed at reconstructing 3D visuals from Functional Magnetic Resonance Imaging (fMRI) signals.
We present the fMRI-Shape dataset, which includes data from 14 participants and features 360-degree videos of 3D objects.
We propose MinD-3D, a novel and effective three-stage framework specifically designed to decode the brain's 3D visual information from fMRI signals.
arXiv Detail & Related papers (2023-12-12T18:21:36Z) - fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding [54.17776744076334]
We propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training.
Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving brain activity patterns.
Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach.
arXiv Detail & Related papers (2023-11-01T07:24:22Z) - Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks [60.42012344842292]
3D CNN-based models dominate the field of magnetic resonance image (MRI) analytics.
In this paper, four datasets of Alzheimer's and Parkinson's disease recognition are utilized in experiments.
In terms of efficiency, the video framework performs better than 3D-CNN models by 5%-11% with 50%-66% fewer trainable parameters.
arXiv Detail & Related papers (2023-02-24T15:26:31Z) - CoRRECT: A Deep Unfolding Framework for Motion-Corrected Quantitative R2* Mapping [9.783361575598025]
CoRRECT is a unified deep unfolding (DU) framework for Quantitative MRI (qMRI). It consists of a model-based end-to-end neural network, a method for motion-artifact reduction, and a self-supervised learning scheme. Our results on experimentally collected multi-Gradient-Recalled Echo (mGRE) MRI data show that CoRRECT recovers motion and inhomogeneity artifact-free R2* maps in highly accelerated acquisition settings.
arXiv Detail & Related papers (2022-10-12T15:49:51Z) - Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing the informative patches according to the gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.