Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction
- URL: http://arxiv.org/abs/2510.22335v1
- Date: Sat, 25 Oct 2025 15:40:07 GMT
- Title: Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction
- Authors: Xu Zhang, Ruijie Quan, Wenguan Wang, Yi Yang
- Abstract summary: We propose MindHier, a coarse-to-fine fMRI-to-image reconstruction framework built on scale-wise autoregressive modeling. MindHier achieves superior semantic fidelity, 4.67x faster inference, and more deterministic results than the diffusion-based baselines.
- Score: 65.67001243986981
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reconstructing visual stimuli from fMRI signals is a central challenge bridging machine learning and neuroscience. Recent diffusion-based methods typically map fMRI activity to a single high-level embedding, using it as fixed guidance throughout the entire generation process. However, this fixed guidance collapses hierarchical neural information and is misaligned with the stage-dependent demands of image reconstruction. In response, we propose MindHier, a coarse-to-fine fMRI-to-image reconstruction framework built on scale-wise autoregressive modeling. MindHier introduces three components: a Hierarchical fMRI Encoder to extract multi-level neural embeddings, a Hierarchy-to-Hierarchy Alignment scheme to enforce layer-wise correspondence with CLIP features, and a Scale-Aware Coarse-to-Fine Neural Guidance strategy to inject these embeddings into autoregression at matching scales. These designs make MindHier an efficient and cognitively-aligned alternative to diffusion-based methods by enabling a hierarchical reconstruction process that synthesizes global semantics before refining local details, akin to human visual perception. Extensive experiments on the NSD dataset show that MindHier achieves superior semantic fidelity, 4.67x faster inference, and more deterministic results than the diffusion-based baselines.
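The Hierarchy-to-Hierarchy Alignment idea from the abstract can be sketched as a per-level loss. The following is a minimal illustration under our own assumptions (a cosine-distance loss averaged over matched fMRI and CLIP embedding levels), not the authors' released code; all function names and shapes here are hypothetical.

```python
# Hypothetical sketch of a hierarchy-to-hierarchy alignment loss: each level
# of a multi-level fMRI encoder output is pulled toward the CLIP feature at
# the matching level via a per-level cosine-distance term.
import numpy as np

def cosine_alignment_loss(fmri_levels, clip_levels):
    """Mean (1 - cosine similarity) over matched hierarchy levels.

    fmri_levels, clip_levels: lists of equal length; level i holds a
    (batch, dim_i) array of embeddings at that scale.
    """
    assert len(fmri_levels) == len(clip_levels)
    total = 0.0
    for f, c in zip(fmri_levels, clip_levels):
        # L2-normalize so the dot product equals cosine similarity.
        f = f / (np.linalg.norm(f, axis=-1, keepdims=True) + 1e-8)
        c = c / (np.linalg.norm(c, axis=-1, keepdims=True) + 1e-8)
        total += np.mean(1.0 - np.sum(f * c, axis=-1))
    return total / len(fmri_levels)

# Toy usage: three hierarchy levels (coarse to fine), batch of 2.
rng = np.random.default_rng(0)
shapes = [(2, 64), (2, 128), (2, 256)]
fmri = [rng.normal(size=s) for s in shapes]
clip = [rng.normal(size=s) for s in shapes]
loss = cosine_alignment_loss(fmri, clip)
print(float(loss))  # typically near 1.0 for uncorrelated random embeddings
```

Perfectly aligned hierarchies drive the loss to zero, while each level contributes independently, which is the layer-wise correspondence the abstract describes.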
Related papers
- BrainRVQ: A High-Fidelity EEG Foundation Model via Dual-Domain Residual Quantization and Hierarchical Autoregression [26.114257185901838]
We propose BrainRVQ, a general-purpose EEG foundation model pre-trained on a large-scale corpus of clinical EEG data.
BrainRVQ features a Dual-Domain Residual Vector Quantization (DD-RVQ) tokenizer that disentangles temporal waveforms and spectral patterns into hierarchical discrete codes.
arXiv Detail & Related papers (2026-02-18T23:30:36Z)
- SynMind: Reducing Semantic Hallucination in fMRI-Based Image Reconstruction [52.34513874272676]
We argue that existing methods rely too heavily on entangled visual embeddings over explicit semantic identity.
We parse fMRI signals into rich, sentence-level semantic descriptions that mirror the hierarchical and compositional nature of human visual understanding.
We propose SynMind, a framework that integrates these explicit semantic encodings with visual priors to condition a pretrained diffusion model.
arXiv Detail & Related papers (2026-01-25T14:31:23Z)
- Hi-DREAM: Brain Inspired Hierarchical Diffusion for fMRI Reconstruction via ROI Encoder and visuAl Mapping [5.019958634393433]
Hi-DREAM is a conditional diffusion framework that makes the cortical organization explicit.
A lightweight, depth-matched ControlNet injects scale-specific hints during denoising.
Experiments show that Hi-DREAM attains state-of-the-art performance on high-level semantic metrics.
arXiv Detail & Related papers (2025-11-14T16:05:44Z)
- Perception Activator: An intuitive and portable framework for brain cognitive exploration [19.851643249367108]
We develop an experimental framework that uses fMRI representations as intervention conditions.
We compare both downstream performance and intermediate feature changes on object detection and instance segmentation tasks with and without fMRI information.
Our results show that fMRI contains rich multi-object semantic cues and coarse spatial localization information, elements that current models have yet to fully exploit or integrate.
arXiv Detail & Related papers (2025-07-03T04:46:48Z)
- Optimized two-stage AI-based Neural Decoding for Enhanced Visual Stimulus Reconstruction from fMRI Data [2.0851013563386247]
This work proposes a non-linear deep network to improve the fMRI latent space representation while also optimizing its dimensionality.
Experiments on the Natural Scenes dataset showed that the proposed architecture improved the structural similarity of the reconstructed image by about 2% with respect to the state-of-the-art model.
The noise sensitivity analysis of the LDM showed that the first stage was fundamental to predicting a stimulus with high structural similarity.
arXiv Detail & Related papers (2024-12-17T16:42:55Z)
- MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding [50.55024115943266]
We introduce MindFormer, a novel method for semantic alignment of multi-subject fMRI signals.
This model is specifically designed to generate fMRI-conditioned feature vectors that can be used to condition a Stable Diffusion model for fMRI-to-image generation or a large language model (LLM) for fMRI-to-text generation.
Our experimental results demonstrate that MindFormer generates semantically consistent images and text across different subjects.
arXiv Detail & Related papers (2024-05-28T00:36:25Z)
- NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation [55.51412454263856]
This paper proposes to directly modulate the generation process of diffusion models using fMRI signals.
By training with about 67,000 fMRI-image pairs from various individuals, our model enjoys superior fMRI-to-image decoding capacity.
arXiv Detail & Related papers (2024-03-27T02:42:52Z)
- fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding [54.17776744076334]
We propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training.
Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving brain activity patterns.
Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach.
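One simple way to picture the "unified 2D representation" this entry describes is to pad a subject's 1-D voxel vector to a fixed square and reshape it. This is a hypothetical illustration under our own assumptions, not the fMRI-PTE transformation itself; the function and padding scheme below are invented for the sketch.

```python
# Hypothetical sketch: map a variable-length 1-D fMRI voxel vector to a
# fixed-size 2-D grid by zero-padding to the next perfect square and
# reshaping, so every subject yields consistent dimensions.
import math
import numpy as np

def to_2d(signal, side=None):
    """Zero-pad a 1-D voxel array and reshape it into a (side, side) grid."""
    n = signal.shape[0]
    if side is None:
        side = math.ceil(math.sqrt(n))  # smallest square that fits the data
    padded = np.zeros(side * side, dtype=signal.dtype)
    padded[:n] = signal
    return padded.reshape(side, side)

voxels = np.arange(10, dtype=np.float32)  # toy 10-voxel signal
grid = to_2d(voxels)
print(grid.shape)  # (4, 4): 10 voxels padded to 16 and reshaped
```

A real pipeline would likely use an anatomically informed mapping rather than raw reshaping; the point here is only the dimensional consistency across subjects.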
arXiv Detail & Related papers (2023-11-01T07:24:22Z)
- Natural scene reconstruction from fMRI signals using generative latent diffusion [1.90365714903665]
We present a two-stage scene reconstruction framework called "Brain-Diffuser".
In the first stage, we reconstruct images that capture low-level properties and overall layout using a VDVAE (Very Deep Variational Autoencoder) model.
In the second stage, we use the image-to-image framework of a latent diffusion model conditioned on predicted multimodal (text and visual) features.
arXiv Detail & Related papers (2023-03-09T15:24:26Z)
- BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding [51.911473457195555]
BrainCLIP is a task-agnostic fMRI-based brain decoding model.
It bridges the modality gap between brain activity, image, and text.
BrainCLIP can reconstruct visual stimuli with high semantic fidelity.
arXiv Detail & Related papers (2023-02-25T03:28:54Z)
- Over-and-Under Complete Convolutional RNN for MRI Reconstruction [57.95363471940937]
Recent deep learning-based methods for MR image reconstruction usually leverage a generic auto-encoder architecture.
We propose an Over-and-Under Complete Convolutional Recurrent Neural Network (OUCR), which consists of an overcomplete and an undercomplete Convolutional Recurrent Neural Network (CRNN).
The proposed method achieves significant improvements over compressed sensing and popular deep learning-based methods with fewer trainable parameters.
arXiv Detail & Related papers (2021-06-16T15:56:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.