NeuroSwift: A Lightweight Cross-Subject Framework for fMRI Visual Reconstruction of Complex Scenes
- URL: http://arxiv.org/abs/2510.02266v2
- Date: Sun, 12 Oct 2025 15:24:08 GMT
- Title: NeuroSwift: A Lightweight Cross-Subject Framework for fMRI Visual Reconstruction of Complex Scenes
- Authors: Shiyi Zhang, Dong Liang, Yihang Zhou
- Abstract summary: Cross-subject reconstruction of visual stimuli remains challenging and computationally demanding. We propose NeuroSwift, which integrates adapters via diffusion: AutoKL for low-level features and CLIP for semantics. For cross-subject generalization, we pretrain on one subject and then fine-tune only 17 percent of parameters for new subjects, while freezing other components.
- Score: 8.32275773383994
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reconstructing visual information from brain activity via computer vision technology provides an intuitive understanding of visual neural mechanisms. Despite progress in decoding fMRI data with generative models, achieving accurate cross-subject reconstruction of visual stimuli remains challenging and computationally demanding. This difficulty arises from inter-subject variability in neural representations and the brain's abstract encoding of core semantic features in complex visual inputs. To address these challenges, we propose NeuroSwift, which integrates complementary adapters via diffusion: AutoKL for low-level features and CLIP for semantics. NeuroSwift's CLIP Adapter is trained on Stable Diffusion generated images paired with COCO captions to emulate higher visual cortex encoding. For cross-subject generalization, we pretrain on one subject and then fine-tune only 17 percent of parameters (fully connected layers) for new subjects, while freezing other components. This enables state-of-the-art performance with only one hour of training per subject on lightweight GPUs (three RTX 4090), and it outperforms existing methods.
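The fine-tuning recipe in the abstract (pretrain on one subject, then adapt to a new subject by updating only the fully connected layers while freezing everything else) can be sketched in a few lines of PyTorch. This is an illustrative toy, not NeuroSwift's actual architecture: the module names, layer sizes, and the resulting trainable fraction are all invented.

```python
import torch
import torch.nn as nn

class ToyDecoder(nn.Module):
    """Stand-in for a cross-subject decoder: a subject-specific fully
    connected input layer feeding a shared (pretrained) backbone."""
    def __init__(self, n_voxels=4000, latent_dim=256, hidden=2048):
        super().__init__()
        self.fc_in = nn.Linear(n_voxels, latent_dim)   # subject-specific
        self.backbone = nn.Sequential(                 # shared, stays frozen
            nn.Linear(latent_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, hidden),
            nn.GELU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, voxels):
        return self.backbone(self.fc_in(voxels))

model = ToyDecoder()

# Adapt to a new subject: freeze all parameters, then re-enable only
# the fully connected input layer.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc_in.parameters():
    p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable fraction: {trainable / total:.1%}")

# Only the unfrozen parameters go to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```

The trainable fraction printed by this toy is arbitrary; the 17 percent figure in the abstract refers to NeuroSwift's real architecture.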
Related papers
- Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction [65.67001243986981]
We propose MindHier, a coarse-to-fine fMRI-to-image reconstruction framework built on scale-wise autoregressive modeling. MindHier achieves superior semantic fidelity, 4.67x faster inference, and more deterministic results than the diffusion-based baselines.
arXiv Detail & Related papers (2025-10-25T15:40:07Z)
- VoxelFormer: Parameter-Efficient Multi-Subject Visual Decoding from fMRI [4.3296865400748]
VoxelFormer is a lightweight transformer architecture that enables multi-subject training for visual decoding from fMRI. It integrates a Token Merging Transformer (ToMer) for efficient voxel compression and a query-driven Q-Former that produces fixed-size neural representations aligned with the CLIP image embedding space.
arXiv Detail & Related papers (2025-09-10T21:20:17Z)
- SynBrain: Enhancing Visual-to-fMRI Synthesis via Probabilistic Representation Learning [50.69448058071441]
Deciphering how visual stimuli are transformed into cortical responses is a fundamental challenge in computational neuroscience. We propose SynBrain, a generative framework that simulates the transformation from visual semantics to neural responses. We show that SynBrain surpasses state-of-the-art methods in subject-specific visual-to-fMRI encoding performance.
arXiv Detail & Related papers (2025-08-14T03:01:05Z)
- Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex [5.283925904540581]
BraInCoRL uses in-context learning to predict voxelwise neural responses from few-shot examples. We show that BraInCoRL consistently outperforms existing voxelwise encoder designs in a low-data regime. BraInCoRL facilitates better interpretability of neural signals in higher visual cortex by attending to semantically relevant stimuli.
arXiv Detail & Related papers (2025-05-21T17:59:41Z)
- Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction [13.110669865114533]
NEURONS is a concept framework that decouples learning into four correlated sub-tasks. It simulates the visual cortex's functional specialization, allowing the model to capture diverse video content. NEURONS shows a strong functional correlation with the visual cortex, highlighting its potential for brain-computer interfaces and clinical applications.
arXiv Detail & Related papers (2025-03-14T08:12:28Z)
- MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding [50.55024115943266]
We introduce MindFormer, a novel method for semantic alignment of multi-subject fMRI signals.
This model is specifically designed to generate fMRI-conditioned feature vectors that can condition a Stable Diffusion model for fMRI-to-image generation or a large language model (LLM) for fMRI-to-text generation.
Our experimental results demonstrate that MindFormer generates semantically consistent images and text across different subjects.
arXiv Detail & Related papers (2024-05-28T00:36:25Z) - See Through Their Minds: Learning Transferable Neural Representation from Cross-Subject fMRI [32.40827290083577]
Deciphering visual content from functional Magnetic Resonance Imaging (fMRI) helps illuminate the human vision system.
Previous approaches primarily employ subject-specific models, which are sensitive to training sample size.
We propose shallow subject-specific adapters to map cross-subject fMRI data into unified representations.
During training, we leverage both visual and textual supervision for multi-modal brain decoding.
arXiv Detail & Related papers (2024-03-11T01:18:49Z) - Learning Multimodal Volumetric Features for Large-Scale Neuron Tracing [72.45257414889478]
We aim to reduce human workload by predicting connectivity between over-segmented neuron pieces.
We first construct a dataset, named FlyTracing, that contains millions of pairwise connections of segments spanning the whole fly brain.
We propose a novel connectivity-aware contrastive learning method to generate dense volumetric EM image embedding.
arXiv Detail & Related papers (2024-01-05T19:45:12Z) - Disruptive Autoencoders: Leveraging Low-level features for 3D Medical
Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z) - BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP
for Generic Natural Visual Stimulus Decoding [51.911473457195555]
BrainCLIP is a task-agnostic fMRI-based brain decoding model.
It bridges the modality gap between brain activity, image, and text.
BrainCLIP can reconstruct visual stimuli with high semantic fidelity.
arXiv Detail & Related papers (2023-02-25T03:28:54Z) - Convolutional Neural Generative Coding: Scaling Predictive Coding to
Natural Images [79.07468367923619]
We develop convolutional neural generative coding (Conv-NGC), a flexible, neurobiologically motivated algorithm that progressively refines latent state maps.
We study the effectiveness of our brain-inspired neural system on the tasks of reconstruction and image denoising.
arXiv Detail & Related papers (2022-11-22T06:42:41Z)
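Several of the papers above (NeuroSwift's CLIP adapter, VoxelFormer, BrainCLIP) share one core technique: training an encoder that maps fMRI voxels into CLIP's image embedding space. A common way to do this, sketched below with an InfoNCE-style symmetric contrastive loss, is illustrative only; no listed paper's exact objective or dimensions are reproduced here, and all shapes are invented.

```python
import torch
import torch.nn.functional as F

def clip_alignment_loss(fmri_emb, clip_emb, temperature=0.07):
    """Symmetric contrastive loss between fMRI and CLIP image embeddings."""
    fmri_emb = F.normalize(fmri_emb, dim=-1)
    clip_emb = F.normalize(clip_emb, dim=-1)
    logits = fmri_emb @ clip_emb.T / temperature  # (batch, batch) similarities
    targets = torch.arange(len(logits))           # matched pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))

# Toy usage: a linear "brain encoder" from 15000 voxels into a 768-dim space.
encoder = torch.nn.Linear(15000, 768)
voxels = torch.randn(8, 15000)       # one fMRI pattern per image in the batch
clip_targets = torch.randn(8, 768)   # would come from a frozen CLIP image tower
loss = clip_alignment_loss(encoder(voxels), clip_targets)
```

In practice the CLIP image tower stays frozen and only the brain encoder is trained, so the learned embeddings can be dropped directly into any downstream model that already consumes CLIP features, such as a diffusion decoder.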
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.