The Pictorial Cortex: Zero-Shot Cross-Subject fMRI-to-Image Reconstruction via Compositional Latent Modeling
- URL: http://arxiv.org/abs/2601.15071v1
- Date: Wed, 21 Jan 2026 15:15:27 GMT
- Title: The Pictorial Cortex: Zero-Shot Cross-Subject fMRI-to-Image Reconstruction via Compositional Latent Modeling
- Authors: Jingyang Huo, Yikai Wang, Yanwei Fu, Jianfeng Feng,
- Abstract summary: Decoding visual experiences from human brain activity remains a central challenge at the intersection of neuroscience, neuroimaging, and artificial intelligence.<n>We propose PictorialCortex, which models fMRI activity using a compositional latent formulation that structures stimulus-driven representations under subject-, dataset-, and trial-related variability.
- Score: 43.65221373201575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decoding visual experiences from human brain activity remains a central challenge at the intersection of neuroscience, neuroimaging, and artificial intelligence. A critical obstacle is the inherent variability of cortical responses: neural activity elicited by the same visual stimulus differs across individuals and trials due to anatomical, functional, cognitive, and experimental factors, making fMRI-to-image reconstruction non-injective. In this paper, we tackle a challenging yet practically meaningful problem: zero-shot cross-subject fMRI-to-image reconstruction, where the visual experience of a previously unseen individual must be reconstructed without subject-specific training. To enable principled evaluation, we present a unified cortical-surface dataset -- UniCortex-fMRI, assembled from multiple visual-stimulus fMRI datasets to provide broad coverage of subjects and stimuli. Our UniCortex-fMRI is particularly processed by standardized data formats to make it possible to explore this possibility in the zero-shot scenario of cross-subject fMRI-to-image reconstruction. To tackle the modeling challenge, we propose PictorialCortex, which models fMRI activity using a compositional latent formulation that structures stimulus-driven representations under subject-, dataset-, and trial-related variability. PictorialCortex operates in a universal cortical latent space and implements this formulation through a latent factorization--composition module, reinforced by paired factorization and re-factorizing consistency regularization. During inference, surrogate latents synthesized under multiple seen-subject conditions are aggregated to guide diffusion-based image synthesis for unseen subjects. Extensive experiments show that PictorialCortex improves zero-shot cross-subject visual reconstruction, highlighting the benefits of compositional latent modeling and multi-dataset training.
Related papers
- Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction [65.67001243986981]
We propose MindHier, a coarse-to-fine fMRI-to-image reconstruction framework built on scale-wise autoregressive modeling.<n>MindHier achieves superior semantic fidelity, 4.67x faster inference, and more deterministic results than the diffusion-based baselines.
arXiv Detail & Related papers (2025-10-25T15:40:07Z) - MindBridge: A Cross-Subject Brain Decoding Framework [60.58552697067837]
Brain decoding aims to reconstruct stimuli from acquired brain signals.
Currently, brain decoding is confined to a per-subject-per-model paradigm.
We present MindBridge, that achieves cross-subject brain decoding by employing only one model.
arXiv Detail & Related papers (2024-04-11T15:46:42Z) - Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity [60.983327742457995]
Reconstructing the viewed images from human brain activity bridges human and computer vision through the Brain-Computer Interface.
We devise Psychometry, an omnifit model for reconstructing images from functional Magnetic Resonance Imaging (fMRI) obtained from different subjects.
arXiv Detail & Related papers (2024-03-29T07:16:34Z) - NeuralDiffuser: Neuroscience-inspired Diffusion Guidance for fMRI Visual Reconstruction [25.987801733791986]
We propose NeuralDiffuser, which incorporates primary visual feature guidance to provide detailed cues in the form of gradients.<n>This extension of the bottom-up process for diffusion models achieves both semantic coherence and detail fidelity when reconstructing visual stimuli.
arXiv Detail & Related papers (2024-02-21T13:46:25Z) - Double-Flow GAN model for the reconstruction of perceived faces from brain activities [13.707575848841405]
We proposed a novel reconstruction framework, which we called Double-Flow GAN.<n>We also designed a pretraining process that uses features extracted from images as conditions for making it possible to pretrain the conditional reconstruction model from fMRI.<n>Results showed that the proposed method is significant at accurately reconstructing multiple face attributes, outperforms the previous reconstruction models, and exhibited state-of-the-art reconstruction abilities.
arXiv Detail & Related papers (2023-12-12T18:07:57Z) - fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for
Multi-Subject Brain Activity Decoding [54.17776744076334]
We propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training.
Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving brain activity patterns.
Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach.
arXiv Detail & Related papers (2023-11-01T07:24:22Z) - UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion
Model from Human Brain Activity [2.666777614876322]
We propose UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity.
We transform fMRI voxels into text and image latent for low-level information to generate realistic captions and images.
UniBrain outperforms current methods both qualitatively and quantitatively in terms of image reconstruction and reports image captioning results for the first time on the Natural Scenes dataset.
arXiv Detail & Related papers (2023-08-14T19:49:29Z) - MindDiffuser: Controlled Image Reconstruction from Human Brain Activity
with Semantic and Structural Diffusion [7.597218661195779]
We propose a two-stage image reconstruction model called MindDiffuser.
In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are put into Stable Diffusion.
In Stage 2, we utilize the CLIP visual feature decoded from fMRI as supervisory information, and continually adjust the two feature vectors decoded in Stage 1 through backpagation to align the structural information.
arXiv Detail & Related papers (2023-08-08T13:28:34Z) - MindDiffuser: Controlled Image Reconstruction from Human Brain Activity
with Semantic and Structural Diffusion [8.299415606889024]
We propose a two-stage image reconstruction model called MindDiffuser.
In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are put into the image-to-image process of Stable Diffusion.
In Stage 2, we utilize the low-level CLIP visual features decoded from fMRI as supervisory information.
arXiv Detail & Related papers (2023-03-24T16:41:42Z) - BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP
for Generic Natural Visual Stimulus Decoding [51.911473457195555]
BrainCLIP is a task-agnostic fMRI-based brain decoding model.
It bridges the modality gap between brain activity, image, and text.
BrainCLIP can reconstruct visual stimuli with high semantic fidelity.
arXiv Detail & Related papers (2023-02-25T03:28:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.