SynMind: Reducing Semantic Hallucination in fMRI-Based Image Reconstruction
- URL: http://arxiv.org/abs/2601.17857v1
- Date: Sun, 25 Jan 2026 14:31:23 GMT
- Title: SynMind: Reducing Semantic Hallucination in fMRI-Based Image Reconstruction
- Authors: Lan Yang, Minghan Yang, Ke Li, Honggang Zhang, Kaiyue Pang, Yi-Zhe Song
- Abstract summary: We argue that existing methods rely too heavily on entangled visual embeddings over explicit semantic identity. We parse fMRI signals into rich, sentence-level semantic descriptions that mirror the hierarchical and compositional nature of human visual understanding. We propose SynMind, a framework that integrates these explicit semantic encodings with visual priors to condition a pretrained diffusion model.
- Score: 52.34513874272676
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advances in fMRI-based image reconstruction have achieved remarkable photo-realistic fidelity. Yet, a persistent limitation remains: while reconstructed images often appear naturalistic and holistically similar to the target stimuli, they frequently suffer from severe semantic misalignment -- salient objects are often replaced or hallucinated despite high visual quality. In this work, we address this limitation by rethinking the role of explicit semantic interpretation in fMRI decoding. We argue that existing methods rely too heavily on entangled visual embeddings which prioritize low-level appearance cues -- such as texture and global gist -- over explicit semantic identity. To overcome this, we parse fMRI signals into rich, sentence-level semantic descriptions that mirror the hierarchical and compositional nature of human visual understanding. We achieve this by leveraging grounded VLMs to generate synthetic, human-like, multi-granularity textual representations that capture object identities and spatial organization. Built upon this foundation, we propose SynMind, a framework that integrates these explicit semantic encodings with visual priors to condition a pretrained diffusion model. Extensive experiments demonstrate that SynMind outperforms state-of-the-art methods across most quantitative metrics. Notably, by offloading semantic reasoning to our text-alignment module, SynMind surpasses competing methods based on SDXL while using the much smaller Stable Diffusion 1.4 and a single consumer GPU. Large-scale human evaluations further confirm that SynMind produces reconstructions more consistent with human visual perception. Neurovisualization analyses reveal that SynMind engages broader and more semantically relevant brain regions, mitigating the over-reliance on high-level visual areas.
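The abstract describes a two-stage design: fMRI is first parsed into explicit sentence-level text, which then conditions Stable Diffusion 1.4 alongside visual priors. Below is a minimal sketch of that flow; `FmriCaptionDecoder` is a hypothetical stand-in invented here for the paper's grounded-VLM pipeline, and only the final conditioning step uses a real API (Hugging Face diffusers).

```python
# Minimal sketch of the two-stage idea in the abstract: (1) decode an
# explicit caption from fMRI, (2) condition Stable Diffusion 1.4 on it.
# FmriCaptionDecoder is a toy stand-in, not the paper's actual module.
import torch
import torch.nn as nn
from diffusers import StableDiffusionPipeline

class FmriCaptionDecoder(nn.Module):
    """Hypothetical regressor from flattened fMRI voxels to caption tokens."""
    def __init__(self, n_voxels: int, hidden: int = 1024, vocab: int = 49408):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_voxels, hidden), nn.GELU(),
            nn.Linear(hidden, vocab),  # toy: logits over CLIP's token vocab
        )

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        return self.net(voxels)

decoder = FmriCaptionDecoder(n_voxels=15000)   # illustrative voxel count
token_logits = decoder(torch.randn(1, 15000))  # would be detokenized to text

# Conditioning SD 1.4 on a decoded caption (placeholder string here).
# Assumes a CUDA GPU is available.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")
caption = "a brown dog running on a grassy field, trees in the background"
image = pipe(caption, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("reconstruction.png")
```

Per the abstract, the point of this split is that semantic reasoning is offloaded to the text-alignment side, which is why a small base model such as SD 1.4 on a single consumer GPU can compete with SDXL-based methods.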
Related papers
- Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction [65.67001243986981]
We propose MindHier, a coarse-to-fine fMRI-to-image reconstruction framework built on scale-wise autoregressive modeling. MindHier achieves superior semantic fidelity, 4.67x faster inference, and more deterministic results than the diffusion-based baselines.
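As a rough illustration of scale-wise autoregression (not MindHier's actual architecture, which is not detailed here), the sketch below upsamples a canvas scale by scale and predicts residual detail at each step, conditioned on an fMRI embedding; all module names and sizes are invented.

```python
# Loose sketch of coarse-to-fine, scale-wise autoregression: each step
# doubles the canvas resolution and predicts a residual at the new scale.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleStep(nn.Module):
    def __init__(self, ch: int = 64, cond_dim: int = 256):
        super().__init__()
        self.cond = nn.Linear(cond_dim, ch)    # inject fMRI conditioning
        self.refine = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, canvas, fmri_emb):
        up = F.interpolate(canvas, scale_factor=2, mode="nearest")
        up = up + self.cond(fmri_emb)[:, :, None, None]
        return up + self.refine(up)  # residual detail at this scale

steps = nn.ModuleList([ScaleStep() for _ in range(4)])
canvas, emb = torch.rand(1, 64, 4, 4), torch.rand(1, 256)
for step in steps:                   # 4x4 -> 64x64, coarse to fine
    canvas = step(canvas, emb)
print(canvas.shape)  # torch.Size([1, 64, 64, 64])
```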
arXiv Detail & Related papers (2025-10-25T15:40:07Z)
- Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI [39.952672554797125]
We show that fMRI signals are more similar to the text space of a language model than to either a vision-based space or a joint text-image space. We propose PRISM, a model that Projects fMRI sIgnals into a Structured text space as an interMediate representation for visual stimuli reconstruction.
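The similarity claim can be illustrated by fitting a linear map from fMRI to each candidate embedding space and comparing held-out alignment. The sketch below uses synthetic arrays in place of real fMRI betas and CLIP text/image embeddings, so it shows a plausible protocol, not the paper's result.

```python
# Fit a ridge map from fMRI to each embedding space; compare held-out
# cosine alignment. All data here is synthetic and illustrative.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, n_voxels, dim = 800, 2000, 512
X = rng.normal(size=(n, n_voxels))       # fMRI responses (synthetic)
spaces = {
    "text":  rng.normal(size=(n, dim)),  # e.g. CLIP text embeddings
    "image": rng.normal(size=(n, dim)),  # e.g. CLIP image embeddings
}

def holdout_cosine(X, Y):
    Xtr, Xte, Ytr, Yte = train_test_split(X, Y, test_size=0.2, random_state=0)
    pred = Ridge(alpha=1e4).fit(Xtr, Ytr).predict(Xte)
    num = (pred * Yte).sum(-1)
    den = np.linalg.norm(pred, axis=-1) * np.linalg.norm(Yte, axis=-1)
    return float((num / den).mean())

for name, Y in spaces.items():
    print(f"{name}: mean held-out cosine = {holdout_cosine(X, Y):.3f}")
```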
arXiv Detail & Related papers (2025-10-17T20:18:06Z)
- SynBrain: Enhancing Visual-to-fMRI Synthesis via Probabilistic Representation Learning [54.390403684665834]
Deciphering how visual stimuli are transformed into cortical responses is a fundamental challenge in computational neuroscience. We propose SynBrain, a generative framework that simulates the transformation from visual semantics to neural responses in a probabilistic and biologically interpretable manner. Experimental results demonstrate that SynBrain surpasses state-of-the-art methods in subject-specific visual-to-fMRI encoding performance.
arXiv Detail & Related papers (2025-08-14T03:01:05Z)
- Perception Activator: An intuitive and portable framework for brain cognitive exploration [19.851643249367108]
We develop an experimental framework that uses fMRI representations as intervention conditions. We compare both downstream performance and intermediate feature changes on object detection and instance segmentation tasks with and without fMRI information. Our results prove that fMRI contains rich multi-object semantic cues and coarse spatial localization information -- elements that current models have yet to fully exploit or integrate.
arXiv Detail & Related papers (2025-07-03T04:46:48Z)
- Controllable Mind Visual Diffusion Model [58.83896307930354]
Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models.
We propose a novel approach, referred to as the Controllable Mind Visual Diffusion Model (CMVDM).
CMVDM extracts semantic and silhouette information from fMRI data using attribute alignment and assistant networks.
We then leverage a control model to fully exploit the extracted information for image synthesis, resulting in generated images that closely resemble the visual stimuli in terms of semantics and silhouette.
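A toy rendering of that idea appears below: a ControlNet-style branch that encodes a decoded silhouette mask and a semantic vector into an additive residual for a frozen diffusion UNet. Shapes and the module name are illustrative, not taken from the CMVDM code.

```python
# Toy control branch: fuse a 1-channel silhouette map and a semantic vector
# into a residual added to a matching UNet feature map. Illustrative only.
import torch
import torch.nn as nn

class SilhouetteControl(nn.Module):
    def __init__(self, sem_dim: int = 768, feat_ch: int = 320):
        super().__init__()
        self.mask_enc = nn.Sequential(           # encode the silhouette map
            nn.Conv2d(1, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, feat_ch, 3, padding=1),
        )
        self.sem_proj = nn.Linear(sem_dim, feat_ch)  # semantic vector -> bias

    def forward(self, silhouette: torch.Tensor, semantics: torch.Tensor):
        res = self.mask_enc(silhouette)                       # B x C x H x W
        res = res + self.sem_proj(semantics)[:, :, None, None]
        return res  # added to a UNet feature map of matching shape

ctrl = SilhouetteControl()
residual = ctrl(torch.rand(2, 1, 64, 64), torch.rand(2, 768))
print(residual.shape)  # torch.Size([2, 320, 64, 64])
```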
arXiv Detail & Related papers (2023-05-17T11:36:40Z)
- Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
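One common way to realize such latent alignment (a plausible reading, not the paper's exact objective) is a symmetric contrastive loss that pulls paired fMRI and image embeddings together in a shared space. The sketch below uses invented dimensions and a standard InfoNCE-style loss.

```python
# Symmetric alignment of fMRI and image features in a shared latent space:
# the fMRI->image direction supports decoding, image->fMRI supports encoding.
import torch
import torch.nn as nn
import torch.nn.functional as F

fmri_enc  = nn.Linear(2000, 256)   # voxels -> shared latent (toy sizes)
image_enc = nn.Linear(512, 256)    # image features -> shared latent

def alignment_loss(voxels, img_feats, temperature: float = 0.07):
    zf = F.normalize(fmri_enc(voxels), dim=-1)
    zi = F.normalize(image_enc(img_feats), dim=-1)
    logits = zf @ zi.T / temperature
    labels = torch.arange(len(zf))
    # symmetric InfoNCE over both directions
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))

loss = alignment_loss(torch.randn(8, 2000), torch.randn(8, 512))
loss.backward()
print(float(loss))
```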
arXiv Detail & Related papers (2023-03-26T14:14:58Z)
- GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images [79.39247661907397]
We introduce an effective framework, Generalizable Model-based Neural Radiance Fields (GM-NeRF), to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
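A rough reading of that mechanism is cross-attention in which query points on the geometry proxy attend to multi-view image features; the sketch below is schematic, with invented dimensions and names.

```python
# Schematic geometry-guided attention: appearance features from multi-view
# images are registered onto query points sampled on a geometry proxy.
import torch
import torch.nn as nn

class GeometryGuidedAttention(nn.Module):
    def __init__(self, dim: int = 128, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, proxy_feats, view_feats):
        # proxy_feats: B x P x D      (queries on the geometry proxy)
        # view_feats:  B x (V*N) x D  (keys/values from multi-view images)
        out, _ = self.attn(proxy_feats, view_feats, view_feats)
        return out  # appearance code registered onto the proxy

layer = GeometryGuidedAttention()
reg = layer(torch.rand(2, 1024, 128), torch.rand(2, 4 * 1024, 128))
print(reg.shape)  # torch.Size([2, 1024, 128])
```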
arXiv Detail & Related papers (2023-03-24T03:32:02Z)
- Semantic Brain Decoding: from fMRI to conceptually similar image reconstruction of visual stimuli [0.29005223064604074]
We propose a novel approach to brain decoding that also relies on semantic and contextual similarity.
We employ an fMRI dataset of natural image vision and create a deep learning decoding pipeline inspired by the existence of both bottom-up and top-down processes in human vision.
We produce reconstructions of visual stimuli that match the original content very well on a semantic level, surpassing the previous state of the art.
arXiv Detail & Related papers (2022-12-13T16:54:08Z)
- Mind Reader: Reconstructing complex images from brain activities [16.78619734818198]
We focus on reconstructing complex image stimuli from fMRI (functional magnetic resonance imaging) signals.
Unlike previous works that reconstruct images with single objects or simple shapes, our work aims to reconstruct image stimuli rich in semantics.
We find that incorporating an additional text modality is beneficial for the reconstruction problem compared to directly translating brain signals to images.
arXiv Detail & Related papers (2022-09-30T06:32:46Z)
- You Only Need Adversarial Supervision for Semantic Image Synthesis [84.83711654797342]
We propose a novel, simplified GAN model, which needs only adversarial supervision to achieve high quality results.
We show that images synthesized by our model are more diverse and follow the color and texture of real images more closely.
arXiv Detail & Related papers (2020-12-08T23:00:48Z)