Learning Brain Representation with Hierarchical Visual Embeddings
- URL: http://arxiv.org/abs/2602.07495v1
- Date: Sat, 07 Feb 2026 11:14:03 GMT
- Title: Learning Brain Representation with Hierarchical Visual Embeddings
- Authors: Jiawen Zheng, Haonan Jia, Ming Li, Yuhui Zheng, Yufeng Zeng, Yang Gao, Chen Liang,
- Abstract summary: We propose a brain-image alignment strategy that leverages pre-trained visual encoders with distinct inductive biases to capture hierarchical and multi-scale visual representations.<n>Our method achieves a favorable balance between retrieval accuracy and reconstruction fidelity.
- Score: 30.701493890961284
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decoding visual representations from brain signals has attracted significant attention in both neuroscience and artificial intelligence. However, the degree to which brain signals truly encode visual information remains unclear. Current visual decoding approaches explore various brain-image alignment strategies, yet most emphasize high-level semantic features while neglecting pixel-level details, thereby limiting our understanding of the human visual system. In this paper, we propose a brain-image alignment strategy that leverages multiple pre-trained visual encoders with distinct inductive biases to capture hierarchical and multi-scale visual representations, while employing a contrastive learning objective to achieve effective alignment between brain signals and visual embeddings. Furthermore, we introduce a Fusion Prior, which learns a stable mapping on large-scale visual data and subsequently matches brain features to this pre-trained prior, thereby enhancing distributional consistency across modalities. Extensive quantitative and qualitative experiments demonstrate that our method achieves a favorable balance between retrieval accuracy and reconstruction fidelity.
Related papers
- Toward Cognitive Supersensing in Multimodal Large Language Model [67.15559571626747]
We introduce Cognitive Supersensing, a training paradigm that endows MLLMs with human-like visual imagery capabilities.<n>In experiments, MLLMs trained with Cognitive Supersensing significantly outperform state-of-the-art baselines on CogSense-Bench.<n>We will open-source the CogSense-Bench and our model weights.
arXiv Detail & Related papers (2026-02-02T02:19:50Z) - Adaptive Decoding via Hierarchical Neural Information Gradients in Mouse Visual Tasks [7.199942082447265]
hierarchically deep neural networks (DNNs) have played a significant role as tools for mining the core features of complex data.<n>We propose a method for studying the adaptive topological decoding between brain regions, called the Adaptive Topological Vision Transformer, or AT-ViT.<n>In numerous experiments, the results reveal the importance of the proposed method in hierarchical networks in the visual tasks.
arXiv Detail & Related papers (2025-10-10T15:00:59Z) - Towards Interpretable Visual Decoding with Attention to Brain Representations [3.254716591226115]
Recent work has demonstrated that complex visual stimuli can be decoded from human brain activity using deep generative models.<n>We propose NeuroAdapter, a visual decoding framework that directly conditions a latent diffusion model on brain representations.<n>Our results highlight the potential of end-to-end brain-to-image decoding and establish a path toward interpreting diffusion models through the lens of visual neuroscience.
arXiv Detail & Related papers (2025-09-28T01:55:55Z) - MindAligner: Explicit Brain Functional Alignment for Cross-Subject Visual Decoding from Limited fMRI Data [64.92867794764247]
MindAligner is a framework for cross-subject brain decoding from limited fMRI data.<n>Brain Transfer Matrix (BTM) projects the brain signals of an arbitrary new subject to one of the known subjects.<n>Brain Functional Alignment module is proposed to perform soft cross-subject brain alignment under different visual stimuli.
arXiv Detail & Related papers (2025-02-07T16:01:59Z) - Human-Aligned Image Models Improve Visual Decoding from the Brain [16.184884942703466]
We introduce the use of human-aligned image encoders to map brain signals to images.<n>Our empirical results support this hypothesis, demonstrating that this simple modification improves image retrieval accuracy by up to 21%.
arXiv Detail & Related papers (2025-02-05T11:14:51Z) - Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models [10.615012396285337]
We develop algorithms to enhance our understanding of visual processes by incorporating whole-brain activation maps.
We first compare our method with state-of-the-art approaches to decoding visual processing and show improved predictive semantic accuracy by 43%.
arXiv Detail & Related papers (2024-11-11T16:51:17Z) - MLIP: Enhancing Medical Visual Representation with Divergence Encoder
and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z) - Controllable Mind Visual Diffusion Model [58.83896307930354]
Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models.
We propose a novel approach, referred to as Controllable Mind Visual Model Diffusion (CMVDM)
CMVDM extracts semantic and silhouette information from fMRI data using attribute alignment and assistant networks.
We then leverage a control model to fully exploit the extracted information for image synthesis, resulting in generated images that closely resemble the visual stimuli in terms of semantics and silhouette.
arXiv Detail & Related papers (2023-05-17T11:36:40Z) - Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z) - BI AVAN: Brain inspired Adversarial Visual Attention Network [67.05560966998559]
We propose a brain-inspired adversarial visual attention network (BI-AVAN) to characterize human visual attention directly from functional brain activity.
Our model imitates the biased competition process between attention-related/neglected objects to identify and locate the visual objects in a movie frame the human brain focuses on in an unsupervised manner.
arXiv Detail & Related papers (2022-10-27T22:20:36Z) - Decoding Visual Neural Representations by Multimodal Learning of
Brain-Visual-Linguistic Features [9.783560855840602]
This paper presents a generic neural decoding method called BraVL that uses multimodal learning of brain-visual-linguistic features.
We focus on modeling the relationships between brain, visual and linguistic features via multimodal deep generative models.
In particular, our BraVL model can be trained under various semi-supervised scenarios to incorporate the visual and textual features obtained from the extra categories.
arXiv Detail & Related papers (2022-10-13T05:49:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.