Achieving Fine-grained Cross-modal Understanding through Brain-inspired Hierarchical Representation Learning
- URL: http://arxiv.org/abs/2601.01339v1
- Date: Sun, 04 Jan 2026 02:54:29 GMT
- Title: Achieving Fine-grained Cross-modal Understanding through Brain-inspired Hierarchical Representation Learning
- Authors: Weihang You, Hanqi Jiang, Yi Pan, Junhao Chen, Tianming Liu, Fei Dou
- Abstract summary: We present NeuroAlign, a novel framework for fine-grained fMRI-video alignment inspired by the hierarchical organization of the human visual system. Our framework implements a two-stage mechanism that mirrors biological visual pathways: global semantic understanding and fine-grained pattern matching. Experiments demonstrate that NeuroAlign significantly outperforms existing methods in cross-modal retrieval tasks.
- Score: 20.602643947067406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding neural responses to visual stimuli remains challenging due to the inherent complexity of brain representations and the modality gap between neural data and visual inputs. Existing methods, mainly based on reducing neural decoding to generation tasks or simple correlations, fail to reflect the hierarchical and temporal processes of visual processing in the brain. To address these limitations, we present NeuroAlign, a novel framework for fine-grained fMRI-video alignment inspired by the hierarchical organization of the human visual system. Our framework implements a two-stage mechanism that mirrors biological visual pathways: global semantic understanding through Neural-Temporal Contrastive Learning (NTCL) and fine-grained pattern matching through enhanced vector quantization. NTCL explicitly models temporal dynamics through bidirectional prediction between modalities, while our DynaSyncMM-EMA approach enables dynamic multi-modal fusion with adaptive weighting. Experiments demonstrate that NeuroAlign significantly outperforms existing methods in cross-modal retrieval tasks, establishing a new paradigm for understanding visual cognitive mechanisms.
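The abstract names two standard ingredients: a symmetric contrastive objective over paired fMRI/video embeddings (NTCL's "bidirectional prediction between modalities") and EMA-updated vector quantization for fine-grained pattern matching. The paper's exact NTCL and DynaSyncMM-EMA formulations are not given here, so the NumPy sketch below shows only the generic building blocks; all function names, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def info_nce(queries, keys, temperature=0.07):
    """One-directional InfoNCE: row i of `queries` should retrieve row i of `keys`.

    queries, keys: (B, D) L2-normalized embeddings from the two modalities.
    """
    logits = queries @ keys.T / temperature              # (B, B) scaled similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                   # positives on the diagonal

def bidirectional_info_nce(fmri, video, temperature=0.07):
    """Symmetric loss: fMRI-to-video and video-to-fMRI retrieval, averaged."""
    return 0.5 * (info_nce(fmri, video, temperature) +
                  info_nce(video, fmri, temperature))

def ema_codebook_update(ema_sum, ema_count, assignments, features, decay=0.99):
    """VQ-VAE-style EMA codebook update (no gradient through the codebook).

    ema_sum:     (K, D) running sum of features assigned to each code
    ema_count:   (K,)   running count of assignments per code
    assignments: (N,)   index of the nearest code for each feature
    features:    (N, D) encoder outputs for this batch
    """
    K = ema_count.shape[0]
    one_hot = np.eye(K)[assignments]                     # (N, K) hard assignments
    ema_count = decay * ema_count + (1 - decay) * one_hot.sum(axis=0)
    ema_sum = decay * ema_sum + (1 - decay) * one_hot.T @ features
    codebook = ema_sum / np.maximum(ema_count[:, None], 1e-5)  # per-code running mean
    return codebook, ema_sum, ema_count
```

With paired batches, the positives sit on the diagonal of the similarity matrix, so the symmetric loss is minimized when each fMRI clip retrieves its own video and vice versa; the EMA update instead moves each code toward the running mean of the features quantized to it, which is the usual way codebooks are trained without backpropagating through the discrete assignment.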
Related papers
- Uncovering Brain-Like Hierarchical Patterns in Vision-Language Models through fMRI-Based Neural Encoding [34.313883741642066]
Current understanding of the parallels between artificial neural networks (ANNs) and human brain processing remains limited. We propose a novel neuron-level analysis framework that investigates the multimodal information processing mechanisms in vision-language models (VLMs) through the lens of human brain activity.
arXiv Detail & Related papers (2025-10-19T15:11:03Z)
- SynBrain: Enhancing Visual-to-fMRI Synthesis via Probabilistic Representation Learning [54.390403684665834]
Deciphering how visual stimuli are transformed into cortical responses is a fundamental challenge in computational neuroscience. We propose SynBrain, a generative framework that simulates the transformation from visual semantics to neural responses in a probabilistic and biologically interpretable manner. Experimental results demonstrate that SynBrain surpasses state-of-the-art methods in subject-specific visual-to-fMRI encoding performance.
arXiv Detail & Related papers (2025-08-14T03:01:05Z)
- BrainFLORA: Uncovering Brain Concept Representation via Multimodal Neural Embeddings [19.761793010311614]
We introduce BrainFLORA, a unified framework for integrating cross-modal neuroimaging data to construct a shared neural representation. Our approach leverages multimodal large language models (MLLMs) augmented with modality-specific adapters and task decoders, achieving state-of-the-art performance in joint-subject visual retrieval. BrainFLORA offers novel implications for cognitive neuroscience and brain-computer interfaces (BCIs).
arXiv Detail & Related papers (2025-07-13T18:56:17Z)
- Brain2Text Decoding Model Reveals the Neural Mechanisms of Visual Semantic Processing [0.17188280334580194]
We present a novel framework that directly decodes fMRI signals into textual descriptions of viewed natural images. Our novel deep learning model, trained without visual information, achieves state-of-the-art semantic decoding performance.
arXiv Detail & Related papers (2025-03-15T07:28:02Z)
- Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities. We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z)
- Neural-MCRL: Neural Multimodal Contrastive Representation Learning for EEG-based Visual Decoding [2.587640069216139]
Decoding neural visual representations from electroencephalogram (EEG)-based brain activity is crucial for advancing brain-machine interfaces (BMI). Existing methods often overlook semantic consistency and completeness within modalities and lack effective semantic alignment across modalities. We propose Neural-MCRL, a novel framework that achieves multimodal alignment through semantic bridging and cross-attention mechanisms.
arXiv Detail & Related papers (2024-12-23T07:02:44Z)
- Interpretable Spatio-Temporal Embedding for Brain Structural-Effective Network with Ordinary Differential Equation [56.34634121544929]
In this study, we first construct the brain-effective network via the dynamic causal model.
We then introduce an interpretable graph learning framework termed Spatio-Temporal Embedding ODE (STE-ODE).
This framework incorporates specifically designed directed node embedding layers, aiming at capturing the dynamic interplay between structural and effective networks.
arXiv Detail & Related papers (2024-05-21T20:37:07Z)
- Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture that is compatible with, and scalable within, deep learning frameworks.
We show that end-to-end gradient-descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
arXiv Detail & Related papers (2024-04-22T09:40:07Z)
- NeuroCine: Decoding Vivid Video Sequences from Human Brain Activties [23.893490180665996]
We introduce NeuroCine, a novel dual-phase framework targeting the inherent challenges of decoding fMRI data.
Tested on a publicly available fMRI dataset, our method shows promising results.
Our attention analysis suggests that the model aligns with existing brain structures and functions, indicating its biological plausibility and interpretability.
arXiv Detail & Related papers (2024-02-02T17:34:25Z)
- Contrastive-Signal-Dependent Plasticity: Self-Supervised Learning in Spiking Neural Circuits [61.94533459151743]
This work addresses the challenge of designing neurobiologically-motivated schemes for adjusting the synapses of spiking networks.
Our experimental simulations demonstrate a consistent advantage over other biologically-plausible approaches when training recurrent spiking networks.
arXiv Detail & Related papers (2023-03-30T02:40:28Z)
- Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.