NeuroCLIP: Brain-Inspired Prompt Tuning for EEG-to-Image Multimodal Contrastive Learning
- URL: http://arxiv.org/abs/2511.09250v1
- Date: Thu, 13 Nov 2025 01:42:42 GMT
- Title: NeuroCLIP: Brain-Inspired Prompt Tuning for EEG-to-Image Multimodal Contrastive Learning
- Authors: Jiyuan Wang, Li Zhang, Haipeng Lin, Qile Liu, Gan Huang, Ziyu Li, Zhen Liang, Xia Wu
- Abstract summary: We present NeuroCLIP, a prompt tuning framework tailored for EEG-to-image contrastive learning. We are the first to introduce visual prompt tokens into EEG-image alignment, acting as global, modality-level prompts. On the THINGS-EEG2 dataset, NeuroCLIP achieves a Top-1 accuracy of 63.2% in zero-shot image retrieval.
- Score: 13.254096454986318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in brain-inspired artificial intelligence have sought to align neural signals with visual semantics using multimodal models such as CLIP. However, existing methods often treat CLIP as a static feature extractor, overlooking its adaptability to neural representations and the inherent physiological-symbolic gap in EEG-image alignment. To address these challenges, we present NeuroCLIP, a prompt tuning framework tailored for EEG-to-image contrastive learning. Our approach introduces three core innovations: (1) We design a dual-stream visual embedding pipeline that combines dynamic filtering and token-level fusion to generate instance-level adaptive prompts, which guide the adjustment of patch embedding tokens based on image content, thereby enabling fine-grained modulation of visual representations under neural constraints; (2) We are the first to introduce visual prompt tokens into EEG-image alignment, acting as global, modality-level prompts that work in conjunction with instance-level adjustments. These visual prompt tokens are inserted into the Transformer architecture to facilitate neural-aware adaptation and parameter optimization at a global level; (3) Inspired by neuroscientific principles of human visual encoding, we propose a refined contrastive loss that better models the semantic ambiguity and cross-modal noise present in EEG signals. On the THINGS-EEG2 dataset, NeuroCLIP achieves a Top-1 accuracy of 63.2% in zero-shot image retrieval, surpassing the previous best method by +12.3%, and demonstrates strong generalization under inter-subject conditions (+4.6% Top-1), highlighting the potential of physiology-aware prompt tuning for bridging brain signals and visual semantics.
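The abstract is detailed enough to sketch two of its ingredients. Below is a minimal PyTorch illustration, assuming a frozen CLIP ViT backbone: modality-level visual prompt tokens prepended to the patch embeddings (idea 2), and a symmetric InfoNCE loss with softened targets as one plausible reading of the "refined contrastive loss" (idea 3). All names, dimensions, and the smoothing scheme are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptedPatchEmbedding(nn.Module):
    """Prepends learnable, modality-level visual prompt tokens to the
    patch embeddings of a frozen vision backbone (a VPT-style sketch).
    NeuroCLIP's instance-level dynamic filtering and token fusion are
    not reproduced here; names and sizes are illustrative."""
    def __init__(self, embed_dim: int = 768, num_prompts: int = 8):
        super().__init__()
        self.prompts = nn.Parameter(0.02 * torch.randn(1, num_prompts, embed_dim))

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, num_patches, embed_dim), e.g. from CLIP's ViT.
        prompts = self.prompts.expand(patch_tokens.size(0), -1, -1)
        # Prompt tokens then attend with patch tokens inside the frozen Transformer.
        return torch.cat([prompts, patch_tokens], dim=1)

def smoothed_infonce(eeg_emb, img_emb, temperature=0.07, smoothing=0.1):
    """Symmetric InfoNCE with label smoothing: softening the one-hot
    targets is one common way to model noisy, semantically ambiguous
    cross-modal pairs (an assumption, not necessarily the paper's exact
    loss). Requires batch size > 1."""
    eeg = F.normalize(eeg_emb, dim=-1)
    img = F.normalize(img_emb, dim=-1)
    logits = eeg @ img.t() / temperature          # (B, B) similarity matrix
    n = logits.size(0)
    targets = torch.full_like(logits, smoothing / (n - 1))
    targets.fill_diagonal_(1.0 - smoothing)       # soft positives on the diagonal
    loss_e2i = -(targets * logits.log_softmax(dim=-1)).sum(-1).mean()
    loss_i2e = -(targets * logits.t().log_softmax(dim=-1)).sum(-1).mean()
    return 0.5 * (loss_e2i + loss_i2e)
```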
Related papers
- Uni-Neur2Img: Unified Neural Signal-Guided Image Generation, Editing, and Stylization via Diffusion Transformers [6.311754400490674]
We present Uni-Neur2Img, a unified framework for neural signal-driven image generation and editing. Uni-Neur2Img offers a unified and efficient solution for bridging neural signals and visual content generation.
arXiv Detail & Related papers (2025-12-21T08:12:38Z)
- SYNAPSE: Synergizing an Adapter and Finetuning for High-Fidelity EEG Synthesis from a CLIP-Aligned Encoder [0.0]
SYNAPSE is a two-stage framework that bridges EEG signal representation learning and high-fidelity image synthesis. Our method achieves a semantically coherent latent space and state-of-the-art perceptual fidelity on the CVPR40 dataset.
arXiv Detail & Related papers (2025-11-11T02:53:49Z)
- Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction [65.67001243986981]
We propose MindHier, a coarse-to-fine fMRI-to-image reconstruction framework built on scale-wise autoregressive modeling. MindHier achieves superior semantic fidelity, 4.67x faster inference, and more deterministic results than the diffusion-based baselines.
arXiv Detail & Related papers (2025-10-25T15:40:07Z)
- NeuroRVQ: Multi-Scale EEG Tokenization for Generative Large Brainwave Models [66.91449452840318]
We introduce NeuroRVQ, a scalable Large Brainwave Model (LBM) centered on a codebook-based tokenizer. Our tokenizer integrates: (i) multi-scale feature extraction modules that capture the full frequency neural spectrum; (ii) hierarchical residual vector quantization (RVQ) codebooks for high-resolution encoding; and (iii) an EEG signal phase- and amplitude-aware loss function for efficient training. Our empirical results demonstrate that NeuroRVQ achieves lower reconstruction error and outperforms existing LBMs on a variety of downstream tasks.
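For readers unfamiliar with residual vector quantization, here is a minimal, self-contained sketch of the encoding step the summary refers to: each stage quantizes the residual left by the previous stage, yielding a coarse-to-fine code. Codebook shapes are illustrative; this is not NeuroRVQ's implementation.

```python
import torch

def rvq_encode(x, codebooks):
    """Residual vector quantization.

    x: (batch, dim); codebooks: list of (codebook_size, dim) tensors.
    Returns per-stage code indices and the summed reconstruction.
    """
    residual = x
    indices, quantized = [], torch.zeros_like(x)
    for cb in codebooks:
        d = torch.cdist(residual, cb)   # pairwise distances (batch, codebook_size)
        idx = d.argmin(dim=-1)          # nearest codeword per sample
        q = cb[idx]                     # (batch, dim)
        quantized = quantized + q       # accumulate coarse-to-fine reconstruction
        residual = residual - q         # next stage quantizes what is left
        indices.append(idx)
    return indices, quantized

# Usage: three stages of 256 codewords over 64-dim features.
codebooks = [torch.randn(256, 64) for _ in range(3)]
codes, x_hat = rvq_encode(torch.randn(8, 64), codebooks)
```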
arXiv Detail & Related papers (2025-10-15T01:26:52Z)
- Spatial-Functional awareness Transformer-based graph archetype contrastive learning for Decoding Visual Neural Representations from EEG [3.661246946935037]
We propose a Spatial-Functional Awareness Transformer-based Graph Archetype Contrastive Learning (SFTG) framework to enhance EEG-based visual decoding. Specifically, we introduce the EEG Graph Transformer (EGT), a novel graph-based neural architecture that simultaneously encodes spatial brain connectivity and temporal neural dynamics. To mitigate high intra-subject variability, we propose Graph Archetype Contrastive Learning (GAC), which learns subject-specific EEG graph archetypes to improve feature consistency and class separability.
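The archetype idea can be illustrated with a generic prototype-contrastive sketch: a learnable embedding per class toward which EEG features are pulled, away from the rest. This is an assumption-laden simplification; GAC's subject-specific graph archetypes are richer than this class-level toy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArchetypeContrast(nn.Module):
    """Prototype-style contrastive head: each class keeps a learnable
    archetype embedding, and features are classified by cosine similarity
    to the archetypes. Sizes are illustrative, not from the paper."""
    def __init__(self, num_classes=40, dim=128, temperature=0.1):
        super().__init__()
        self.archetypes = nn.Parameter(torch.randn(num_classes, dim))
        self.temperature = temperature

    def forward(self, feats, labels):
        # feats: (B, dim) EEG embeddings; labels: (B,) class indices.
        feats = F.normalize(feats, dim=-1)
        protos = F.normalize(self.archetypes, dim=-1)
        logits = feats @ protos.t() / self.temperature  # (B, num_classes)
        # Cross-entropy pulls each feature to its class archetype and
        # pushes it from the others.
        return F.cross_entropy(logits, labels)
```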
arXiv Detail & Related papers (2025-09-29T13:27:55Z)
- SynBrain: Enhancing Visual-to-fMRI Synthesis via Probabilistic Representation Learning [54.390403684665834]
Deciphering how visual stimuli are transformed into cortical responses is a fundamental challenge in computational neuroscience. We propose SynBrain, a generative framework that simulates the transformation from visual semantics to neural responses in a probabilistic and biologically interpretable manner. Experimental results demonstrate that SynBrain surpasses state-of-the-art methods in subject-specific visual-to-fMRI encoding performance.
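One standard way to make a visual-to-fMRI encoder probabilistic, consistent with the summary's description, is to predict a Gaussian over voxel responses and sample with the reparameterization trick. The sketch below is generic; it is not SynBrain's actual architecture, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class ProbabilisticEncoder(nn.Module):
    """Maps image features to a distribution over fMRI voxel responses
    (a generic Gaussian head with reparameterized sampling)."""
    def __init__(self, feat_dim=512, n_voxels=4096):
        super().__init__()
        self.mu = nn.Linear(feat_dim, n_voxels)
        self.log_var = nn.Linear(feat_dim, n_voxels)

    def forward(self, img_feat):
        mu, log_var = self.mu(img_feat), self.log_var(img_feat)
        eps = torch.randn_like(mu)
        sample = mu + eps * (0.5 * log_var).exp()  # reparameterization trick
        return sample, mu, log_var

def gaussian_nll(target, mu, log_var):
    # Negative log-likelihood of voxel responses under the predicted
    # Gaussian, up to an additive constant.
    return 0.5 * (log_var + (target - mu) ** 2 / log_var.exp()).mean()
```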
arXiv Detail & Related papers (2025-08-14T03:01:05Z)
- ViEEG: Hierarchical Visual Neural Representation for EEG Brain Decoding [18.51835182602402]
ViEEG decomposes each visual stimulus into three biologically aligned components, namely contour, foreground object, and contextual scene, which serve as anchors for a three-stream EEG encoder. We further adopt hierarchical contrastive learning for EEG-CLIP representation alignment, enabling zero-shot object recognition. Our framework sets a new paradigm for EEG brain decoding.
arXiv Detail & Related papers (2025-05-18T13:19:08Z)
- Neural-MCRL: Neural Multimodal Contrastive Representation Learning for EEG-based Visual Decoding [2.587640069216139]
Decoding neural visual representations from electroencephalogram (EEG)-based brain activity is crucial for advancing brain-machine interfaces (BMI). Existing methods often overlook semantic consistency and completeness within modalities and lack effective semantic alignment across modalities. We propose Neural-MCRL, a novel framework that achieves multimodal alignment through semantic bridging and cross-attention mechanisms.
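The cross-attention alignment the summary mentions is commonly implemented by letting EEG tokens attend over visual (or text) tokens. A minimal PyTorch sketch follows; shapes and sizes are illustrative and this is not Neural-MCRL's code.

```python
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """EEG tokens query visual tokens via multi-head cross-attention,
    followed by a residual connection and layer norm."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, eeg_tokens, visual_tokens):
        # eeg_tokens: (B, T_eeg, dim); visual_tokens: (B, T_vis, dim)
        attended, _ = self.attn(query=eeg_tokens,
                                key=visual_tokens,
                                value=visual_tokens)
        return self.norm(eeg_tokens + attended)  # residual + norm
```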
arXiv Detail & Related papers (2024-12-23T07:02:44Z)
- MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding [50.55024115943266]
We introduce MindFormer, a novel method for semantic alignment of multi-subject fMRI signals.
This model is specifically designed to generate fMRI-conditioned feature vectors that can be used for conditioning a Stable Diffusion model for fMRI-to-image generation or a large language model (LLM) for fMRI-to-text generation.
Our experimental results demonstrate that MindFormer generates semantically consistent images and text across different subjects.
arXiv Detail & Related papers (2024-05-28T00:36:25Z)
- Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z)
- BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding [51.911473457195555]
BrainCLIP is a task-agnostic fMRI-based brain decoding model.
It bridges the modality gap between brain activity, image, and text.
BrainCLIP can reconstruct visual stimuli with high semantic fidelity.
arXiv Detail & Related papers (2023-02-25T03:28:54Z)
- IMAGINE: Image Synthesis by Image-Guided Model Inversion [79.4691654458141]
We introduce an inversion-based method, denoted IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images.
We leverage the knowledge of image semantics from a pre-trained classifier to achieve plausible generations.
IMAGINE enables the synthesis procedure to simultaneously 1) enforce semantic specificity constraints during the synthesis, 2) produce realistic images without generator training, and 3) give users intuitive control over the generation process.
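Classifier-guided model inversion of this kind can be sketched in a few lines: optimize pixels so a frozen, pre-trained classifier assigns the target class, with a smoothness prior standing in for realism. This toy version omits IMAGINE's image-guided semantic constraints, and the torchvision model choice is arbitrary.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

def invert_class(target_class: int, steps: int = 200, lr: float = 0.05):
    # Frozen, pre-trained classifier supplies the semantic signal;
    # no generator is trained at any point.
    model = resnet18(weights="IMAGENET1K_V1").eval()
    for p in model.parameters():
        p.requires_grad_(False)

    x = torch.randn(1, 3, 224, 224, requires_grad=True)  # start from noise
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss_cls = F.cross_entropy(model(x), torch.tensor([target_class]))
        # Total-variation prior encourages smooth, plausible images.
        tv = ((x[..., 1:, :] - x[..., :-1, :]).abs().mean()
              + (x[..., :, 1:] - x[..., :, :-1]).abs().mean())
        (loss_cls + 1e-2 * tv).backward()
        opt.step()
    return x.detach()
```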
arXiv Detail & Related papers (2021-04-13T02:00:24Z)