SYNAPSE: Synergizing an Adapter and Finetuning for High-Fidelity EEG Synthesis from a CLIP-Aligned Encoder
- URL: http://arxiv.org/abs/2511.17547v1
- Date: Tue, 11 Nov 2025 02:53:49 GMT
- Title: SYNAPSE: Synergizing an Adapter and Finetuning for High-Fidelity EEG Synthesis from a CLIP-Aligned Encoder
- Authors: Jeyoung Lee, Hochul Kang
- Abstract summary: SYNAPSE is a two-stage framework that bridges EEG signal representation learning and high-fidelity image synthesis. Our method achieves a semantically coherent latent space and state-of-the-art perceptual fidelity on the CVPR40 dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent progress in diffusion-based generative models has enabled high-quality image synthesis conditioned on diverse modalities. Extending such models to brain signals could deepen our understanding of human perception and mental representations. However, electroencephalography (EEG) presents major challenges for image generation due to high noise, low spatial resolution, and strong inter-subject variability. Existing approaches, such as DreamDiffusion, BrainVis, and GWIT, primarily adapt EEG features to pre-trained Stable Diffusion models using complex alignment or classification pipelines, often resulting in large parameter counts and limited interpretability. We introduce SYNAPSE, a two-stage framework that bridges EEG signal representation learning and high-fidelity image synthesis. In Stage 1, a CLIP-aligned EEG autoencoder learns a semantically structured latent representation by combining signal reconstruction and cross-modal alignment objectives. In Stage 2, the pretrained encoder is frozen and integrated with a lightweight adaptation of Stable Diffusion, enabling efficient conditioning on EEG features with minimal trainable parameters. Our method achieves a semantically coherent latent space and state-of-the-art perceptual fidelity on the CVPR40 dataset, outperforming prior EEG-to-image models in both reconstruction efficiency and image quality. Quantitative and qualitative analyses demonstrate that SYNAPSE generalizes effectively across subjects, preserving visual semantics even when class-level agreement is reduced. These results suggest that reconstructing what the brain perceives, rather than what it classifies, is key to faithful EEG-based image generation.
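The abstract specifies the two training stages but not their exact losses or module shapes. The PyTorch sketch below illustrates one plausible reading: Stage 1 combines signal reconstruction with a cosine-based CLIP alignment term, and Stage 2 freezes the encoder and trains only a small adapter that maps the EEG latent to conditioning tokens. All module names (`EEGEncoder`, `EEGDecoder`, `Adapter`), dimensions, and the 77-token adapter output are assumptions, not the authors' implementation.

```python
# Minimal sketch of the two-stage SYNAPSE recipe described in the abstract.
# Module names and shapes are assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EEGEncoder(nn.Module):          # hypothetical: EEG -> latent
    def __init__(self, ch=128, t=512, dim=768):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(ch * t, dim))
    def forward(self, x):
        return self.net(x)

class EEGDecoder(nn.Module):          # hypothetical: latent -> EEG
    def __init__(self, ch=128, t=512, dim=768):
        super().__init__()
        self.ch, self.t = ch, t
        self.net = nn.Linear(dim, ch * t)
    def forward(self, z):
        return self.net(z).view(-1, self.ch, self.t)

def stage1_loss(enc, dec, eeg, clip_img_emb, alpha=1.0):
    """Signal reconstruction + cross-modal alignment (cosine form assumed)."""
    z = enc(eeg)
    recon = F.mse_loss(dec(z), eeg)
    align = 1 - F.cosine_similarity(z, clip_img_emb, dim=-1).mean()
    return recon + alpha * align

# Stage 2: freeze the encoder; train only a lightweight adapter that maps
# the EEG latent to conditioning tokens for a pre-trained diffusion model.
enc, dec = EEGEncoder(), EEGDecoder()
eeg = torch.randn(4, 128, 512)
clip_emb = torch.randn(4, 768)
loss1 = stage1_loss(enc, dec, eeg, clip_emb)

for p in enc.parameters():
    p.requires_grad = False
adapter = nn.Linear(768, 77 * 768)    # latent -> 77 pseudo-token sequence (shape assumed)
cond_tokens = adapter(enc(eeg)).view(4, 77, 768)  # consumed by the diffusion cross-attention
```

Freezing the encoder in Stage 2 is what keeps the trainable parameter count down to the adapter alone, which is the efficiency claim the abstract makes.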
Related papers
- Autoregressive Visual Decoding from EEG Signals [14.213172378363216]
We present AVDE, a lightweight and efficient framework for visual decoding from EEG signals. We adopt an autoregressive generative framework based on a "next-scale prediction" strategy. Experiments on two datasets show that AVDE outperforms previous state-of-the-art methods in both image retrieval and reconstruction tasks.
arXiv Detail & Related papers (2026-02-26T02:49:04Z)
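The summary names a "next-scale prediction" strategy but gives no architectural detail. Below is a minimal conceptual sketch, assuming a VAR-style coarse-to-fine loop in which each step predicts residual detail at a higher resolution; the scales, residual formulation, and conditioning interface are all hypothetical.

```python
# Conceptual sketch of "next-scale prediction": the model autoregresses over
# resolutions rather than pixels, predicting each finer feature map from the
# upsampled coarser ones. Scales and the predictor are illustrative only.
import torch
import torch.nn.functional as F

def generate_next_scale(predictor, cond, scales=(1, 2, 4, 8), dim=16):
    """cond: (B, dim) conditioning vector, e.g. an EEG embedding."""
    b = cond.shape[0]
    canvas = torch.zeros(b, dim, scales[0], scales[0])
    for s in scales:
        up = F.interpolate(canvas, size=(s, s), mode="nearest")
        canvas = up + predictor(up, cond)   # predict residual detail at scale s
    return canvas

# dummy predictor: a real model would be a transformer over token maps
predictor = lambda x, c: 0.1 * torch.randn_like(x)
out = generate_next_scale(predictor, torch.randn(2, 16))
print(out.shape)  # torch.Size([2, 16, 8, 8])
```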
- Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders [74.72147962028265]
Representation Autoencoders (RAEs) have shown distinct advantages in diffusion modeling on ImageNet. We investigate whether this framework can scale to large-scale, freeform text-to-image (T2I) generation.
arXiv Detail & Related papers (2026-01-22T18:58:16Z)
- NeuroCLIP: Brain-Inspired Prompt Tuning for EEG-to-Image Multimodal Contrastive Learning [13.254096454986318]
We present NeuroCLIP, a prompt tuning framework tailored for EEG-to-image contrastive learning. We are the first to introduce visual prompt tokens into EEG-image alignment, acting as global, modality-level prompts. On the THINGS-EEG2 dataset, NeuroCLIP achieves a Top-1 accuracy of 63.2% in zero-shot image retrieval.
arXiv Detail & Related papers (2025-11-12T12:13:24Z)
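A minimal sketch of what "visual prompt tokens acting as global, modality-level prompts" could look like: learnable tokens prepended to a frozen encoder's token sequence, with only the prompts receiving gradients. The stand-in transformer, token counts, and dimensions are assumptions.

```python
# Sketch of prepending learnable visual prompt tokens to the patch-token
# sequence of a frozen image encoder. Not NeuroCLIP's actual configuration.
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    def __init__(self, n_prompts=8, dim=768, n_patches=196):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        self.backbone = nn.TransformerEncoder(          # stand-in for a frozen CLIP ViT
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=2)
        for p in self.backbone.parameters():            # only the prompts train
            p.requires_grad = False

    def forward(self, patch_tokens):                    # (B, n_patches, dim)
        b = patch_tokens.shape[0]
        prompts = self.prompts.unsqueeze(0).expand(b, -1, -1)
        return self.backbone(torch.cat([prompts, patch_tokens], dim=1))

enc = PromptedEncoder()
out = enc(torch.randn(2, 196, 768))
print(out.shape)  # torch.Size([2, 204, 768])
```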
- EEG-Driven Image Reconstruction with Saliency-Guided Diffusion Models [4.815274507478168]
Existing EEG-driven image reconstruction methods often overlook spatial attention mechanisms, limiting fidelity and semantic coherence. We propose a dual-conditioning framework that combines EEG embeddings with spatial saliency maps to enhance image generation.
arXiv Detail & Related papers (2025-10-30T11:34:37Z)
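One plausible form of the dual conditioning described above: the saliency map enters as an extra spatial channel while the EEG embedding modulates features globally. The fusion scheme here is an assumption for illustration, not the paper's design.

```python
# Sketch of dual conditioning: a saliency map is concatenated to the noisy
# latent channels while the EEG embedding enters through a separate pathway.
import torch
import torch.nn as nn

class DualCondBlock(nn.Module):
    def __init__(self, latent_ch=4, eeg_dim=768, hidden=64):
        super().__init__()
        self.conv = nn.Conv2d(latent_ch + 1, hidden, 3, padding=1)  # +1 saliency channel
        self.eeg_proj = nn.Linear(eeg_dim, hidden)                  # EEG -> feature bias

    def forward(self, noisy_latent, saliency, eeg_emb):
        x = torch.cat([noisy_latent, saliency], dim=1)              # spatial condition
        h = self.conv(x)
        return h + self.eeg_proj(eeg_emb)[:, :, None, None]         # semantic condition

block = DualCondBlock()
h = block(torch.randn(2, 4, 32, 32), torch.rand(2, 1, 32, 32), torch.randn(2, 768))
print(h.shape)  # torch.Size([2, 64, 32, 32])
```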
- EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model [56.53617289548353]
EchoGen is a pioneering framework that empowers Visual Auto-Regressive (VAR) models with subject-driven generation capabilities. We employ a semantic encoder to extract the subject's abstract identity, which is injected through decoupled cross-attention to guide the overall composition. To the best of our knowledge, EchoGen is the first feed-forward subject-driven framework built upon VAR models.
arXiv Detail & Related papers (2025-09-30T11:45:48Z)
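Decoupled cross-attention is commonly implemented as separate key/value projections per condition with the attention outputs summed; the sketch below follows that pattern. The dimensions and weighting are illustrative assumptions, not EchoGen's code.

```python
# Sketch of decoupled cross-attention: separate key/value projections for the
# text condition and the subject-identity embedding, with outputs summed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledCrossAttention(nn.Module):
    def __init__(self, dim=512, cond_dim=512):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.kv_text = nn.Linear(cond_dim, dim * 2)     # text branch
        self.kv_subj = nn.Linear(cond_dim, dim * 2)     # subject-identity branch

    def attend(self, q, kv):
        k, v = kv.chunk(2, dim=-1)
        w = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return w @ v

    def forward(self, x, text_emb, subj_emb, scale=1.0):
        q = self.q(x)
        out = self.attend(q, self.kv_text(text_emb))
        return out + scale * self.attend(q, self.kv_subj(subj_emb))

attn = DecoupledCrossAttention()
y = attn(torch.randn(2, 64, 512), torch.randn(2, 77, 512), torch.randn(2, 4, 512))
print(y.shape)  # torch.Size([2, 64, 512])
```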
- Image-to-Brain Signal Generation for Visual Prosthesis with CLIP Guided Multimodal Diffusion Models [6.761875482596085]
We present the first image-to-brain signal framework that generates M/EEG from images. The proposed framework comprises two key components: a pretrained CLIP visual encoder and a cross-attention enhanced U-Net diffusion model. Unlike conventional generative models that rely on simple concatenation for conditioning, our cross-attention modules capture the complex interplay between visual features and brain signal representations.
arXiv Detail & Related papers (2025-08-31T10:29:58Z)
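The summary contrasts concatenation with cross-attention conditioning; the toy sketch below shows both on dummy tensors so the difference is concrete. All shapes are assumptions.

```python
# Sketch contrasting the two conditioning styles mentioned above:
# naive concatenation vs. cross-attention onto CLIP visual tokens.
import torch
import torch.nn as nn

dim, b = 256, 2
sig = torch.randn(b, 100, dim)        # brain-signal feature sequence
vis = torch.randn(b, 50, dim)         # CLIP visual tokens

# (a) concatenation: a pooled condition is appended and mixed by a linear layer
concat = nn.Linear(2 * dim, dim)
pooled = vis.mean(dim=1, keepdim=True).expand(-1, 100, -1)
out_a = concat(torch.cat([sig, pooled], dim=-1))

# (b) cross-attention: each signal position queries the visual tokens directly
xattn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
out_b, _ = xattn(query=sig, key=vis, value=vis)
print(out_a.shape, out_b.shape)  # both torch.Size([2, 100, 256])
```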
- Category-aware EEG image generation based on wavelet transform and contrast semantic loss [4.165508411354963]
We propose a transformer-based EEG signal encoder integrating the Discrete Wavelet Transform (DWT) and a gating mechanism. Guided by the feature alignment and category-aware fusion losses, this encoder is used to extract features related to visual stimuli from EEG signals. With the aid of a pre-trained diffusion model, these features are reconstructed into visual stimuli.
arXiv Detail & Related papers (2025-05-30T07:24:58Z)
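A minimal sketch of the two named ingredients: a single-level Haar DWT over the time axis and a learned gate mixing the approximation and detail bands. The paper's encoder is transformer-based and presumably uses deeper decompositions; this only illustrates the mechanism.

```python
# Single-level Haar DWT on an EEG sequence followed by a gating mechanism
# that mixes approximation and detail bands. Shapes are assumptions.
import torch
import torch.nn as nn

def haar_dwt(x):
    """x: (B, C, T) with even T -> approximation and detail, each (B, C, T/2)."""
    even, odd = x[..., ::2], x[..., 1::2]
    approx = (even + odd) / 2 ** 0.5
    detail = (even - odd) / 2 ** 0.5
    return approx, detail

class GatedWaveletMix(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv1d(2 * ch, ch, 1), nn.Sigmoid())

    def forward(self, x):
        a, d = haar_dwt(x)
        g = self.gate(torch.cat([a, d], dim=1))   # learned per-position gate
        return g * a + (1 - g) * d

mix = GatedWaveletMix()
y = mix(torch.randn(2, 64, 512))
print(y.shape)  # torch.Size([2, 64, 256])
```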
- CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information [61.1904164368732]
We propose CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals. Specifically, CognitionCapturer trains Modality Experts for each modality to extract cross-modal information from the EEG modality. The framework does not require any fine-tuning of the generative models and can be extended to incorporate more modalities.
arXiv Detail & Related papers (2024-12-13T16:27:54Z)
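One way to read "Modality Experts": a separate projection head per modality maps the shared EEG embedding toward that modality's frozen embedding space under a contrastive loss. The modality set and the loss form here are assumptions.

```python
# Sketch of per-modality experts trained with an InfoNCE-style objective
# against frozen pretrained embeddings. Modalities chosen are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(a, b, tau=0.07):
    logits = F.normalize(a, dim=-1) @ F.normalize(b, dim=-1).T / tau
    target = torch.arange(a.shape[0])
    return F.cross_entropy(logits, target)

experts = nn.ModuleDict({m: nn.Linear(768, 768) for m in ["image", "text", "depth"]})
eeg_emb = torch.randn(8, 768)
frozen_emb = {m: torch.randn(8, 768) for m in experts}   # pretrained encoders' outputs
loss = sum(info_nce(experts[m](eeg_emb), frozen_emb[m]) for m in experts)
```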
- Deformation-aware GAN for Medical Image Synthesis with Substantially Misaligned Pairs [0.0]
We propose a novel Deformation-aware GAN (DA-GAN) to dynamically correct misalignment during image synthesis based on inverse consistency.
Experimental results show that DA-GAN achieved superior performance on a public dataset with simulated misalignments and a real-world lung MRI-CT dataset with respiratory motion misalignment.
arXiv Detail & Related papers (2024-08-18T10:29:35Z)
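Inverse consistency is typically enforced by penalizing the composition of forward and backward deformations against the identity; below is a grid_sample-based sketch with the deformation fields faked for illustration. The paper's exact formulation may differ.

```python
# Sketch of an inverse-consistency penalty: warping forward then backward
# should approximately return the original image.
import torch
import torch.nn.functional as F

def warp(img, flow):
    """img: (B, C, H, W); flow: (B, H, W, 2) offsets in normalized coords."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w),
                            indexing="ij")
    base = torch.stack([xs, ys], dim=-1).expand(b, -1, -1, -1)
    return F.grid_sample(img, base + flow, align_corners=True)

img = torch.rand(1, 1, 64, 64)
fwd = 0.05 * torch.randn(1, 64, 64, 2)   # would come from the registration net
bwd = -fwd                               # ideal inverse, for illustration only
inv_loss = F.l1_loss(warp(warp(img, fwd), bwd), img)
```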
- CSLP-AE: A Contrastive Split-Latent Permutation Autoencoder Framework for Zero-Shot Electroencephalography Signal Conversion [49.1574468325115]
A key aim in EEG analysis is to extract the underlying neural activation (content) as well as to account for the individual subject variability (style). Inspired by recent advancements in voice conversion technologies, we propose a novel contrastive split-latent permutation autoencoder (CSLP-AE) framework that directly optimizes for EEG conversion.
arXiv Detail & Related papers (2023-11-13T22:46:43Z)
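A minimal sketch of the split-latent permutation idea: the latent splits into content and style halves, style codes are permuted across the batch, and decoding from the recombined code performs conversion. Encoder, decoder, and shapes are stand-ins.

```python
# Sketch of split-latent permutation: same content, another subject's style.
import torch
import torch.nn as nn

enc = nn.Linear(1024, 256)               # stand-in encoder
dec = nn.Linear(256, 1024)               # stand-in decoder

x = torch.randn(8, 1024)                 # flattened EEG windows
z = enc(x)
content, style = z.chunk(2, dim=-1)      # (8, 128) each
perm = torch.randperm(8)
z_swap = torch.cat([content, style[perm]], dim=-1)
x_conv = dec(z_swap)                     # content rendered in a permuted style
```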
- You Only Train Once: A Unified Framework for Both Full-Reference and No-Reference Image Quality Assessment [45.62136459502005]
We propose a network to perform full-reference (FR) and no-reference (NR) image quality assessment (IQA).
We first employ an encoder to extract multi-level features from input images.
A Hierarchical Attention (HA) module is proposed as a universal adapter for both FR and NR inputs.
A Semantic Distortion Aware (SDA) module is proposed to examine feature correlations between shallow and deep layers of the encoder.
arXiv Detail & Related papers (2023-10-14T11:03:04Z)
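A sketch of how one adapter can serve both settings: for NR inputs the distorted image's own features stand in for the missing reference, so the FR and NR paths share weights. The attention-based adapter here is an assumption, not the paper's HA module.

```python
# One adapter for both full-reference (two inputs) and no-reference (one input) IQA.
import torch
import torch.nn as nn

class UniversalAdapter(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, 1)

    def forward(self, dist_feat, ref_feat=None):
        ref = dist_feat if ref_feat is None else ref_feat   # NR falls back to self
        h, _ = self.attn(query=dist_feat, key=ref, value=ref)
        return self.head(h.mean(dim=1))                     # scalar quality score

adapter = UniversalAdapter()
fr_score = adapter(torch.randn(2, 49, 256), torch.randn(2, 49, 256))
nr_score = adapter(torch.randn(2, 49, 256))
```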
- Identity-Aware CycleGAN for Face Photo-Sketch Synthesis and Recognition [61.87842307164351]
We first propose an Identity-Aware CycleGAN (IACycleGAN) model that applies a new perceptual loss to supervise the image generation network.
It improves CycleGAN on photo-sketch synthesis by paying more attention to the synthesis of key facial regions, such as eyes and nose.
We develop a mutual optimization procedure between the synthesis model and the recognition model, which iteratively synthesizes better images with IACycleGAN.
arXiv Detail & Related papers (2021-03-30T01:30:08Z)
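A region-weighted perceptual loss is one way to "pay more attention" to key facial regions: compare features globally and again on crops around the eyes and nose, with higher weights. Region boxes, weights, and the feature network are illustrative assumptions.

```python
# Sketch of a region-weighted perceptual loss over key facial regions.
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())  # stand-in extractor

def region_perceptual_loss(fake, real, regions, weights):
    loss = F.l1_loss(feat_net(fake), feat_net(real))          # global term
    for (y0, y1, x0, x1), w in zip(regions, weights):         # key-region terms
        loss = loss + w * F.l1_loss(feat_net(fake[..., y0:y1, x0:x1]),
                                    feat_net(real[..., y0:y1, x0:x1]))
    return loss

fake, real = torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128)
eyes, nose = (30, 60, 20, 108), (55, 90, 45, 85)                 # hypothetical boxes
loss = region_perceptual_loss(fake, real, [eyes, nose], [2.0, 1.5])
```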
- High-Fidelity Synthesis with Disentangled Representation [60.19657080953252]
We propose an Information-Distillation Generative Adversarial Network (ID-GAN) for disentanglement learning and high-fidelity synthesis.
Our method learns a disentangled representation using VAE-based models, and distills the learned representation, together with an additional nuisance variable, into a separate GAN-based generator for high-fidelity synthesis.
Despite its simplicity, we show that the proposed method is highly effective, achieving image generation quality comparable to that of state-of-the-art methods while using the disentangled representation.
arXiv Detail & Related papers (2020-01-13T14:39:40Z)
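A minimal sketch of the distillation split described above: the VAE provides the disentangled code, and a separate generator consumes that code plus a nuisance noise vector, so fidelity need not come from the disentangled code alone. Dimensions and networks are stand-ins.

```python
# Sketch of the ID-GAN split: VAE gives a disentangled code z; a GAN
# generator receives [z, nuisance noise] for the high-fidelity path.
import torch
import torch.nn as nn

z_dim, n_dim = 10, 118
vae_enc = nn.Linear(784, 2 * z_dim)          # stand-in VAE encoder -> (mu, logvar)
generator = nn.Sequential(nn.Linear(z_dim + n_dim, 256), nn.ReLU(),
                          nn.Linear(256, 784), nn.Tanh())

x = torch.rand(4, 784)
mu, logvar = vae_enc(x).chunk(2, dim=-1)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
noise = torch.randn(4, n_dim)                          # nuisance variable
x_gen = generator(torch.cat([z, noise], dim=-1))       # high-fidelity synthesis path
```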