EEG-Driven Image Reconstruction with Saliency-Guided Diffusion Models
- URL: http://arxiv.org/abs/2510.26391v1
- Date: Thu, 30 Oct 2025 11:34:37 GMT
- Title: EEG-Driven Image Reconstruction with Saliency-Guided Diffusion Models
- Authors: Igor Abramov, Ilya Makarov,
- Abstract summary: Existing EEG-driven image reconstruction methods often overlook spatial attention mechanisms, limiting fidelity and semantic coherence. We propose a dual-conditioning framework that combines EEG embeddings with spatial saliency maps to enhance image generation.
- Score: 4.815274507478168
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing EEG-driven image reconstruction methods often overlook spatial attention mechanisms, limiting fidelity and semantic coherence. To address this, we propose a dual-conditioning framework that combines EEG embeddings with spatial saliency maps to enhance image generation. Our approach leverages the Adaptive Thinking Mapper (ATM) for EEG feature extraction and fine-tunes Stable Diffusion 2.1 via Low-Rank Adaptation (LoRA) to align neural signals with visual semantics, while a ControlNet branch conditions generation on saliency maps for spatial control. Evaluated on THINGS-EEG, our method achieves a significant improvement in the quality of low- and high-level image features over existing approaches, while aligning strongly with human visual attention. The results demonstrate that attentional priors resolve EEG ambiguities, enabling high-fidelity reconstructions with applications in medical diagnostics and neuroadaptive interfaces, advancing neural decoding through efficient adaptation of pre-trained diffusion models.
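The Low-Rank Adaptation step the abstract mentions can be illustrated with a small numpy sketch. This shows the generic LoRA update rule, not the authors' code; all dimensions and hyperparameters here are made-up toy values:

```python
import numpy as np

# Illustrative sketch of a LoRA update (not the paper's implementation):
# instead of fine-tuning a frozen weight matrix W directly, LoRA learns a
# low-rank residual B @ A with rank r << min(d_out, d_in), scaled by alpha/r.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 32, 4, 8

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01     # trainable down-projection
B = np.zeros((d_out, r))                      # trainable up-projection (init 0)

def lora_forward(x):
    # y = W x + (alpha / r) * B (A x); since B = 0 at init, output equals W x
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)    # identity at initialization

# Only r * (d_in + d_out) parameters are trained instead of d_out * d_in.
print(r * (d_in + d_out), d_out * d_in)
```

Because only the low-rank factors are trained, adapting a large pretrained diffusion model to EEG embeddings touches a small fraction of its parameters, which is what makes the "efficient adaptation" claim possible.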
Related papers
- Autoregressive Visual Decoding from EEG Signals [14.213172378363216]
We present AVDE, a lightweight and efficient framework for visual decoding from EEG signals. We adopt an autoregressive generative framework based on a "next-scale prediction" strategy. Experiments on two datasets show that AVDE outperforms previous state-of-the-art methods in both image retrieval and reconstruction tasks.
arXiv Detail & Related papers (2026-02-26T02:49:04Z)
- Geometry- and Relation-Aware Diffusion for EEG Super-Resolution [33.53397341962788]
TopoDiff is a geometry- and relation-aware diffusion model for EEG spatial super-resolution. Inspired by how human experts interpret spatial EEG patterns, TopoDiff incorporates topology-aware image embeddings. This design yields a spatially grounded EEG super-resolution framework with consistent performance improvements.
arXiv Detail & Related papers (2026-02-02T15:44:20Z)
- NeuroCLIP: Brain-Inspired Prompt Tuning for EEG-to-Image Multimodal Contrastive Learning [13.254096454986318]
We present NeuroCLIP, a prompt tuning framework tailored for EEG-to-image contrastive learning. We are the first to introduce visual prompt tokens into EEG-image alignment, acting as global, modality-level prompts. On the THINGS-EEG2 dataset, NeuroCLIP achieves a Top-1 accuracy of 63.2% in zero-shot image retrieval.
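The zero-shot retrieval metric reported here follows the usual CLIP-style recipe: embed EEG and images into a shared space and retrieve by cosine similarity. A toy numpy sketch (generic technique, not NeuroCLIP's code; the embeddings below are simulated):

```python
import numpy as np

# Toy sketch of CLIP-style zero-shot retrieval: given L2-normalized EEG and
# image embeddings in a shared space, the retrieved image for each EEG trial
# is the one with the highest cosine similarity.
rng = np.random.default_rng(1)
n, d = 5, 16

img = rng.standard_normal((n, d))
img /= np.linalg.norm(img, axis=1, keepdims=True)

# Simulate EEG embeddings as noisy copies of their matching image embeddings.
eeg = img + 0.1 * rng.standard_normal((n, d))
eeg /= np.linalg.norm(eeg, axis=1, keepdims=True)

sim = eeg @ img.T                      # cosine similarity matrix (n x n)
pred = sim.argmax(axis=1)              # Top-1 retrieved image per EEG trial
top1 = (pred == np.arange(n)).mean()   # fraction of trials retrieved correctly
print(top1)
```

The reported 63.2% Top-1 corresponds to this `top1` quantity computed over a much larger, harder candidate set.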
arXiv Detail & Related papers (2025-11-12T12:13:24Z)
- SYNAPSE: Synergizing an Adapter and Finetuning for High-Fidelity EEG Synthesis from a CLIP-Aligned Encoder [0.0]
SYNAPSE is a two-stage framework that bridges EEG signal representation learning and high-fidelity image synthesis. Our method achieves a semantically coherent latent space and state-of-the-art perceptual fidelity on the CVPR40 dataset.
arXiv Detail & Related papers (2025-11-11T02:53:49Z)
- Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction [65.67001243986981]
We propose MindHier, a coarse-to-fine fMRI-to-image reconstruction framework built on scale-wise autoregressive modeling. MindHier achieves superior semantic fidelity, 4.67x faster inference, and more deterministic results than the diffusion-based baselines.
arXiv Detail & Related papers (2025-10-25T15:40:07Z)
- Image-to-Brain Signal Generation for Visual Prosthesis with CLIP Guided Multimodal Diffusion Models [6.761875482596085]
We present the first image-to-brain signal framework that generates M/EEG from images. The proposed framework comprises two key components: a pretrained CLIP visual encoder and a cross-attention enhanced U-Net diffusion model. Unlike conventional generative models that rely on simple concatenation for conditioning, our cross-attention modules capture the complex interplay between visual features and brain signal representations.
arXiv Detail & Related papers (2025-08-31T10:29:58Z)
- CodeBrain: Towards Decoupled Interpretability and Multi-Scale Architecture for EEG Foundation Model [52.466542039411515]
EEG foundation models (EFMs) have emerged to address the scalability issues of task-specific models. We present CodeBrain, a two-stage EFM designed to fill this gap. In the first stage, we introduce the TFDual-Tokenizer, which decouples heterogeneous temporal and frequency EEG signals into discrete tokens. In the second stage, we propose the multi-scale EEGSSM architecture, which combines structured global convolution with sliding window attention.
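Sliding-window attention, as named in this summary, restricts each token to a local neighborhood. A minimal numpy sketch of the general masking idea (an assumption about the standard technique, not CodeBrain's implementation):

```python
import numpy as np

# Sliding-window attention mask: each position i may attend only to positions
# j within a fixed window w on either side, i.e. |i - j| <= w.
def sliding_window_mask(seq_len: int, w: int) -> np.ndarray:
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= w

mask = sliding_window_mask(6, 1)
print(mask.astype(int))
```

Each row allows at most 2*w + 1 positions, so attention cost scales as O(seq_len * w) rather than O(seq_len^2), which is what makes long EEG sequences tractable.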
arXiv Detail & Related papers (2025-06-10T17:20:39Z)
- Category-aware EEG image generation based on wavelet transform and contrast semantic loss [4.165508411354963]
We propose a transformer-based EEG signal encoder integrating the Discrete Wavelet Transform (DWT) and the gating mechanism. Guided by the feature alignment and category-aware fusion losses, this encoder is used to extract features related to visual stimuli from EEG signals. With the aid of a pre-trained diffusion model, these features are reconstructed into visual stimuli.
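The DWT named above splits a signal into a low-frequency approximation and high-frequency detail. A single-level Haar DWT in numpy, as a generic sketch of the transform (not this paper's encoder):

```python
import numpy as np

# Single-level Haar DWT of a 1-D signal: pairs of samples are combined into a
# smoothed approximation (sums) and a detail band (differences), each half as
# long as the input. The transform is exactly invertible.
def haar_dwt(x):
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)   # low-frequency trend
    detail = (even - odd) / np.sqrt(2)   # local differences (transients)
    return approx, detail

def haar_idwt(approx, detail):
    # Invert: recover even/odd samples, then interleave them.
    even = (approx + detail) / np.sqrt(2)
    odd = (approx - detail) / np.sqrt(2)
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x

sig = np.array([4.0, 2.0, 5.0, 5.0])
a, d = haar_dwt(sig)
assert np.allclose(haar_idwt(a, d), sig)  # perfect reconstruction
print(a, d)
```

For EEG, such band-split coefficients give an encoder separate access to slow rhythms and fast transients, which is the motivation for combining DWT with a transformer encoder.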
arXiv Detail & Related papers (2025-05-30T07:24:58Z)
- X$^{2}$-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction [64.2059940799033]
Current methods discretize temporal resolution into fixed phases with respiratory gating devices. X$^{2}$-Gaussian, a novel framework, enables continuous-time 4DCT reconstruction by integrating dynamic radiative splatting with self-supervised respiratory motion learning.
arXiv Detail & Related papers (2025-03-27T17:59:57Z)
- Guess What I Think: Streamlined EEG-to-Image Generation with Latent Diffusion Models [4.933734706786783]
EEG is a low-cost, non-invasive, and portable neuroimaging technique. EEG presents inherent challenges due to its low spatial resolution and susceptibility to noise and artifacts. We propose a framework based on the ControlNet adapter for conditioning a latent diffusion model through EEG signals.
arXiv Detail & Related papers (2024-09-17T19:07:13Z)
- Deformation-aware GAN for Medical Image Synthesis with Substantially Misaligned Pairs [0.0]
We propose a novel Deformation-aware GAN (DA-GAN) to dynamically correct the misalignment during the image synthesis based on inverse consistency.
Experimental results show that DA-GAN achieved superior performance on a public dataset with simulated misalignments and a real-world lung MRI-CT dataset with respiratory motion misalignment.
arXiv Detail & Related papers (2024-08-18T10:29:35Z)
- StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model [62.25424831998405]
StealthDiffusion is a framework that modifies AI-generated images into high-quality, imperceptible adversarial examples.
It is effective in both white-box and black-box settings, transforming AI-generated images into high-quality adversarial forgeries.
arXiv Detail & Related papers (2024-08-11T01:22:29Z)
- You Only Train Once: A Unified Framework for Both Full-Reference and No-Reference Image Quality Assessment [45.62136459502005]
We propose a network to perform full reference (FR) and no reference (NR) IQA.
We first employ an encoder to extract multi-level features from input images.
A Hierarchical Attention (HA) module is proposed as a universal adapter for both FR and NR inputs.
A Semantic Distortion Aware (SDA) module is proposed to examine feature correlations between shallow and deep layers of the encoder.
arXiv Detail & Related papers (2023-10-14T11:03:04Z)
- Learning Structure-Guided Diffusion Model for 2D Human Pose Estimation [71.24808323646167]
We propose DiffusionPose, a new scheme for learning keypoint heatmaps with a neural network.
During training, the keypoints are diffused to random distribution by adding noises and the diffusion model learns to recover ground-truth heatmaps from noised heatmaps.
Experiments show the prowess of our scheme with improvements of 1.6, 1.2, and 1.2 mAP on widely-used COCO, CrowdPose, and AI Challenge datasets.
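The noising step this summary describes is the standard DDPM forward process. A numpy sketch of that process applied to a toy heatmap (generic diffusion math with an assumed linear schedule, not DiffusionPose's code):

```python
import numpy as np

# Forward diffusion (DDPM-style): a clean heatmap x0 is blended with Gaussian
# noise according to a schedule; the model is trained to undo this corruption.
rng = np.random.default_rng(2)

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # linear noise schedule (assumed)
alpha_bar = np.cumprod(1.0 - betas)    # cumulative fraction of signal retained

def q_sample(x0, t, eps):
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = rng.standard_normal((8, 8))       # toy "ground-truth heatmap"
eps = rng.standard_normal((8, 8))
x_early = q_sample(x0, 0, eps)         # almost the clean heatmap
x_late = q_sample(x0, T - 1, eps)      # almost pure noise

print(np.abs(x_early - x0).mean(), np.abs(x_late - eps).mean())
```

Training then amounts to sampling random `t`, producing `x_t` this way, and regressing the network's output toward the clean heatmap (or the noise `eps`).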
arXiv Detail & Related papers (2023-06-29T16:24:32Z)
- Unsupervised Bidirectional Cross-Modality Adaptation via Deeply Synergistic Image and Feature Alignment for Medical Image Segmentation [73.84166499988443]
We present a novel unsupervised domain adaptation framework, named Synergistic Image and Feature Alignment (SIFA).
Our proposed SIFA conducts synergistic alignment of domains from both image and feature perspectives.
Experimental results on two different tasks demonstrate that our SIFA method is effective in improving segmentation performance on unlabeled target images.
arXiv Detail & Related papers (2020-02-06T13:49:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.