Autoregressive Visual Decoding from EEG Signals
- URL: http://arxiv.org/abs/2602.22555v1
- Date: Thu, 26 Feb 2026 02:49:04 GMT
- Title: Autoregressive Visual Decoding from EEG Signals
- Authors: Sicheng Dai, Hongwang Xiao, Shan Yu, Qiwei Ye
- Abstract summary: We present AVDE, a lightweight and efficient framework for visual decoding from EEG signals. We adopt an autoregressive generative framework based on a "next-scale prediction" strategy. Experiments on two datasets show that AVDE outperforms previous state-of-the-art methods in both image retrieval and reconstruction tasks.
- Score: 14.213172378363216
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Electroencephalogram (EEG) signals have become a popular medium for decoding visual information due to their cost-effectiveness and high temporal resolution. However, current approaches face significant challenges in bridging the modality gap between EEG and image data. These methods typically rely on complex adaptation processes involving multiple stages, making it hard to maintain consistency and manage compounding errors. Furthermore, the computational overhead imposed by large-scale diffusion models limits their practicality in real-world brain-computer interface (BCI) applications. In this work, we present AVDE, a lightweight and efficient framework for visual decoding from EEG signals. First, we leverage LaBraM, a pre-trained EEG model, and fine-tune it via contrastive learning to align EEG and image representations. Second, we adopt an autoregressive generative framework based on a "next-scale prediction" strategy: images are encoded into multi-scale token maps using a pre-trained VQ-VAE, and a transformer is trained to autoregressively predict finer-scale tokens starting from EEG embeddings as the coarsest representation. This design enables coherent generation while preserving a direct connection between the input EEG signals and the reconstructed images. Experiments on two datasets show that AVDE outperforms previous state-of-the-art methods in both image retrieval and reconstruction tasks, while using only 10% of the parameters. In addition, visualization of intermediate outputs shows that the generative process of AVDE reflects the hierarchical nature of human visual perception. These results highlight the potential of autoregressive models as efficient and interpretable tools for practical BCI applications.
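To make the two stages concrete, here is a minimal PyTorch sketch of the first stage, the contrastive EEG-image alignment. Everything below is an illustrative assumption rather than the authors' released code: the encoder interfaces, embedding dimension, and temperature are placeholders, and `symmetric_info_nce` is a hypothetical helper name; the only ingredient taken from the abstract is that paired EEG/image embeddings are aligned contrastively.

```python
# Stage 1 (sketch): contrastive alignment of EEG and image embeddings.
# Assumes each batch holds paired (EEG, image) samples already encoded to
# vectors, e.g. by a fine-tuned LaBraM backbone and a frozen image encoder
# (both placeholders; the paper's actual encoders are not reproduced here).
import torch
import torch.nn.functional as F

def symmetric_info_nce(eeg_embed: torch.Tensor,
                       img_embed: torch.Tensor,
                       temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired embeddings, shape (B, D)."""
    eeg = F.normalize(eeg_embed, dim=-1)
    img = F.normalize(img_embed, dim=-1)
    logits = eeg @ img.t() / temperature           # (B, B) cosine similarities
    targets = torch.arange(eeg.size(0), device=eeg.device)
    # Matched EEG/image pairs sit on the diagonal; off-diagonals are negatives.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

The second stage can be pictured as coarse-to-fine token decoding: the EEG embedding stands in for the coarsest token map, a transformer predicts each finer scale conditioned on everything coarser, and a pre-trained VQ-VAE decoder renders the final image. The sketch below is again a simplification under stated assumptions: the scale schedule, model sizes, learned per-scale queries, and greedy argmax decoding are all invented for illustration; a real system would train with teacher forcing over ground-truth VQ-VAE token maps and sample rather than argmax at generation time.

```python
# Stage 2 (sketch): "next-scale prediction" decoding from an EEG embedding.
import torch
import torch.nn as nn

class NextScaleDecoder(nn.Module):
    def __init__(self, vocab_size=4096, dim=512, eeg_dim=512,
                 scales=(1, 2, 4, 8, 16)):
        super().__init__()
        self.scales = scales                     # side lengths of the token maps
        self.token_embed = nn.Embedding(vocab_size, dim)
        self.eeg_proj = nn.Linear(eeg_dim, dim)  # EEG embedding -> coarsest context
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=6)
        self.head = nn.Linear(dim, vocab_size)
        # One learned query per position of each scale (simplified position scheme).
        self.queries = nn.ParameterList(
            [nn.Parameter(torch.randn(s * s, dim)) for s in scales])

    @torch.no_grad()
    def generate(self, eeg_embed):
        """Greedy coarse-to-fine decoding; eeg_embed has shape (B, eeg_dim)."""
        B = eeg_embed.size(0)
        ctx = self.eeg_proj(eeg_embed).unsqueeze(1)        # (B, 1, dim) start context
        token_maps = []
        for i, s in enumerate(self.scales):
            q = self.queries[i].unsqueeze(0).expand(B, -1, -1)
            h = self.backbone(torch.cat([ctx, q], dim=1))  # attend over coarser scales
            tokens = self.head(h[:, -s * s:]).argmax(-1)   # (B, s*s) next-scale tokens
            token_maps.append(tokens.view(B, s, s))
            ctx = torch.cat([ctx, self.token_embed(tokens)], dim=1)
        return token_maps   # multi-scale maps for a VQ-VAE decoder to render
```

Because every scale is decoded from a context that starts with the EEG-derived embedding, the reconstruction keeps the direct connection to the input signal that the abstract emphasizes, and the intermediate coarse maps plausibly correspond to the intermediate outputs the authors visualize.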
Related papers
- Guiding Diffusion-based Reconstruction with Contrastive Signals for Balanced Visual Representation [81.40978077888693]
Visual representations from Contrastive Language-Image Pre-training (CLIP) have become a key bottleneck for downstream performance. Recent solutions use diffusion models to enhance representations by conditioning image reconstruction on CLIP visual tokens. We integrate contrastive signals into diffusion-based reconstruction to pursue more comprehensive visual representations.
arXiv Detail & Related papers (2026-03-05T04:45:49Z)
- ResTok: Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation [64.84095852784714]
Residual Tokenizer (ResTok) is a 1D visual tokenizer that builds hierarchical residuals for both image tokens and latent tokens. We show that restoring hierarchical residual priors in visual tokenization significantly improves AR image generation, achieving a gFID of 2.34 on ImageNet-256 with only 9 sampling steps.
arXiv Detail & Related papers (2026-01-07T14:09:18Z)
- SYNAPSE: Synergizing an Adapter and Finetuning for High-Fidelity EEG Synthesis from a CLIP-Aligned Encoder [0.0]
SYNAPSE is a two-stage framework that bridges EEG signal representation learning and high-fidelity image synthesis. Our method achieves a semantically coherent latent space and state-of-the-art perceptual fidelity on the CVPR40 dataset.
arXiv Detail & Related papers (2025-11-11T02:53:49Z)
- CRIA: A Cross-View Interaction and Instance-Adapted Pre-training Framework for Generalizable EEG Representations [52.251569042852815]
CRIA is an adaptive framework that utilizes variable-length and variable-channel coding to achieve a unified representation of EEG data across different datasets. The model employs a cross-attention mechanism to fuse temporal, spectral, and spatial features effectively. Experimental results on the Temple University EEG corpus and the CHB-MIT dataset show that CRIA outperforms existing methods under the same pre-training conditions.
arXiv Detail & Related papers (2025-06-19T06:31:08Z)
- Category-aware EEG image generation based on wavelet transform and contrast semantic loss [4.165508411354963]
We propose a transformer-based EEG signal encoder integrating the Discrete Wavelet Transform (DWT) and a gating mechanism. Guided by feature alignment and category-aware fusion losses, this encoder is used to extract features related to visual stimuli from EEG signals. With the aid of a pre-trained diffusion model, these features are reconstructed into visual stimuli.
arXiv Detail & Related papers (2025-05-30T07:24:58Z)
- CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information [61.1904164368732]
We propose CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals. Specifically, CognitionCapturer trains Modality Experts for each modality to extract cross-modal information from the EEG modality. The framework does not require any fine-tuning of the generative models and can be extended to incorporate more modalities.
arXiv Detail & Related papers (2024-12-13T16:27:54Z)
- Mind's Eye: Image Recognition by EEG via Multimodal Similarity-Keeping Contrastive Learning [2.087148326341881]
This paper introduces a MUltimodal Similarity-keeping contrastivE learning framework for zero-shot EEG-based image classification.
We develop a series of multivariate time-series encoders tailored for EEG signals and assess the efficacy of regularized contrastive EEG-Image pretraining.
Our method achieves state-of-the-art performance, with a top-1 accuracy of 19.3% and a top-5 accuracy of 48.8% in 200-way zero-shot image classification.
arXiv Detail & Related papers (2024-06-05T16:42:23Z)
- RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection [60.960988614701414]
RIGID is a training-free and model-agnostic method for robust AI-generated image detection.
RIGID significantly outperforms existing training-based and training-free detectors.
arXiv Detail & Related papers (2024-05-30T14:49:54Z)
- Learning Robust Deep Visual Representations from EEG Brain Recordings [13.768240137063428]
This study proposes a two-stage method where the first step is to obtain EEG-derived features for robust learning of deep representations.
We demonstrate the generalizability of our feature extraction pipeline across three different datasets using deep-learning architectures.
We propose a novel framework to transform unseen images into the EEG space and reconstruct them approximately.
arXiv Detail & Related papers (2023-10-25T10:26:07Z)
- DreamDiffusion: Generating High-Quality Images from Brain EEG Signals [42.30835251506628]
DreamDiffusion is a novel method for generating high-quality images directly from brain electroencephalogram (EEG) signals.
The proposed method overcomes the challenges of using EEG signals for image generation, such as noise, limited information, and individual differences.
arXiv Detail & Related papers (2023-06-29T13:33:02Z)
- Joint Deep Learning of Facial Expression Synthesis and Recognition [97.19528464266824]
We propose a novel method for joint deep learning of facial expression synthesis and recognition for effective FER.
The proposed method involves a two-stage learning procedure. Firstly, a facial expression synthesis generative adversarial network (FESGAN) is pre-trained to generate facial images with different facial expressions.
In order to alleviate the problem of data bias between the real images and the synthetic images, we propose an intra-class loss with a novel real data-guided back-propagation (RDBP) algorithm.
arXiv Detail & Related papers (2020-02-06T10:56:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.