Related papers: CATVis: Context-Aware Thought Visualization

CATVis: Context-Aware Thought Visualization

URL: http://arxiv.org/abs/2507.11522v1
Date: Tue, 15 Jul 2025 17:47:01 GMT
Title: CATVis: Context-Aware Thought Visualization
Authors: Tariq Mehmood, Hamza Ahmad, Muhammad Haroon Shakeel, Murtaza Taj,
Abstract summary: We propose a novel 5-stage framework for decoding visual representations from EEG signals.<n>We enable context-aware EEG-to-image generation through cross-modal alignment and re-ranking.<n> Experimental results demonstrate that our method generates high-quality images aligned with visual stimuli.
Score: 2.8298952038412706
License: http://creativecommons.org/licenses/by/4.0/
Abstract: EEG-based brain-computer interfaces (BCIs) have shown promise in various applications, such as motor imagery and cognitive state monitoring. However, decoding visual representations from EEG signals remains a significant challenge due to their complex and noisy nature. We thus propose a novel 5-stage framework for decoding visual representations from EEG signals: (1) an EEG encoder for concept classification, (2) cross-modal alignment of EEG and text embeddings in CLIP feature space, (3) caption refinement via re-ranking, (4) weighted interpolation of concept and caption embeddings for richer semantics, and (5) image generation using a pre-trained Stable Diffusion model. We enable context-aware EEG-to-image generation through cross-modal alignment and re-ranking. Experimental results demonstrate that our method generates high-quality images aligned with visual stimuli, outperforming SOTA approaches by 13.43% in Classification Accuracy, 15.21% in Generation Accuracy and reducing Fr\'echet Inception Distance by 36.61%, indicating superior semantic alignment and image quality.

Related papers

NeuroCLIP: Brain-Inspired Prompt Tuning for EEG-to-Image Multimodal Contrastive Learning [13.254096454986318]
We present NeuroCLIP, a prompt tuning framework tailored for EEG-to-image contrastive learning.<n>We are the first to introduce visual prompt tokens into EEG-image alignment, acting as global, modality-level prompts.<n>On the THINGS-EEG2 dataset, NeuroCLIP achieves a Top-1 accuracy of 63.2% in zero-shot image retrieval.
arXiv Detail & Related papers (2025-11-12T12:13:24Z)
Interleaving Reasoning for Better Text-to-Image Generation [83.69082794730664]
We introduce Interleaving Reasoning Generation (IRG), a framework that alternates between text-based thinking and image synthesis.<n>To train IRG effectively, we propose Interleaving Reasoning Generation Learning (IRGL), which targets two sub-goals.<n>Experiments show SoTA performance, yielding absolute gains of 5-10 points on GenEval, WISE, TIIF, GenAI-Bench, and OneIG-EN.
arXiv Detail & Related papers (2025-09-08T17:56:23Z)
Perception-oriented Bidirectional Attention Network for Image Super-resolution Quality Assessment [61.27648290203618]
We propose the Perception-oriented Bidirectional Attention Network (PBAN) for image SR FR-IQA.<n>PBAN is composed of three modules: an image encoder module, a perception-oriented bidirectional attention (PBA) module, and a quality prediction module.<n>Our proposed PBAN outperforms state-of-the-art quality assessment methods.
arXiv Detail & Related papers (2025-09-08T08:39:45Z)
ViEEG: Hierarchical Neural Coding with Cross-Modal Progressive Enhancement for EEG-Based Visual Decoding [14.18190036916225]
ViEEG is a biologically inspired hierarchical EEG decoding framework that aligns with the Hubel-Wiesel theory of visual processing.<n>Our framework achieves state-of-the-art performance, with 40.9% Top-1 accuracy in subject-dependent and 22.9% Top-1 accuracy in cross-subject settings, surpassing existing methods by over 45%.
arXiv Detail & Related papers (2025-05-18T13:19:08Z)
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation [53.01486796503091]
We present emphHarmon, a unified autoregressive framework that harmonizes understanding and generation tasks with a shared MAR encoder.<n>Harmon achieves state-of-the-art image generation results on the GenEval, MJHQ30K and WISE benchmarks.
arXiv Detail & Related papers (2025-03-27T20:50:38Z)
CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information [61.1904164368732]
We propose CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals.<n>Specifically, CognitionCapturer trains Modality Experts for each modality to extract cross-modal information from the EEG modality.<n>The framework does not require any fine-tuning of the generative models and can be extended to incorporate more modalities.
arXiv Detail & Related papers (2024-12-13T16:27:54Z)
UNIT: Unifying Image and Text Recognition in One Vision Encoder [51.140564856352825]
UNIT is a novel training framework aimed at UNifying Image and Text recognition within a single model. We show that UNIT significantly outperforms existing methods on document-related tasks. Notably, UNIT retains the original vision encoder architecture, making it cost-free in terms of inference and deployment.
arXiv Detail & Related papers (2024-09-06T08:02:43Z)
Mind's Eye: Image Recognition by EEG via Multimodal Similarity-Keeping Contrastive Learning [2.087148326341881]
This paper introduces a MUltimodal Similarity-keeping contrastivE learning framework for zero-shot EEG-based image classification. We develop a series of multivariate time-series encoders tailored for EEG signals and assess the efficacy of regularized contrastive EEG-Image pretraining. Our method achieves state-of-the-art performance, with a top-1 accuracy of 19.3% and a top-5 accuracy of 48.8% in 200-way zero-shot image classification.
arXiv Detail & Related papers (2024-06-05T16:42:23Z)
Alleviating Catastrophic Forgetting in Facial Expression Recognition with Emotion-Centered Models [49.3179290313959]
The proposed method, emotion-centered generative replay (ECgr), tackles this challenge by integrating synthetic images from generative adversarial networks. ECgr incorporates a quality assurance algorithm to ensure the fidelity of generated images. The experimental results on four diverse facial expression datasets demonstrate that incorporating images generated by our pseudo-rehearsal method enhances training on the targeted dataset and the source dataset.
arXiv Detail & Related papers (2024-04-18T15:28:34Z)
Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder [69.7813498468116]
We propose Contrastive EEG-Text Masked Autoencoder (CET-MAE), a novel model that orchestrates compound self-supervised learning across and within EEG and text. We also develop a framework called E2T-PTR (EEG-to-Text decoding using Pretrained Transferable Representations) to decode text from EEG sequences.
arXiv Detail & Related papers (2024-02-27T11:45:21Z)
Learning Robust Deep Visual Representations from EEG Brain Recordings [13.768240137063428]
This study proposes a two-stage method where the first step is to obtain EEG-derived features for robust learning of deep representations. We demonstrate the generalizability of our feature extraction pipeline across three different datasets using deep-learning architectures. We propose a novel framework to transform unseen images into the EEG space and reconstruct them with approximation.
arXiv Detail & Related papers (2023-10-25T10:26:07Z)
Decoding Natural Images from EEG for Object Recognition [8.411976038504589]
This paper presents a self-supervised framework to demonstrate the feasibility of learning image representations from EEG signals. We achieve a top-1 accuracy of 15.6% and a top-5 accuracy of 42.8% in challenging 200-way zero-shot tasks. These findings yield valuable insights for neural decoding and brain-computer interfaces in real-world scenarios.
arXiv Detail & Related papers (2023-08-25T08:05:37Z)
DreamDiffusion: Generating High-Quality Images from Brain EEG Signals [42.30835251506628]
DreamDiffusion is a novel method for generating high-quality images directly from brain electroencephalogram (EEG) signals. The proposed method overcomes the challenges of using EEG signals for image generation, such as noise, limited information, and individual differences.
arXiv Detail & Related papers (2023-06-29T13:33:02Z)
Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding. Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z)
IMAGINE: Image Synthesis by Image-Guided Model Inversion [79.4691654458141]
We introduce an inversion based method, denoted as IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images. We leverage the knowledge of image semantics from a pre-trained classifier to achieve plausible generations. IMAGINE enables the synthesis procedure to simultaneously 1) enforce semantic specificity constraints during the synthesis, 2) produce realistic images without generator training, and 3) give users intuitive control over the generation process.
arXiv Detail & Related papers (2021-04-13T02:00:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.