See What You See: Self-supervised Cross-modal Retrieval of Visual
Stimuli from Brain Activity
- URL: http://arxiv.org/abs/2208.03666v3
- Date: Thu, 11 Aug 2022 01:19:39 GMT
- Title: See What You See: Self-supervised Cross-modal Retrieval of Visual
Stimuli from Brain Activity
- Authors: Zesheng Ye, Lina Yao, Yu Zhang, Sylvia Gustin
- Abstract summary: We present a single-stage EEG-visual retrieval paradigm in which the data of the two modalities, rather than their annotations, are correlated.
We demonstrate that the proposed approach completes an instance-level EEG-visual retrieval task that existing methods cannot.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent studies demonstrate the use of a two-stage supervised framework to
generate images that depict human perception of visual stimuli from EEG,
referred to as EEG-visual reconstruction. They are, however, unable to reproduce
the exact visual stimulus, since it is the human-specified annotation of
images, not their data, that determines what the synthesized images are.
Moreover, synthesized images often suffer from noisy EEG encodings and unstable
training of generative models, making them hard to recognize. Instead, we
present a single-stage EEG-visual retrieval paradigm in which the data of the two
modalities, rather than their annotations, are correlated, allowing us to
recover the exact visual stimulus for an EEG clip. We maximize the mutual
information between the EEG encoding and associated visual stimulus through
optimization of a contrastive self-supervised objective, leading to two
additional benefits. First, it enables EEG encodings to handle visual classes
beyond those seen during training, since learning is not directed at class
annotations. Second, the model is no longer required to generate every
detail of the visual stimulus, but rather focuses on cross-modal alignment and
retrieves images at the instance level, ensuring distinguishable model output.
Empirical studies are conducted on the largest single-subject EEG dataset that
measures brain activities evoked by image stimuli. We demonstrate that the
proposed approach completes an instance-level EEG-visual retrieval task that
existing methods cannot. We also examine the implications of a range of EEG and
visual encoder structures. Furthermore, on the most studied semantic-level
EEG-visual classification task, despite not using class annotations, the
proposed method outperforms state-of-the-art supervised EEG-visual
reconstruction approaches, particularly in its capability for open-class
recognition.
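As a rough illustration of the idea (a sketch, not the authors' released code), the PyTorch snippet below shows an InfoNCE-style contrastive objective that aligns paired EEG and image encodings, together with instance-level retrieval by cosine similarity. The function names, embedding dimension, and temperature value are illustrative assumptions.

```python
# Hedged sketch: contrastive EEG-image alignment and instance-level retrieval.
# Encoders, embedding size, and hyperparameters are assumptions for illustration.
import torch
import torch.nn.functional as F

def info_nce_loss(eeg_emb: torch.Tensor, img_emb: torch.Tensor,
                  temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss over a batch of paired (EEG, image) embeddings.

    Pulling matched pairs together against in-batch negatives is a standard
    lower-bound surrogate for maximizing EEG-image mutual information.
    """
    eeg = F.normalize(eeg_emb, dim=-1)              # (B, D)
    img = F.normalize(img_emb, dim=-1)              # (B, D)
    logits = eeg @ img.t() / temperature            # (B, B) pairwise similarities
    targets = torch.arange(eeg.size(0), device=eeg.device)  # matched pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

@torch.no_grad()
def retrieve_images(eeg_emb: torch.Tensor, gallery_emb: torch.Tensor,
                    k: int = 5) -> torch.Tensor:
    """Instance-level retrieval: rank gallery images by cosine similarity to EEG queries."""
    eeg = F.normalize(eeg_emb, dim=-1)              # (Q, D) query EEG clips
    gallery = F.normalize(gallery_emb, dim=-1)      # (N, D) candidate images
    sims = eeg @ gallery.t()                        # (Q, N) similarity matrix
    return sims.topk(k, dim=-1).indices             # indices of the top-k retrieved images
```

Because training never relies on class annotations, the same retrieval routine applies unchanged to images from classes unseen during training, and a semantic-level prediction can be read off the label of the top-ranked image.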
Related papers
- NECOMIMI: Neural-Cognitive Multimodal EEG-informed Image Generation with Diffusion Models [0.0]
NECOMIMI introduces a novel framework for generating images directly from EEG signals using advanced diffusion models.
The proposed NERV EEG encoder demonstrates state-of-the-art (SoTA) performance across multiple zero-shot classification tasks.
We introduce the CAT Score as a new metric tailored for EEG-to-image evaluation and establish a benchmark on the ThingsEEG dataset.
arXiv Detail & Related papers (2024-10-01T14:05:30Z) - Visual Neural Decoding via Improved Visual-EEG Semantic Consistency [3.4061238650474657]
Methods that directly map EEG features to the CLIP embedding space may introduce mapping bias and cause semantic inconsistency.
We propose a Visual-EEG Semantic Decouple Framework that explicitly extracts the semantic-related features of these two modalities to facilitate optimal alignment.
Our method achieves state-of-the-art results in zero-shot neural decoding tasks.
arXiv Detail & Related papers (2024-08-13T10:16:10Z) - BrainVis: Exploring the Bridge between Brain and Visual Signals via Image Reconstruction [7.512223286737468]
Analyzing and reconstructing visual stimuli from brain signals effectively advances the understanding of the human visual system.
However, the EEG signals are complex and contain significant noise.
This leads to substantial limitations in existing works of visual stimuli reconstruction from EEG.
We propose a novel approach called BrainVis to address these challenges.
arXiv Detail & Related papers (2023-12-22T17:49:11Z) - Learning Robust Deep Visual Representations from EEG Brain Recordings [13.768240137063428]
This study proposes a two-stage method where the first step is to obtain EEG-derived features for robust learning of deep representations.
We demonstrate the generalizability of our feature extraction pipeline across three different datasets using deep-learning architectures.
We propose a novel framework to transform unseen images into the EEG space and reconstruct them with approximation.
arXiv Detail & Related papers (2023-10-25T10:26:07Z) - Seeing through the Brain: Image Reconstruction of Visual Perception from
Human Brain Signals [27.92796103924193]
We propose a comprehensive pipeline, named NeuroImagen, for reconstructing visual stimuli images from EEG signals.
We incorporate a novel multi-level perceptual information decoding to draw multi-grained outputs from the given EEG data.
arXiv Detail & Related papers (2023-07-27T12:54:16Z) - Controllable Mind Visual Diffusion Model [58.83896307930354]
Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models.
We propose a novel approach, referred to as the Controllable Mind Visual Diffusion Model (CMVDM).
CMVDM extracts semantic and silhouette information from fMRI data using attribute alignment and assistant networks.
We then leverage a control model to fully exploit the extracted information for image synthesis, resulting in generated images that closely resemble the visual stimuli in terms of semantics and silhouette.
arXiv Detail & Related papers (2023-05-17T11:36:40Z) - Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
arXiv Detail & Related papers (2022-07-25T17:58:16Z) - A Deep Learning Approach for the Segmentation of Electroencephalography
Data in Eye Tracking Applications [56.458448869572294]
We introduce DETRtime, a novel framework for time-series segmentation of EEG data.
Our end-to-end deep learning-based framework brings advances in Computer Vision to the forefront.
Our model generalizes well in the task of EEG sleep stage segmentation.
arXiv Detail & Related papers (2022-06-17T10:17:24Z) - Two-stage Visual Cues Enhancement Network for Referring Image
Segmentation [89.49412325699537]
Referring Image Segmentation (RIS) aims at segmenting the target object from an image referred to by a given natural language expression.
In this paper, we tackle this problem by devising a Two-stage Visual cues enhancement Network (TV-Net).
Through the two-stage enhancement, the proposed TV-Net achieves better performance in learning fine-grained matching between the natural language expression and the image.
arXiv Detail & Related papers (2021-10-09T02:53:39Z) - Joint Deep Learning of Facial Expression Synthesis and Recognition [97.19528464266824]
We propose a novel joint deep learning of facial expression synthesis and recognition method for effective FER.
The proposed method involves a two-stage learning procedure. Firstly, a facial expression synthesis generative adversarial network (FESGAN) is pre-trained to generate facial images with different facial expressions.
In order to alleviate the problem of data bias between the real images and the synthetic images, we propose an intra-class loss with a novel real data-guided back-propagation (RDBP) algorithm.
arXiv Detail & Related papers (2020-02-06T10:56:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.