BrainVis: Exploring the Bridge between Brain and Visual Signals via Image Reconstruction
- URL: http://arxiv.org/abs/2312.14871v3
- Date: Tue, 3 Sep 2024 23:13:33 GMT
- Title: BrainVis: Exploring the Bridge between Brain and Visual Signals via Image Reconstruction
- Authors: Honghao Fu, Zhiqi Shen, Jing Jih Chin, Hao Wang,
- Abstract summary: Analyzing and reconstructing visual stimuli from brain signals effectively advances the understanding of the human visual system.
However, EEG signals are complex and contain significant noise.
This leads to substantial limitations in existing work on visual stimulus reconstruction from EEG.
We propose a novel approach called BrainVis to address these challenges.
- Score: 7.512223286737468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analyzing and reconstructing visual stimuli from brain signals effectively advances the understanding of the human visual system. However, EEG signals are complex and contain significant noise. This leads to substantial limitations in existing work on visual stimulus reconstruction from EEG, such as difficulty in aligning EEG embeddings with fine-grained semantic information and a heavy reliance on additional large self-collected datasets for training. To address these challenges, we propose a novel approach called BrainVis. First, we divide the EEG signals into units and apply a self-supervised approach to them to obtain EEG time-domain features, in an attempt to ease the training difficulty. We also propose using frequency-domain features to enhance the EEG representations. Then, we simultaneously align the EEG time-frequency embeddings with an interpolation of coarse- and fine-grained semantics in the CLIP space, to highlight the primary visual components and reduce the difficulty of cross-modal alignment. Finally, we adopt cascaded diffusion models to reconstruct images. Using only 10% of the training data used by previous work, BrainVis outperforms the state of the art in both the semantic fidelity of its reconstructions and generation quality. The code is available at https://github.com/RomGai/BrainVis.
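The first two steps in the abstract (splitting EEG into units for self-supervised time-domain learning, and adding frequency-domain features) can be illustrated with a minimal, dependency-free sketch. It is not the BrainVis implementation: the helper names, the unit length, the use of random unit masking as the self-supervised pretext task, and the plain DFT magnitude spectrum as the frequency feature are all assumptions made for illustration.

```python
import cmath
import math
import random


def split_into_units(signal, unit_len):
    """Split a 1-D signal into non-overlapping units of unit_len samples.

    Trailing samples that do not fill a whole unit are dropped.
    """
    if unit_len <= 0:
        raise ValueError("unit_len must be positive")
    n_units = len(signal) // unit_len
    return [signal[i * unit_len:(i + 1) * unit_len] for i in range(n_units)]


def mask_units(units, mask_ratio, seed=0):
    """Hypothetical pretext task: zero out a random subset of units.

    Returns the masked copy and the indices an encoder would learn to reconstruct.
    """
    rng = random.Random(seed)
    n_mask = int(round(len(units) * mask_ratio))
    masked_idx = set(rng.sample(range(len(units)), n_mask))
    masked = [[0.0] * len(u) if i in masked_idx else list(u) for i, u in enumerate(units)]
    return masked, sorted(masked_idx)


def magnitude_spectrum(signal):
    """Naive O(n^2) DFT magnitudes for frequency bins 0..n//2."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


# Toy single-channel "EEG" trace: 10 cycles of a sine wave over 64 samples.
eeg = [math.sin(2 * math.pi * 10 * t / 64) for t in range(64)]
units = split_into_units(eeg, unit_len=16)          # 4 units of 16 samples
masked, hidden = mask_units(units, mask_ratio=0.5)  # 2 units hidden from the encoder
spectrum = magnitude_spectrum(eeg)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)  # bin 10
```

A real pipeline would operate on multi-channel tensors and use an FFT library; the naive DFT here only keeps the example self-contained.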
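The alignment step described in the abstract can likewise be sketched. Here the target is assumed to be a linear blend of a coarse-grained CLIP embedding (for example, of a class label) and a fine-grained one (for example, of a caption), weighted by a hypothetical parameter `alpha`, and the EEG embedding is pulled toward that target with a cosine-distance loss. BrainVis's actual interpolation weights, embedding sources, and loss may differ; embeddings are plain Python lists here.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length; a zero vector has no direction."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def interpolate_semantics(coarse, fine, alpha):
    """Blend unit-normalized coarse and fine embeddings, then re-normalize.

    alpha=0 keeps only the coarse semantics; alpha=1 keeps only the fine-grained ones.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    c, f = l2_normalize(coarse), l2_normalize(fine)
    return l2_normalize([(1.0 - alpha) * ci + alpha * fi for ci, fi in zip(c, f)])


def cosine_alignment_loss(eeg_embedding, target):
    """1 - cosine similarity: 0 when aligned, up to 2 when pointing the opposite way."""
    e, t = l2_normalize(eeg_embedding), l2_normalize(target)
    return 1.0 - sum(a * b for a, b in zip(e, t))


# Toy 3-D "CLIP" embeddings.
target = interpolate_semantics([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], alpha=0.5)
aligned_loss = cosine_alignment_loss([1.0, 1.0, 0.0], target)      # ~0.0
coarse_only_loss = cosine_alignment_loss([1.0, 0.0, 0.0], target)  # ~0.293
```

Normalizing both inputs before blending keeps `alpha` meaningful even when the coarse and fine embeddings have different norms.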
Related papers
- EAGLE: Enhanced Visual Grounding Minimizes Hallucinations in Instructional Multimodal Models [54.234657224615354]
Large language models and vision transformers have demonstrated impressive zero-shot capabilities, enabling significant transferability in downstream tasks.
Despite incorporating vast image and language pre-training, these multi-modal architectures often generate responses that deviate from the ground truth in the image data.
Current methods for mitigating hallucinations generally focus on regularizing the language component, improving the fusion module, or ensembling multiple visual encoders to improve visual representation.
We show that a straightforward reformulation of the original contrastive pre-training task results in an improved visual encoder that can be incorporated into the instructional multi-modal architecture without additional instructional training.
arXiv Detail & Related papers (2025-01-06T00:39:31Z)
- CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information [61.1904164368732]
We propose CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals.
Specifically, CognitionCapturer trains Modality Experts for each modality to extract cross-modal information from the EEG modality.
The framework does not require any fine-tuning of the generative models and can be extended to incorporate more modalities.
arXiv Detail & Related papers (2024-12-13T16:27:54Z)
- Mind's Eye: Image Recognition by EEG via Multimodal Similarity-Keeping Contrastive Learning [2.087148326341881]
This paper introduces a MUltimodal Similarity-keeping contrastivE learning framework for zero-shot EEG-based image classification.
We develop a series of multivariate time-series encoders tailored for EEG signals and assess the efficacy of regularized contrastive EEG-Image pretraining.
Our method achieves state-of-the-art performance, with a top-1 accuracy of 19.3% and a top-5 accuracy of 48.8% in 200-way zero-shot image classification.
arXiv Detail & Related papers (2024-06-05T16:42:23Z)
- Reconstructing Visual Stimulus Images from EEG Signals Based on Deep Visual Representation Model [5.483279087074447]
We propose a novel image reconstruction method based on EEG signals in this paper.
To ensure that visual stimulus images remain highly recognizable when presented in rapid succession, we build a visual stimulus image dataset.
A deep visual representation model (DVRM), consisting of a primary encoder and a subordinate decoder, is proposed to reconstruct visual stimuli.
arXiv Detail & Related papers (2024-03-11T09:19:09Z)
- Learning Robust Deep Visual Representations from EEG Brain Recordings [13.768240137063428]
This study proposes a two-stage method where the first step is to obtain EEG-derived features for robust learning of deep representations.
We demonstrate the generalizability of our feature extraction pipeline across three different datasets using deep-learning architectures.
We propose a novel framework to transform unseen images into the EEG space and reconstruct them with approximation.
arXiv Detail & Related papers (2023-10-25T10:26:07Z)
- A Knowledge-Driven Cross-view Contrastive Learning for EEG Representation [48.85731427874065]
This paper proposes a knowledge-driven cross-view contrastive learning framework (KDC2) to extract effective representations from EEG with limited labels.
The KDC2 method creates scalp and neural views of EEG signals, simulating the internal and external representation of brain activity.
By modeling prior neural knowledge based on neural information consistency theory, the proposed method extracts invariant and complementary neural knowledge to generate combined representations.
arXiv Detail & Related papers (2023-09-21T08:53:51Z)
- Seeing through the Brain: Image Reconstruction of Visual Perception from Human Brain Signals [27.92796103924193]
We propose a comprehensive pipeline, named NeuroImagen, for reconstructing visual stimuli images from EEG signals.
We incorporate a novel multi-level perceptual information decoding scheme to draw multi-grained outputs from the given EEG data.
arXiv Detail & Related papers (2023-07-27T12:54:16Z)
- Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z)
- BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding [51.911473457195555]
BrainCLIP is a task-agnostic fMRI-based brain decoding model.
It bridges the modality gap between brain activity, image, and text.
BrainCLIP can reconstruct visual stimuli with high semantic fidelity.
arXiv Detail & Related papers (2023-02-25T03:28:54Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online multi-modal graph network (MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
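Several entries above report retrieval-style metrics; Mind's Eye, for instance, gives top-1 and top-5 accuracy for 200-way zero-shot image classification. Below is a minimal sketch of how such a top-k accuracy is commonly computed from an EEG-to-class similarity matrix. The function name and data layout (one row per EEG trial, one score per candidate class) are assumptions for illustration, not taken from that paper's code.

```python
def top_k_accuracy(similarity, labels, k):
    """Fraction of rows whose correct class is among the k highest-scoring candidates.

    similarity: one row per EEG trial, one score per candidate class.
    labels: index of the correct class for each row.
    """
    if len(similarity) != len(labels):
        raise ValueError("need exactly one label per similarity row")
    hits = 0
    for row, label in zip(similarity, labels):
        # Rank candidate indices by score, highest first (sorted() is stable, so ties keep index order).
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy 4-way example with 3 trials.
sims = [
    [0.9, 0.1, 0.3, 0.2],  # correct class 0 is ranked 1st
    [0.5, 0.4, 0.6, 0.1],  # correct class 1 is ranked 3rd
    [0.2, 0.8, 0.7, 0.1],  # correct class 2 is ranked 2nd
]
labels = [0, 1, 2]
top1 = top_k_accuracy(sims, labels, k=1)  # 1/3
top2 = top_k_accuracy(sims, labels, k=2)  # 2/3
top3 = top_k_accuracy(sims, labels, k=3)  # 1.0
```

For context, chance level in a 200-way task is 1/200 = 0.5% for top-1 and 5/200 = 2.5% for top-5.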
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.