Human-Aligned Image Models Improve Visual Decoding from the Brain
- URL: http://arxiv.org/abs/2502.03081v1
- Date: Wed, 05 Feb 2025 11:14:51 GMT
- Title: Human-Aligned Image Models Improve Visual Decoding from the Brain
- Authors: Nona Rajabi, Antônio H. Ribeiro, Miguel Vasco, Farzaneh Taleb, Mårten Björkman, Danica Kragic,
- Abstract summary: We introduce the use of human-aligned image encoders to map brain signals to images.
Our empirical results support this hypothesis, demonstrating that this simple modification improves image retrieval accuracy by up to 21%.
- Score: 16.184884942703466
- License:
- Abstract: Decoding visual images from brain activity has significant potential for advancing brain-computer interaction and enhancing the understanding of human perception. Recent approaches align the representation spaces of images and brain activity to enable visual decoding. In this paper, we introduce the use of human-aligned image encoders to map brain signals to images. We hypothesize that these models more effectively capture perceptual attributes associated with the rapid visual stimuli presentations commonly used in visual brain data recording experiments. Our empirical results support this hypothesis, demonstrating that this simple modification improves image retrieval accuracy by up to 21% compared to state-of-the-art methods. Comprehensive experiments confirm consistent performance improvements across diverse EEG architectures, image encoders, alignment methods, participants, and brain imaging modalities.
Related papers
- MindAligner: Explicit Brain Functional Alignment for Cross-Subject Visual Decoding from Limited fMRI Data [64.92867794764247]
MindAligner is a framework for cross-subject brain decoding from limited fMRI data.
Brain Transfer Matrix (BTM) projects the brain signals of an arbitrary new subject to one of the known subjects.
Brain Functional Alignment module is proposed to perform soft cross-subject brain alignment under different visual stimuli.
arXiv Detail & Related papers (2025-02-07T16:01:59Z) - Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models [10.615012396285337]
We develop algorithms to enhance our understanding of visual processes by incorporating whole-brain activation maps.
We first compare our method with state-of-the-art approaches to decoding visual processing and show improved predictive semantic accuracy by 43%.
arXiv Detail & Related papers (2024-11-11T16:51:17Z) - Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity [60.983327742457995]
Reconstructing the viewed images from human brain activity bridges human and computer vision through the Brain-Computer Interface.
We devise Psychometry, an omnifit model for reconstructing images from functional Magnetic Resonance Imaging (fMRI) obtained from different subjects.
arXiv Detail & Related papers (2024-03-29T07:16:34Z) - Seeing through the Brain: Image Reconstruction of Visual Perception from
Human Brain Signals [27.92796103924193]
We propose a comprehensive pipeline, named NeuroImagen, for reconstructing visual stimuli images from EEG signals.
We incorporate a novel multi-level perceptual information decoding to draw multi-grained outputs from the given EEG data.
arXiv Detail & Related papers (2023-07-27T12:54:16Z) - Improving visual image reconstruction from human brain activity using
latent diffusion models via multiple decoded inputs [2.4366811507669124]
Integration of deep learning and neuroscience has led to improvements in the analysis of brain activity.
The reconstruction of visual experience from human brain activity is an area that has particularly benefited.
We examine the extent to which various additional decoding techniques affect the performance of visual experience reconstruction.
arXiv Detail & Related papers (2023-06-20T13:48:02Z) - Brain Captioning: Decoding human brain activity into images and text [1.5486926490986461]
We present an innovative method for decoding brain activity into meaningful images and captions.
Our approach takes advantage of cutting-edge image captioning models and incorporates a unique image reconstruction pipeline.
We evaluate our methods using quantitative metrics for both generated captions and images.
arXiv Detail & Related papers (2023-05-19T09:57:19Z) - Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z) - BI AVAN: Brain inspired Adversarial Visual Attention Network [67.05560966998559]
We propose a brain-inspired adversarial visual attention network (BI-AVAN) to characterize human visual attention directly from functional brain activity.
Our model imitates the biased competition process between attention-related/neglected objects to identify and locate the visual objects in a movie frame the human brain focuses on in an unsupervised manner.
arXiv Detail & Related papers (2022-10-27T22:20:36Z) - Computational imaging with the human brain [1.614301262383079]
Brain-computer interfaces (BCIs) are enabling a range of new possibilities and routes for augmenting human capability.
We demonstrate ghost imaging of a hidden scene using the human visual system that is combined with an adaptive computational imaging scheme.
This brain-computer connectivity demonstrates a form of augmented human computation that could in the future extend the sensing range of human vision.
arXiv Detail & Related papers (2022-10-07T08:40:18Z) - Adapting Brain-Like Neural Networks for Modeling Cortical Visual
Prostheses [68.96380145211093]
Cortical prostheses are devices implanted in the visual cortex that attempt to restore lost vision by electrically stimulating neurons.
Currently, the vision provided by these devices is limited, and accurately predicting the visual percepts resulting from stimulation is an open challenge.
We propose to address this challenge by utilizing 'brain-like' convolutional neural networks (CNNs), which have emerged as promising models of the visual system.
arXiv Detail & Related papers (2022-09-27T17:33:19Z) - Multi-Modal Masked Autoencoders for Medical Vision-and-Language
Pre-Training [62.215025958347105]
We propose a self-supervised learning paradigm with multi-modal masked autoencoders.
We learn cross-modal domain knowledge by reconstructing missing pixels and tokens from randomly masked images and texts.
arXiv Detail & Related papers (2022-09-15T07:26:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.