Spontaneous Spatial Cognition Emerges during Egocentric Video Viewing through Non-invasive BCI
- URL: http://arxiv.org/abs/2507.12417v1
- Date: Wed, 16 Jul 2025 17:07:57 GMT
- Title: Spontaneous Spatial Cognition Emerges during Egocentric Video Viewing through Non-invasive BCI
- Authors: Weichen Dai, Yuxuan Huang, Li Zhu, Dongjun Liu, Yu Zhang, Qibin Zhao, Andrzej Cichocki, Fabio Babiloni, Ke Li, Jianyu Qiu, Gangyong Jia, Wanzeng Kong, Qing Wu
- Abstract summary: We show for the first time that non-invasive brain-computer interfaces can decode spontaneous, fine-grained egocentric 6D pose. Despite EEG's limited spatial resolution and high signal noise, we find that spatially coherent visual input reliably evokes decodable spatial representations.
- Score: 42.53877172400408
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans possess a remarkable capacity for spatial cognition, allowing for self-localization even in novel or unfamiliar environments. While hippocampal neurons encoding position and orientation are well documented, the large-scale neural dynamics supporting spatial representation, particularly during naturalistic, passive experience, remain poorly understood. Here, we demonstrate for the first time that non-invasive brain-computer interfaces (BCIs) based on electroencephalography (EEG) can decode spontaneous, fine-grained egocentric 6D pose, comprising three-dimensional position and orientation, during passive viewing of egocentric video. Despite EEG's limited spatial resolution and high signal noise, we find that spatially coherent visual input (i.e., continuous and structured motion) reliably evokes decodable spatial representations, aligning with participants' subjective sense of spatial engagement. Decoding performance further improves when visual input is presented at a frame rate of 100 ms per image, suggesting alignment with intrinsic neural temporal dynamics. Using gradient-based backpropagation through a neural decoding model, we identify distinct EEG channels contributing to position -- and orientation specific -- components, revealing a distributed yet complementary neural encoding scheme. These findings indicate that the brain's spatial systems operate spontaneously and continuously, even under passive conditions, challenging traditional distinctions between active and passive spatial cognition. Our results offer a non-invasive window into the automatic construction of egocentric spatial maps and advance our understanding of how the human mind transforms everyday sensory experience into structured internal representations.
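The abstract's channel-attribution step (gradient-based backpropagation through the decoder to find position- and orientation-specific EEG channels) can be illustrated with a minimal sketch. The decoder here is a stand-in random linear map, not the paper's model; all dimensions (32 channels, 100 samples) are hypothetical, and a real deep decoder would obtain the same per-channel gradients via autograd.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 32 EEG channels x 100 time samples, 6D pose output
# (x, y, z, roll, pitch, yaw).
n_channels, n_samples, pose_dim = 32, 100, 6

# Stand-in linear decoder y = W @ x; illustrates the attribution mechanics only.
W = rng.standard_normal((pose_dim, n_channels * n_samples))

def channel_saliency(W):
    # For y = W @ x, the gradient dy_k/dx equals W[k], so the saliency of a
    # channel for pose component k is the mean |gradient| over that channel's
    # time samples.
    grads = np.abs(W).reshape(pose_dim, n_channels, n_samples)
    return grads.mean(axis=2)  # shape (pose_dim, n_channels)

sal = channel_saliency(W)

# Rank channels separately for position (dims 0-2) and orientation (dims 3-5),
# mirroring the paper's position- vs orientation-specific channel analysis.
pos_rank = np.argsort(-sal[:3].mean(axis=0))
ori_rank = np.argsort(-sal[3:].mean(axis=0))
```

With a trained nonlinear decoder the closed-form gradient above would be replaced by backpropagation from each pose output to the input epoch, averaged over trials.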
Related papers
- Deep Neural Encoder-Decoder Model to Relate fMRI Brain Activity with Naturalistic Stimuli [2.7149743794003913]
We propose an end-to-end deep neural encoder-decoder model to encode and decode brain activity in response to naturalistic stimuli. We employ temporal convolutional layers in our architecture, which effectively bridge the temporal resolution gap between natural movie stimuli and fMRI.
arXiv Detail & Related papers (2025-07-16T08:08:48Z) - Embodied World Models Emerge from Navigational Task in Open-Ended Environments [5.785697934050656]
We ask whether a recurrent agent, trained solely by sparse rewards to solve procedurally generated planar mazes, can autonomously internalize metric concepts such as direction, distance and obstacle layout. After training, the agent consistently produces near-optimal paths in unseen mazes, behavior that hints at an underlying spatial model.
arXiv Detail & Related papers (2025-04-15T17:35:13Z) - From Eye to Mind: brain2text Decoding Reveals the Neural Mechanisms of Visual Semantic Processing [0.3069335774032178]
We introduce a paradigm shift by directly decoding fMRI signals into textual descriptions of viewed natural images. Our novel deep learning model, trained without visual input, achieves state-of-the-art semantic decoding performance. Neuroanatomical analysis reveals the critical role of higher-level visual regions, including MT+, ventral stream visual cortex, and inferior parietal cortex.
arXiv Detail & Related papers (2025-03-15T07:28:02Z) - Spherical World-Locking for Audio-Visual Localization in Egocentric Videos [53.658928180166534]
We propose Spherical World-Locking as a general framework for egocentric scene representation.
Compared to conventional head-locked egocentric representations with a 2D planar field-of-view, SWL effectively offsets challenges posed by self-motion.
We design a unified encoder-decoder transformer architecture that preserves the spherical structure of the scene representation.
arXiv Detail & Related papers (2024-08-09T22:29:04Z) - Discretization of continuous input spaces in the hippocampal autoencoder [0.0]
We show that forming discrete memories of visual events in sparse autoencoder neurons can produce spatial tuning similar to hippocampal place cells.
We extend our results to the auditory domain, showing that neurons similarly tile the frequency space in an experience-dependent manner.
Lastly, we show that reinforcement learning agents can effectively perform various visuo-spatial cognitive tasks using these sparse, very high-dimensional representations.
arXiv Detail & Related papers (2024-05-23T14:16:44Z) - Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture compatible and scalable with deep learning frameworks.
We show end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
arXiv Detail & Related papers (2024-04-22T09:40:07Z) - Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z) - Egocentric Audio-Visual Object Localization [51.434212424829525]
We propose a geometry-aware temporal aggregation module to handle the egomotion explicitly.
The effect of egomotion is mitigated by estimating the temporal geometry transformation and exploiting it to update visual representations.
It improves cross-modal localization robustness by disentangling visually-indicated audio representation.
arXiv Detail & Related papers (2023-03-23T17:43:11Z) - Learning What and Where -- Unsupervised Disentangling Location and Identity Tracking [0.44040106718326594]
We introduce an unsupervised LOCation and Identity tracking system (Loci).
Inspired by the dorsal-ventral pathways in the brain, Loci tackles the what-and-where binding problem by means of a self-supervised segregation mechanism.
Loci may set the stage for deeper, explanation-oriented video processing.
arXiv Detail & Related papers (2022-05-26T13:30:14Z) - Deep Representations for Time-varying Brain Datasets [4.129225533930966]
This paper builds an efficient graph neural network model that incorporates both region-mapped fMRI sequences and structural connectivities as inputs.
We find good representations of the latent brain dynamics through learning sample-level adaptive adjacency matrices.
These modules can be easily adapted to, and are potentially useful for, other applications outside the neuroscience domain.
arXiv Detail & Related papers (2022-05-23T21:57:31Z) - Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks [82.54695985117783]
We investigate the suitability of state-of-the-art deep learning architectures for continuous emotion recognition using long video sequences captured in-the-wild.
We have developed and evaluated convolutional recurrent neural networks combining 2D-CNNs and long short-term memory units, as well as inflated 3D-CNN models, which are built by inflating the weights of a pre-trained 2D-CNN model during fine-tuning.
arXiv Detail & Related papers (2020-11-18T13:42:05Z)
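The weight-inflation step mentioned in the last entry (building a 3D-CNN from a pre-trained 2D-CNN) can be sketched as follows. This is a generic illustration of kernel inflation, not the authors' code; the kernel shapes are hypothetical. The idea is to tile the 2D kernel along a new temporal axis and rescale by the depth, so a video of identical frames initially produces the same activations as the 2D network.

```python
import numpy as np

def inflate_2d_kernel(k2d, temporal_depth):
    """Inflate a 2D conv kernel of shape (out, in, h, w) into a 3D kernel of
    shape (out, in, t, h, w) by tiling along time and rescaling by 1/t."""
    k3d = np.repeat(k2d[:, :, None, :, :], temporal_depth, axis=2)
    return k3d / temporal_depth

rng = np.random.default_rng(1)
k2d = rng.standard_normal((8, 3, 3, 3))  # hypothetical pre-trained 2D weights
k3d = inflate_2d_kernel(k2d, temporal_depth=5)

# Sanity check: summing the inflated kernel over the temporal axis recovers
# the original 2D kernel, which is what preserves activations on static input.
assert np.allclose(k3d.sum(axis=2), k2d)
```

The inflated weights then serve as the initialization for fine-tuning the 3D model on video data.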
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.