EgoBrain: Synergizing Minds and Eyes For Human Action Understanding
- URL: http://arxiv.org/abs/2506.01353v1
- Date: Mon, 02 Jun 2025 06:14:02 GMT
- Title: EgoBrain: Synergizing Minds and Eyes For Human Action Understanding
- Authors: Nie Lin, Yansen Wang, Dongqi Han, Weibang Jiang, Jingyuan Li, Ryosuke Furuta, Yoichi Sato, Dongsheng Li,
- Abstract summary: EgoBrain is the world's first large-scale, temporally aligned multimodal dataset that synchronizes egocentric vision and EEG of the human brain over extended periods of time. This dataset comprises 61 hours of synchronized 32-channel EEG recordings and first-person video from 40 participants engaged in 29 categories of daily activities. All data, tools, and acquisition protocols are openly shared to foster open science in cognitive computing.
- Score: 31.917322192746678
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of brain-computer interfaces (BCIs), in particular electroencephalography (EEG), with artificial intelligence (AI) has shown tremendous promise in decoding human cognition and behavior from neural signals. Moreover, the rise of multimodal AI models has brought new possibilities never imagined before. Here, we present EgoBrain, the world's first large-scale, temporally aligned multimodal dataset that synchronizes egocentric vision and EEG of the human brain over extended periods of time, establishing a new paradigm for human-centered behavior analysis. This dataset comprises 61 hours of synchronized 32-channel EEG recordings and first-person video from 40 participants engaged in 29 categories of daily activities. We then developed a multimodal learning framework to fuse EEG and vision for action understanding, validated across both cross-subject and cross-environment challenges, achieving an action recognition accuracy of 66.70%. EgoBrain paves the way for a unified framework for brain-computer interfaces with multiple modalities. All data, tools, and acquisition protocols are openly shared to foster open science in cognitive computing.
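The abstract does not specify the fusion architecture, so the following is a minimal late-fusion sketch of how 32-channel EEG and egocentric vision could be combined for 29-way action recognition. The encoder layout, hidden sizes, and the stand-in visual feature vector are illustrative assumptions, not the authors' EgoBrain model.

```python
import torch
import torch.nn as nn

class EEGEncoder(nn.Module):
    """Small 1D-conv encoder over a multichannel EEG window."""
    def __init__(self, channels=32, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.Conv1d(64, hidden, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # pool over time
        )

    def forward(self, x):               # x: (batch, 32, time)
        return self.net(x).squeeze(-1)  # (batch, hidden)

class FusionClassifier(nn.Module):
    """Late fusion: EEG features + precomputed video features -> 29 classes."""
    def __init__(self, eeg_dim=128, vis_dim=512, n_classes=29):
        super().__init__()
        self.eeg = EEGEncoder(hidden=eeg_dim)
        self.head = nn.Sequential(
            nn.Linear(eeg_dim + vis_dim, 256), nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, eeg, vis_feat):
        # vis_feat stands in for pooled features from a pretrained video backbone.
        fused = torch.cat([self.eeg(eeg), vis_feat], dim=-1)
        return self.head(fused)

model = FusionClassifier()
logits = model(torch.randn(4, 32, 256), torch.randn(4, 512))  # (4, 29)
```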
Related papers
- BrainFLORA: Uncovering Brain Concept Representation via Multimodal Neural Embeddings [10.966252877363512]
We introduce BrainFLORA, a unified framework for integrating cross-modal neuroimaging data to construct a shared neural representation. Our approach leverages multimodal large language models (MLLMs) augmented with modality-specific adapters and task decoders, achieving state-of-the-art performance in joint-subject visual retrieval. BrainFLORA offers novel implications for cognitive neuroscience and brain-computer interfaces (BCIs).
arXiv Detail & Related papers (2025-07-13T18:56:17Z)
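As a rough illustration of the modality-specific adapter idea in the BrainFLORA summary above (not its actual code), each neuroimaging modality can be given a small bottleneck module that projects its features into one shared embedding space consumed by a frozen backbone; the dimensions and modality names below are placeholders.

```python
import torch
import torch.nn as nn

class ModalityAdapter(nn.Module):
    """Lightweight bottleneck adapter: modality features -> shared space."""
    def __init__(self, in_dim, shared_dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(in_dim, bottleneck)
        self.up = nn.Linear(bottleneck, shared_dim)

    def forward(self, x):
        return self.up(torch.relu(self.down(x)))

# One adapter per modality, all mapping into the same shared space.
adapters = nn.ModuleDict({
    "eeg":  ModalityAdapter(in_dim=310),
    "meg":  ModalityAdapter(in_dim=272),
    "fmri": ModalityAdapter(in_dim=1024),
})
shared = adapters["eeg"](torch.randn(8, 310))  # (8, 768)
```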
- CSBrain: A Cross-scale Spatiotemporal Brain Foundation Model for EEG Decoding [57.90382885533593]
We propose a Cross-scale Spatiotemporal Brain foundation model for generalized decoding of EEG signals. We show that CSBrain consistently outperforms task-specific and foundation-model baselines. These results establish cross-scale modeling as a key inductive bias and position CSBrain as a robust backbone for future brain-AI research.
arXiv Detail & Related papers (2025-06-29T03:29:34Z)
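The "cross-scale" inductive bias in the CSBrain summary can be pictured as tokenizing the same EEG window at several temporal granularities, so fast transients and slow rhythms each get their own tokens. The toy function below does exactly that; the patch sizes are arbitrary choices, and the real CSBrain tokenizer is certainly more involved.

```python
import numpy as np

def multiscale_patches(eeg, patch_sizes=(16, 64, 256)):
    """eeg: (channels, time) -> dict of (n_patches, channels, patch) arrays."""
    c, t = eeg.shape
    out = {}
    for p in patch_sizes:
        n = t // p                     # drop the ragged tail
        trimmed = eeg[:, : n * p]
        out[p] = trimmed.reshape(c, n, p).transpose(1, 0, 2)
    return out

tokens = multiscale_patches(np.random.randn(32, 1024))
print({p: v.shape for p, v in tokens.items()})
# {16: (64, 32, 16), 64: (16, 32, 64), 256: (4, 32, 256)}
```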
- Voxel-Level Brain States Prediction Using Swin Transformer [65.9194533414066]
We propose a novel architecture which employs a 4D Shifted Window (Swin) Transformer as encoder to efficiently learn spatiotemporal information, and a convolutional decoder to enable brain state prediction at the same spatial and temporal resolution as the input fMRI data. Our model has shown high accuracy when predicting 7.2s of resting-state brain activity based on the prior 23.04s of the fMRI time series. This is promising evidence that the spatiotemporal organization of the human brain can be learned by a Swin Transformer model at high resolution, which offers potential for reducing fMRI scan time and for the development of brain-computer interfaces.
arXiv Detail & Related papers (2025-06-13T04:14:38Z)
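The quoted durations (23.04 s of input, 7.2 s predicted) divide evenly by a 0.72 s repetition time; treating that TR as an assumption, the forecasting task amounts to mapping 32 fMRI frames to the next 10. A windowing sketch:

```python
import numpy as np

TR = 0.72               # assumed repetition time in seconds
N_IN = int(23.04 / TR)  # 32 input frames
N_OUT = int(7.2 / TR)   # 10 frames to predict

def sliding_windows(series, n_in=N_IN, n_out=N_OUT, stride=1):
    """series: (time, voxels) -> (inputs, targets) window pairs."""
    xs, ys = [], []
    for s in range(0, series.shape[0] - n_in - n_out + 1, stride):
        xs.append(series[s : s + n_in])
        ys.append(series[s + n_in : s + n_in + n_out])
    return np.stack(xs), np.stack(ys)

x, y = sliding_windows(np.random.randn(200, 5000))
print(x.shape, y.shape)  # (159, 32, 5000) (159, 10, 5000)
```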
- Towards Unified Neural Decoding with Brain Functional Network Modeling [34.13766828046489]
We present the Multi-individual Brain Region-Aggregated Network (MIBRAIN), a neural decoding framework. MIBRAIN constructs a whole-brain functional network model by integrating intracranial neurophysiological recordings across multiple individuals. Our framework paves the way for robust neural decoding across individuals and offers insights for practical clinical applications.
arXiv Detail & Related papers (2025-05-30T12:10:37Z)
- Neural Brain: A Neuroscience-inspired Framework for Embodied Agents [58.58177409853298]
Current AI systems, such as large language models, remain disembodied, unable to physically engage with the world. At the core of this challenge lies the concept of Neural Brain, a central intelligence system designed to drive embodied agents with human-like adaptability. This paper introduces a unified framework for the Neural Brain of embodied agents, addressing two fundamental challenges.
arXiv Detail & Related papers (2025-05-12T15:05:34Z)
- Shifting Attention to You: Personalized Brain-Inspired AI Models [3.0128071072792366]
We show that integrating human behavioral insights and millisecond-scale neural data within a fine-tuned CLIP-based model more than doubles behavioral performance compared to the unmodified CLIP baseline. Our work establishes a novel, interpretable framework for designing adaptive AI systems, with broad implications for neuroscience, personalized medicine, and human-computer interaction.
arXiv Detail & Related papers (2025-02-07T04:55:31Z)
- Knowledge-Guided Prompt Learning for Lifespan Brain MR Image Segmentation [53.70131202548981]
We present a two-step segmentation framework employing Knowledge-Guided Prompt Learning (KGPL) for brain MRI.
Specifically, we first pre-train segmentation models on large-scale datasets with sub-optimal labels.
The introduction of knowledge-wise prompts captures semantic relationships between anatomical variability and biological processes.
arXiv Detail & Related papers (2024-07-31T04:32:43Z)
- Orangutan: A Multiscale Brain Emulation-Based Artificial Intelligence Framework for Dynamic Environments [2.8137865669570297]
This paper introduces a novel brain-inspired AI framework, Orangutan.
It simulates the structure and computational mechanisms of biological brains on multiple scales.
I have developed a sensorimotor model that simulates human saccadic eye movements during object observation.
arXiv Detail & Related papers (2024-06-18T01:41:57Z)
- Achieving More Human Brain-Like Vision via Human EEG Representational Alignment [1.811217832697894]
We present 'Re(presentational)Al(ignment)net', a vision model aligned with human brain activity based on non-invasive EEG.
Our innovative image-to-brain multi-layer encoding framework advances human neural alignment by optimizing multiple model layers.
Our findings suggest that ReAlnet represents a breakthrough in bridging the gap between artificial and human vision, paving the way for more brain-like artificial intelligence systems.
arXiv Detail & Related papers (2024-01-30T18:18:41Z)
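The "image-to-brain multi-layer encoding" idea in the ReAlnet summary can be sketched as attaching a linear readout to several layers of a vision model and penalizing each readout's distance to the recorded EEG response. The layer dimensions, loss, and equal weighting below are assumptions rather than ReAlnet's specification.

```python
import torch
import torch.nn as nn

class MultiLayerEncoder(nn.Module):
    """One linear readout per vision-model layer, each predicting the EEG."""
    def __init__(self, layer_dims, eeg_dim):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(d, eeg_dim) for d in layer_dims])

    def forward(self, feats):  # feats: list of (batch, dim) layer activations
        return [h(f) for h, f in zip(self.heads, feats)]

def alignment_loss(preds, eeg):
    # Sum of MSEs between each layer's predicted response and the measured EEG.
    return sum(nn.functional.mse_loss(p, eeg) for p in preds)

enc = MultiLayerEncoder(layer_dims=[256, 512, 1024], eeg_dim=96)
feats = [torch.randn(8, d) for d in (256, 512, 1024)]
loss = alignment_loss(enc(feats), torch.randn(8, 96))
```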
- Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models [56.257840490146]
ConCue is a novel approach for improving visual feature extraction in HOI detection.
We develop a transformer-based feature extraction module with a multi-tower architecture that integrates contextual cues into both instance and interaction detectors.
arXiv Detail & Related papers (2023-11-26T09:11:32Z)
- UniBrain: Universal Brain MRI Diagnosis with Hierarchical Knowledge-enhanced Pre-training [66.16134293168535]
We propose a hierarchical knowledge-enhanced pre-training framework for universal brain MRI diagnosis, termed UniBrain.
Specifically, UniBrain leverages a large-scale dataset of 24,770 imaging-report pairs from routine diagnostics.
arXiv Detail & Related papers (2023-09-13T09:22:49Z)
- In the realm of hybrid Brain: Human Brain and AI [0.0]
Current brain-computer interface (BCI) technology is focused mainly on therapeutic outcomes.
Recently, artificial intelligence (AI) and machine learning (ML) technologies have been used to decode brain signals.
We envision the development of closed loop, intelligent, low-power, and miniaturized neural interfaces.
arXiv Detail & Related papers (2022-10-04T08:35:34Z)
- Multimodal foundation models are better simulators of the human brain [65.10501322822881]
We present a newly-designed multimodal foundation model pre-trained on 15 million image-text pairs.
We find that both visual and linguistic encoders trained multimodally are more brain-like than unimodal ones.
arXiv Detail & Related papers (2022-08-17T12:36:26Z)
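One common way such brain-likeness claims are quantified is representational similarity analysis (RSA), which correlates the pairwise stimulus-similarity structure of model embeddings with that of recorded brain responses. The sketch below shows the generic recipe, not necessarily this paper's exact protocol.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_score(model_emb, brain_resp):
    """model_emb: (stimuli, dim); brain_resp: (stimuli, voxels/channels)."""
    # Condensed representational dissimilarity matrices over stimulus pairs.
    rdm_model = pdist(model_emb, metric="correlation")
    rdm_brain = pdist(brain_resp, metric="correlation")
    # Rank correlation between the two dissimilarity structures.
    return spearmanr(rdm_model, rdm_brain)[0]

rng = np.random.default_rng(0)
print(rsa_score(rng.normal(size=(50, 512)), rng.normal(size=(50, 300))))
```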
- An Investigation on Non-Invasive Brain-Computer Interfaces: Emotiv Epoc+ Neuroheadset and Its Effectiveness [0.7734726150561089]
We explore a natural speech decoding approach, introduced by Facebook Reality Lab and the University of California San Francisco, that is designed to decode human speech directly from the human brain onto a digital screen. Then, we study a recently presented visionary project to control the human brain using a Brain-Machine Interface (BMI) approach.
We envision that non-invasive, insertable, and low-cost BCI approaches will be the focal point, not only as an alternative for patients with physical paralysis but also as a means of understanding the brain.
arXiv Detail & Related papers (2022-06-24T05:45:48Z)