Mind-to-Face: Neural-Driven Photorealistic Avatar Synthesis via EEG Decoding
- URL: http://arxiv.org/abs/2512.04313v1
- Date: Wed, 03 Dec 2025 23:02:27 GMT
- Title: Mind-to-Face: Neural-Driven Photorealistic Avatar Synthesis via EEG Decoding
- Authors: Haolin Xiong, Tianwen Fu, Pratusha Bhuvana Prasad, Yunxuan Cai, Haiwei Chen, Wenbin Teng, Hanyuan Xiao, Yajie Zhao
- Abstract summary: We present Mind-to-Face, the first framework that decodes non-invasive electroencephalogram (EEG) signals directly into high-fidelity facial expressions. We show that EEG alone can reliably predict dynamic, subject-specific facial expressions, including subtle emotional responses. Mind-to-Face establishes a new paradigm for neural-driven avatars, enabling personalized, emotion-aware telepresence and cognitive interaction in immersive environments.
- Score: 11.030344145348097
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current expressive avatar systems rely heavily on visual cues, failing when faces are occluded or when emotions remain internal. We present Mind-to-Face, the first framework that decodes non-invasive electroencephalogram (EEG) signals directly into high-fidelity facial expressions. We build a dual-modality recording setup to obtain synchronized EEG and multi-view facial video during emotion-eliciting stimuli, enabling precise supervision for neural-to-visual learning. Our model uses a CNN-Transformer encoder to map EEG signals into dense 3D position maps, capable of sampling over 65k vertices, capturing fine-scale geometry and subtle emotional dynamics, and renders them through a modified 3D Gaussian Splatting pipeline for photorealistic, view-consistent results. Through extensive evaluation, we show that EEG alone can reliably predict dynamic, subject-specific facial expressions, including subtle emotional responses, demonstrating that neural signals contain far richer affective and geometric information than previously assumed. Mind-to-Face establishes a new paradigm for neural-driven avatars, enabling personalized, emotion-aware telepresence and cognitive interaction in immersive environments.
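The abstract's encoder pipeline (a CNN-Transformer mapping EEG windows to dense 3D position maps) can be sketched in PyTorch. This is an illustrative toy model, not the authors' implementation: the class name, all layer sizes, and the 32x32 map resolution are assumptions (the paper samples over 65k vertices from its position maps).

```python
import torch
import torch.nn as nn

class EEGToPositionMap(nn.Module):
    """Hypothetical CNN-Transformer encoder mapping a multi-channel EEG
    window to a dense 3D position map (map_size x map_size x 3)."""
    def __init__(self, n_channels=64, n_timesteps=256, d_model=128, map_size=32):
        super().__init__()
        self.map_size = map_size
        # Temporal CNN: extract local features from the raw EEG window.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, d_model, kernel_size=7, stride=2, padding=3),
            nn.GELU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, stride=2, padding=2),
            nn.GELU(),
        )
        # Transformer encoder over the downsampled time axis.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        # Project pooled features to one 3D coordinate per map texel.
        self.head = nn.Linear(d_model, map_size * map_size * 3)

    def forward(self, eeg):                 # eeg: (batch, channels, time)
        feats = self.cnn(eeg)               # (batch, d_model, time')
        tokens = self.transformer(feats.transpose(1, 2))
        pooled = tokens.mean(dim=1)         # temporal average pooling
        return self.head(pooled).view(-1, self.map_size, self.map_size, 3)

model = EEGToPositionMap()
out = model(torch.randn(2, 64, 256))        # out: (batch, map, map, 3)
```

In the actual system, such a position map would then drive a 3D Gaussian Splatting renderer for photorealistic output; that stage is omitted here.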
Related papers
- A Brain Wave Encodes a Thousand Tokens: Modeling Inter-Cortical Neural Interactions for Effective EEG-based Emotion Recognition [0.41998444721319217]
We propose RBTransformer, a Transformer-based neural network architecture that models inter-cortical neural dynamics of the brain in latent space. We conducted extensive experiments, specifically under subject-dependent settings, on the SEED, DEAP, and DREAMER datasets. The results demonstrate that the proposed RBTransformer outperforms all previous state-of-the-art methods across all three datasets.
arXiv Detail & Related papers (2025-11-17T22:27:12Z)
- Think-Before-Draw: Decomposing Emotion Semantics & Fine-Grained Controllable Expressive Talking Head Generation [7.362433184546492]
Emotional talking-head generation has emerged as a pivotal research area at the intersection of computer vision and multimodal artificial intelligence. This study proposes the Think-Before-Draw framework to address two key challenges.
arXiv Detail & Related papers (2025-07-17T03:33:46Z)
- Interpretable EEG-to-Image Generation with Semantic Prompts [6.712646807032639]
Our model bypasses direct EEG-to-image generation by aligning EEG signals with semantic captions. A transformer-based EEG encoder maps brain activity to these captions through contrastive learning. This text-mediated framework yields state-of-the-art visual decoding on the EEGCVPR dataset.
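The contrastive alignment between EEG embeddings and caption embeddings described above can be sketched with a symmetric InfoNCE-style loss. This is a generic sketch, not the paper's exact objective; the function name, embedding dimension, and temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_align(eeg_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched EEG/caption pairs sit on the
    diagonal of the similarity matrix and are pulled together, while
    mismatched pairs are pushed apart."""
    eeg = F.normalize(eeg_emb, dim=-1)
    txt = F.normalize(text_emb, dim=-1)
    logits = eeg @ txt.t() / temperature        # (batch, batch) similarities
    targets = torch.arange(eeg.size(0))         # matched pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

loss = contrastive_align(torch.randn(8, 128), torch.randn(8, 128))
```

A trained EEG encoder would feed `eeg_emb`, and a frozen text encoder would produce `text_emb` from the semantic captions.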
arXiv Detail & Related papers (2025-07-09T17:18:06Z)
- VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis [70.76837748695841]
We propose VisualSpeaker, a novel method that bridges the gap using photorealistic differentiable rendering, supervised by visual speech recognition, for improved 3D facial animation. Our contribution is a perceptual lip-reading loss, derived by passing 3D Gaussian Splatting avatar renders through a pre-trained Visual Automatic Speech Recognition model during training. Evaluation on the MEAD dataset demonstrates that VisualSpeaker improves the standard Lip Vertex Error metric by 56.1% as well as the perceptual quality of the generated animations, while retaining the controllability of mesh-driven animation.
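A perceptual loss of this kind can be sketched by passing rendered and ground-truth frames through a frozen feature extractor and comparing the features. `LipFeatureNet` below is a tiny stand-in for the pre-trained Visual ASR model; its architecture, the feature-space MSE, and all sizes are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class LipFeatureNet(nn.Module):
    """Stand-in for a pre-trained lip-reading feature extractor."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
    def forward(self, x):
        return self.net(x)

def perceptual_lip_loss(recognizer, rendered, target):
    """Compare rendered avatar frames to ground truth in the frozen
    recognizer's feature space; gradients flow only through `rendered`."""
    recognizer.eval()
    with torch.no_grad():
        ref = recognizer(target)             # ground-truth features (frozen)
    return nn.functional.mse_loss(recognizer(rendered), ref)

vasr = LipFeatureNet()
loss = perceptual_lip_loss(vasr, torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```

In the actual pipeline the `rendered` frames would come from a differentiable 3D Gaussian Splatting renderer, so this loss backpropagates into the avatar parameters.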
arXiv Detail & Related papers (2025-07-08T15:04:17Z)
- Neuro-3D: Towards 3D Visual Decoding from EEG Signals [49.502364730056044]
We introduce a new neuroscience task: decoding 3D visual perception from EEG signals. We first present EEG-3D, a dataset featuring multimodal analysis data and EEG recordings from 12 subjects viewing 72 categories of 3D objects rendered in both videos and images. We propose Neuro-3D, a 3D visual decoding framework based on EEG signals.
arXiv Detail & Related papers (2024-11-19T05:52:17Z)
- EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head [30.138347111341748]
We present a novel approach for synthesizing 3D talking heads with controllable emotion.
Our model enables controllable emotion in the generated talking heads and can be rendered in wide-range views.
Experiments demonstrate the effectiveness of our approach in generating high-fidelity and emotion-controllable 3D talking heads.
arXiv Detail & Related papers (2024-08-01T05:46:57Z)
- Controllable Radiance Fields for Dynamic Face Synthesis [125.48602100893845]
We study how to explicitly control generative model synthesis of face dynamics exhibiting non-rigid motion, and introduce the Controllable Radiance Field (CoRF) for this purpose.
On head image/video data we show that CoRFs are 3D-aware while enabling editing of identity, viewing directions, and motion.
arXiv Detail & Related papers (2022-10-11T23:17:31Z)
- Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control [54.079327030892244]
Free-HeadGAN is a person-generic neural talking head synthesis system.
We show that modeling faces with sparse 3D facial landmarks is sufficient for achieving state-of-the-art generative performance.
arXiv Detail & Related papers (2022-08-03T16:46:08Z)
- EMOCA: Emotion Driven Monocular Face Capture and Animation [59.15004328155593]
We introduce a novel deep perceptual emotion consistency loss during training, which helps ensure that the reconstructed 3D expression matches the expression depicted in the input image.
On the task of in-the-wild emotion recognition, our purely geometric approach is on par with the best image-based methods, highlighting the value of 3D geometry in analyzing human behavior.
arXiv Detail & Related papers (2022-04-24T15:58:35Z)
- Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks [82.54695985117783]
We investigate the suitability of state-of-the-art deep learning architectures for continuous emotion recognition using long video sequences captured in-the-wild.
We have developed and evaluated convolutional recurrent neural networks combining 2D-CNNs and long short-term memory units, and inflated 3D-CNN models, which are built by inflating the weights of a pre-trained 2D-CNN model during fine-tuning.
arXiv Detail & Related papers (2020-11-18T13:42:05Z)
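The convolutional-recurrent design described in that summary (a 2D-CNN extracting per-frame features, followed by an LSTM over time) can be sketched as follows. This is an illustrative toy model, not the paper's architecture; the layer sizes and the two-dimensional output head (e.g. valence and arousal) are assumptions.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Illustrative 2D-CNN + LSTM model for continuous emotion
    regression over a video clip."""
    def __init__(self, hidden=64, n_outputs=2):
        super().__init__()
        self.cnn = nn.Sequential(                 # per-frame spatial features
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(32, hidden, batch_first=True)  # temporal model
        self.head = nn.Linear(hidden, n_outputs)  # e.g. valence & arousal

    def forward(self, video):                     # (batch, time, 3, H, W)
        b, t = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1)).view(b, t, -1)
        seq, _ = self.lstm(feats)
        return self.head(seq)                     # per-frame predictions

model = CRNN()
preds = model(torch.rand(2, 8, 3, 64, 64))        # preds: (batch, time, 2)
```

The inflated 3D-CNN variant mentioned in the summary would instead replace the 2D convolutions with 3D ones whose weights are copied along the time axis from a pre-trained 2D model; that variant is not shown here.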
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.