Mind-to-Face: Neural-Driven Photorealistic Avatar Synthesis via EEG Decoding
- URL: http://arxiv.org/abs/2512.04313v1
- Date: Wed, 03 Dec 2025 23:02:27 GMT
- Title: Mind-to-Face: Neural-Driven Photorealistic Avatar Synthesis via EEG Decoding
- Authors: Haolin Xiong, Tianwen Fu, Pratusha Bhuvana Prasad, Yunxuan Cai, Haiwei Chen, Wenbin Teng, Hanyuan Xiao, Yajie Zhao
- Abstract summary: We present Mind-to-Face, the first framework that decodes non-invasive electroencephalogram (EEG) signals directly into high-fidelity facial expressions. We show that EEG alone can reliably predict dynamic, subject-specific facial expressions, including subtle emotional responses. Mind-to-Face establishes a new paradigm for neural-driven avatars, enabling personalized, emotion-aware telepresence and cognitive interaction in immersive environments.
- Score: 11.030344145348097
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current expressive avatar systems rely heavily on visual cues, failing when faces are occluded or when emotions remain internal. We present Mind-to-Face, the first framework that decodes non-invasive electroencephalogram (EEG) signals directly into high-fidelity facial expressions. We build a dual-modality recording setup to obtain synchronized EEG and multi-view facial video during emotion-eliciting stimuli, enabling precise supervision for neural-to-visual learning. Our model uses a CNN-Transformer encoder to map EEG signals into dense 3D position maps, capable of sampling over 65k vertices, capturing fine-scale geometry and subtle emotional dynamics, and renders them through a modified 3D Gaussian Splatting pipeline for photorealistic, view-consistent results. Through extensive evaluation, we show that EEG alone can reliably predict dynamic, subject-specific facial expressions, including subtle emotional responses, demonstrating that neural signals contain far richer affective and geometric information than previously assumed. Mind-to-Face establishes a new paradigm for neural-driven avatars, enabling personalized, emotion-aware telepresence and cognitive interaction in immersive environments.
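The abstract's encoder pipeline (a CNN-Transformer mapping EEG windows to dense 3D position maps) can be sketched in PyTorch. This is an illustrative toy model, not the authors' implementation: the class name, all layer sizes, and the 32x32 map resolution are assumptions (the paper samples over 65k vertices from its position maps).

```python
import torch
import torch.nn as nn

class EEGToPositionMap(nn.Module):
    """Hypothetical CNN-Transformer encoder mapping a multi-channel EEG
    window to a dense 3D position map (map_size x map_size x 3)."""
    def __init__(self, n_channels=64, n_timesteps=256, d_model=128, map_size=32):
        super().__init__()
        self.map_size = map_size
        # Temporal CNN: extract local features from the raw EEG window.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, d_model, kernel_size=7, stride=2, padding=3),
            nn.GELU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, stride=2, padding=2),
            nn.GELU(),
        )
        # Transformer encoder over the downsampled time axis.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        # Project pooled features to one 3D coordinate per map texel.
        self.head = nn.Linear(d_model, map_size * map_size * 3)

    def forward(self, eeg):                 # eeg: (batch, channels, time)
        feats = self.cnn(eeg)               # (batch, d_model, time')
        tokens = self.transformer(feats.transpose(1, 2))
        pooled = tokens.mean(dim=1)         # temporal average pooling
        return self.head(pooled).view(-1, self.map_size, self.map_size, 3)

model = EEGToPositionMap()
out = model(torch.randn(2, 64, 256))        # out: (batch, map, map, 3)
```

In the actual system, such a position map would then drive a 3D Gaussian Splatting renderer for photorealistic output; that stage is omitted here.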
Related papers
- A Brain Wave Encodes a Thousand Tokens: Modeling Inter-Cortical Neural Interactions for Effective EEG-based Emotion Recognition [0.41998444721319217]
We propose RBTransformer, a Transformer-based neural network architecture that models inter-cortical neural dynamics of the brain in latent space. We conducted extensive experiments, specifically under subject-dependent settings, on the SEED, DEAP, and DREAMER datasets. The results demonstrate that the proposed RBTransformer outperforms all previous state-of-the-art methods across all three datasets.
arXiv Detail & Related papers (2025-11-17T22:27:12Z)
- Think-Before-Draw: Decomposing Emotion Semantics & Fine-Grained Controllable Expressive Talking Head Generation [7.362433184546492]
Emotional talking-head generation has emerged as a pivotal research area at the intersection of computer vision and multimodal artificial intelligence. This study proposes the Think-Before-Draw framework to address two key challenges.
arXiv Detail & Related papers (2025-07-17T03:33:46Z)
- Interpretable EEG-to-Image Generation with Semantic Prompts [6.712646807032639]
Our model bypasses direct EEG-to-image generation by aligning EEG signals with semantic captions. A transformer-based EEG encoder maps brain activity to these captions through contrastive learning. This text-mediated framework yields state-of-the-art visual decoding on the EEGCVPR dataset.
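The contrastive alignment between EEG embeddings and caption embeddings described above can be sketched with a symmetric InfoNCE-style loss. This is a generic sketch, not the paper's exact objective; the function name, embedding dimension, and temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_align(eeg_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched EEG/caption pairs sit on the
    diagonal of the similarity matrix and are pulled together, while
    mismatched pairs are pushed apart."""
    eeg = F.normalize(eeg_emb, dim=-1)
    txt = F.normalize(text_emb, dim=-1)
    logits = eeg @ txt.t() / temperature        # (batch, batch) similarities
    targets = torch.arange(eeg.size(0))         # matched pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

loss = contrastive_align(torch.randn(8, 128), torch.randn(8, 128))
```

A trained EEG encoder would feed `eeg_emb`, and a frozen text encoder would produce `text_emb` from the semantic captions.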
arXiv Detail & Related papers (2025-07-09T17:18:06Z)
- VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis [70.76837748695841]
We propose VisualSpeaker, a novel method that bridges the gap using photorealistic differentiable rendering, supervised by visual speech recognition, for improved 3D facial animation. Our contribution is a perceptual lip-reading loss, derived by passing 3D Gaussian Splatting avatar renders through a pre-trained Visual Automatic Speech Recognition model during training. Evaluation on the MEAD dataset demonstrates that VisualSpeaker improves the standard Lip Vertex Error metric by 56.1% as well as the perceptual quality of the generated animations, while retaining the controllability of mesh-driven animation.
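A perceptual loss of this kind can be sketched by passing rendered and ground-truth frames through a frozen feature extractor and comparing the features. `LipFeatureNet` below is a tiny stand-in for the pre-trained Visual ASR model; its architecture, the feature-space MSE, and all sizes are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class LipFeatureNet(nn.Module):
    """Stand-in for a pre-trained lip-reading feature extractor."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
    def forward(self, x):
        return self.net(x)

def perceptual_lip_loss(recognizer, rendered, target):
    """Compare rendered avatar frames to ground truth in the frozen
    recognizer's feature space; gradients flow only through `rendered`."""
    recognizer.eval()
    with torch.no_grad():
        ref = recognizer(target)             # ground-truth features (frozen)
    return nn.functional.mse_loss(recognizer(rendered), ref)

vasr = LipFeatureNet()
loss = perceptual_lip_loss(vasr, torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```

In the actual pipeline the `rendered` frames would come from a differentiable 3D Gaussian Splatting renderer, so this loss backpropagates into the avatar parameters.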
arXiv Detail & Related papers (2025-07-08T15:04:17Z)
- Neuro-3D: Towards 3D Visual Decoding from EEG Signals [49.502364730056044]
We introduce a new neuroscience task: decoding 3D visual perception from EEG signals. We first present EEG-3D, a dataset featuring multimodal analysis data and EEG recordings from 12 subjects viewing 72 categories of 3D objects rendered in both videos and images. We propose Neuro-3D, a 3D visual decoding framework based on EEG signals.
arXiv Detail & Related papers (2024-11-19T05:52:17Z)
- EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head [30.138347111341748]
We present a novel approach for synthesizing 3D talking heads with controllable emotion.
Our model enables controllable emotion in the generated talking heads and can be rendered in wide-range views.
Experiments demonstrate the effectiveness of our approach in generating high-fidelity and emotion-controllable 3D talking heads.
arXiv Detail & Related papers (2024-08-01T05:46:57Z)
- Controllable Radiance Fields for Dynamic Face Synthesis [125.48602100893845]
We study how to explicitly control generative model synthesis of face dynamics exhibiting non-rigid motion, and introduce the Controllable Radiance Field (CoRF) for this purpose.
On head image/video data we show that CoRFs are 3D-aware while enabling editing of identity, viewing directions, and motion.
arXiv Detail & Related papers (2022-10-11T23:17:31Z)
- Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control [54.079327030892244]
Free-HeadGAN is a person-generic neural talking head synthesis system.
We show that modeling faces with sparse 3D facial landmarks is sufficient for achieving state-of-the-art generative performance.
arXiv Detail & Related papers (2022-08-03T16:46:08Z)
- EMOCA: Emotion Driven Monocular Face Capture and Animation [59.15004328155593]
We introduce a novel deep perceptual emotion consistency loss during training, which helps ensure that the reconstructed 3D expression matches the expression depicted in the input image.
On the task of in-the-wild emotion recognition, our purely geometric approach is on par with the best image-based methods, highlighting the value of 3D geometry in analyzing human behavior.
arXiv Detail & Related papers (2022-04-24T15:58:35Z)
- Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks [82.54695985117783]
We investigate the suitability of state-of-the-art deep learning architectures for continuous emotion recognition using long video sequences captured in-the-wild.
We have developed and evaluated convolutional recurrent neural networks combining 2D-CNNs and long short-term memory units, and inflated 3D-CNN models, which are built by inflating the weights of a pre-trained 2D-CNN model during fine-tuning.
arXiv Detail & Related papers (2020-11-18T13:42:05Z)
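The convolutional-recurrent design described in that summary (a 2D-CNN extracting per-frame features, followed by an LSTM over time) can be sketched as follows. This is an illustrative toy model, not the paper's architecture; the layer sizes and the two-dimensional output head (e.g. valence and arousal) are assumptions.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Illustrative 2D-CNN + LSTM model for continuous emotion
    regression over a video clip."""
    def __init__(self, hidden=64, n_outputs=2):
        super().__init__()
        self.cnn = nn.Sequential(                 # per-frame spatial features
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(32, hidden, batch_first=True)  # temporal model
        self.head = nn.Linear(hidden, n_outputs)  # e.g. valence & arousal

    def forward(self, video):                     # (batch, time, 3, H, W)
        b, t = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1)).view(b, t, -1)
        seq, _ = self.lstm(feats)
        return self.head(seq)                     # per-frame predictions

model = CRNN()
preds = model(torch.rand(2, 8, 3, 64, 64))        # preds: (batch, time, 2)
```

The inflated 3D-CNN variant mentioned in the summary would instead replace the 2D convolutions with 3D ones whose weights are copied along the time axis from a pre-trained 2D model; that variant is not shown here.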
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.