Related papers: REFA: Real-time Egocentric Facial Animations for Virtual Reality

REFA: Real-time Egocentric Facial Animations for Virtual Reality

URL: http://arxiv.org/abs/2601.03507v1
Date: Wed, 07 Jan 2026 01:41:46 GMT
Title: REFA: Real-time Egocentric Facial Animations for Virtual Reality
Authors: Qiang Zhang, Tong Xiao, Haroun Habeeb, Larissa Laich, Sofien Bouaziz, Patrick Snape, Wenjing Zhang, Matthew Cioffi, Peizhao Zhang, Pavel Pidlypenskyi, Winnie Lin, Luming Ma, Mengjiao Wang, Kunpeng Li, Chengjiang Long, Steven Song, Martin Prazak, Alexander Sjoholm, Ajinkya Deogade, Jaebong Lee, Julio Delgado Mangas, Amaury Aubel,
Abstract summary: We present a novel system for real-time tracking of facial expressions using egocentric views captured from a set of infrared cameras embedded in a virtual reality (VR) headset.<n>Our technology facilitates any user to accurately drive the facial expressions of virtual characters in a non-intrusive manner.
Score: 56.82169742343143
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present a novel system for real-time tracking of facial expressions using egocentric views captured from a set of infrared cameras embedded in a virtual reality (VR) headset. Our technology facilitates any user to accurately drive the facial expressions of virtual characters in a non-intrusive manner and without the need of a lengthy calibration step. At the core of our system is a distillation based approach to train a machine learning model on heterogeneous data and labels coming form multiple sources, \eg synthetic and real images. As part of our dataset, we collected 18k diverse subjects using a lightweight capture setup consisting of a mobile phone and a custom VR headset with extra cameras. To process this data, we developed a robust differentiable rendering pipeline enabling us to automatically extract facial expression labels. Our system opens up new avenues for communication and expression in virtual environments, with applications in video conferencing, gaming, entertainment, and remote collaboration.

Related papers

Audio Driven Real-Time Facial Animation for Social Telepresence [65.66220599734338]
We present an audio-driven real-time system for animating photorealistic 3D facial avatars with minimal latency.<n>Central to our approach is an encoder model that transforms audio signals into latent facial expression sequences in real time.<n>We capture the rich spectrum of facial expressions necessary for natural communication while achieving real-time performance.
arXiv Detail & Related papers (2025-10-01T17:57:05Z)
Eye-See-You: Reverse Pass-Through VR and Head Avatars [6.514203459446675]
RevAvatar is an innovative framework that leverages AI methodologies to enable reverse pass-through technology.<n>RevAvatar integrates state-of-the-art generative models and multimodal AI techniques to reconstruct high-fidelity 2D facial images.<n>We present VR-Face, a novel dataset comprising 200,000 samples designed to emulate diverse VR-specific conditions.
arXiv Detail & Related papers (2025-05-24T21:08:19Z)
FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video [52.33896173943054]
Egocentric motion capture with a head-mounted body-facing stereo camera is crucial for VR and AR applications.<n>Existing methods rely on synthetic pretraining and struggle to generate smooth and accurate predictions in real-world settings.<n>We propose FRAME, a simple yet effective architecture that combines device pose and camera feeds for state-of-the-art body pose prediction.
arXiv Detail & Related papers (2025-03-29T14:26:06Z)
Universal Facial Encoding of Codec Avatars from VR Headsets [32.60236093340087]
We present a method that can animate a photorealistic avatar in realtime from head-mounted cameras (HMCs) on a consumer VR headset. We present a lightweight expression calibration mechanism that increases accuracy with minimal additional cost to run-time efficiency.
arXiv Detail & Related papers (2024-07-17T22:08:15Z)
VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality [39.53150683721031]
Our proposed VR-GS system represents a leap forward in human-centered 3D content interaction. The components of our Virtual Reality system are designed for high efficiency and effectiveness.
arXiv Detail & Related papers (2024-01-30T01:28:36Z)
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations [107.88375243135579]
Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands. We visualize the generated motion using highly photorealistic avatars that can express crucial nuances in gestures. Experiments show our model generates appropriate and diverse gestures, outperforming both diffusion- and VQ-only methods.
arXiv Detail & Related papers (2024-01-03T18:55:16Z)
Robust Egocentric Photo-realistic Facial Expression Transfer for Virtual Reality [68.18446501943585]
Social presence will fuel the next generation of communication systems driven by digital humans in virtual reality (VR) The best 3D video-realistic VR avatars that minimize the uncanny effect rely on person-specific (PS) models. This paper makes progress in overcoming these limitations by proposing an end-to-end multi-identity architecture.
arXiv Detail & Related papers (2021-04-10T15:48:53Z)
Unmasking Communication Partners: A Low-Cost AI Solution for Digitally Removing Head-Mounted Displays in VR-Based Telepresence [62.997667081978825]
Face-to-face conversation in Virtual Reality (VR) is a challenge when participants wear head-mounted displays (HMD) Past research has shown that high-fidelity face reconstruction with personal avatars in VR is possible under laboratory conditions with high-cost hardware. We propose one of the first low-cost systems for this task which uses only open source, free software and affordable hardware.
arXiv Detail & Related papers (2020-11-06T23:17:12Z)
Facial Expression Recognition Under Partial Occlusion from Virtual Reality Headsets based on Transfer Learning [0.0]
convolutional neural network based approaches has become widely adopted due to their proven applicability to Facial Expression Recognition task. However, recognizing facial expression while wearing a head-mounted VR headset is a challenging task due to the upper half of the face being completely occluded. We propose a geometric model to simulate occlusion resulting from a Samsung Gear VR headset that can be applied to existing FER datasets.
arXiv Detail & Related papers (2020-08-12T20:25:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.