Expression-aware video inpainting for HMD removal in XR applications
- URL: http://arxiv.org/abs/2401.14136v1
- Date: Thu, 25 Jan 2024 12:32:21 GMT
- Title: Expression-aware video inpainting for HMD removal in XR applications
- Authors: Fatemeh Ghorbani Lohesara, Karen Egiazarian, Sebastian Knorr
- Abstract summary: Head-mounted displays (HMDs) serve as indispensable devices for observing extended reality (XR) environments and virtual content.
HMDs present an obstacle to external recording techniques as they block the upper face of the user.
We propose a new network for expression-aware video inpainting for HMD removal based on generative adversarial networks (GANs).
- Score: 0.27624021966289597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Head-mounted displays (HMDs) serve as indispensable devices for observing
extended reality (XR) environments and virtual content. However, HMDs present
an obstacle to external recording techniques as they block the upper face of
the user. This limitation significantly affects social XR applications,
specifically teleconferencing, where facial features and eye gaze information
play a vital role in creating an immersive user experience. In this study, we
propose a new network for expression-aware video inpainting for HMD removal
(EVI-HRnet) based on generative adversarial networks (GANs). Our model
effectively fills in the missing facial region, guided by facial landmarks
and a single occlusion-free reference image of the user. The framework and
its components preserve the user's identity across frames by drawing on
this reference frame. To further improve the realism of the inpainted
output, we introduce a novel facial expression recognition (FER) loss function
for emotion preservation. Our results demonstrate the remarkable capability of
the proposed framework to remove HMDs from facial videos while maintaining the
subject's facial expression and identity. Moreover, the outputs exhibit
temporal consistency along the inpainted frames. This lightweight framework
presents a practical approach for HMD occlusion removal, with the potential to
enhance various collaborative XR applications without the need for additional
hardware.
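
The paper does not publish an implementation, but the composite objective it describes (reconstruction plus adversarial plus FER terms) can be sketched. The following is a minimal, hypothetical PyTorch sketch: the `generator`, `discriminator`, and `fer_net` interfaces, the loss weights, and the KL-divergence form of the FER term are all assumptions, not the authors' published formulation.

```python
# Hypothetical sketch of an emotion-preserving inpainting objective in the
# spirit of EVI-HRnet. Module interfaces and loss weights are assumptions.
import torch
import torch.nn.functional as F

def evi_hr_loss(generator, discriminator, fer_net,
                masked_frames, mask, landmarks, reference, target,
                w_rec=1.0, w_adv=0.1, w_fer=0.5):
    """Reconstruction + adversarial + FER (emotion-preservation) terms."""
    # The generator fills the HMD-occluded region, conditioned on facial
    # landmarks and a single occlusion-free reference image of the user.
    fake = generator(masked_frames, mask, landmarks, reference)

    # Pixel-level reconstruction over the whole frame.
    l_rec = F.l1_loss(fake, target)

    # Non-saturating adversarial term for the generator.
    logits = discriminator(fake)
    l_adv = F.binary_cross_entropy_with_logits(logits,
                                               torch.ones_like(logits))

    # FER term: a frozen expression classifier should read the same emotion
    # from the inpainted frame as from the ground-truth frame.
    with torch.no_grad():
        target_expr = fer_net(target).softmax(dim=-1)
    l_fer = F.kl_div(fer_net(fake).log_softmax(dim=-1), target_expr,
                     reduction="batchmean")

    return w_rec * l_rec + w_adv * l_adv + w_fer * l_fer
```

In such a setup the FER network stays frozen during training, so it acts purely as a perceptual critic for expression rather than being optimized alongside the generator.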
Related papers
- AffectSRNet: Facial Emotion-Aware Super-Resolution Network [5.295131292624206]
We propose AffectSRNet, a novel emotion-aware super-resolution framework for facial expression recognition.
Our method bridges the gap between image resolution and expression accuracy by employing an expression-preserving loss function.
We show that AffectSRNet outperforms existing FSR approaches in both visual quality and emotion fidelity.
arXiv Detail & Related papers (2025-02-14T06:02:59Z)
- OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration.
We propose OSDFace, a novel one-step diffusion model for face restoration.
Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z)
- Reconstructive Visual Instruction Tuning [64.91373889600136]
Reconstructive visual instruction tuning (ROSS) is a family of Large Multimodal Models (LMMs) that exploits vision-centric supervision signals.
It reconstructs latent representations of input images rather than directly regressing exact raw RGB values (see the sketch after this list).
Empirically, ROSS consistently brings significant improvements across different visual encoders and language models.
arXiv Detail & Related papers (2024-10-12T15:54:29Z)
- Towards Realistic Landmark-Guided Facial Video Inpainting Based on GANs [0.27624021966289597]
This study introduces a network designed for expression-based video inpainting.
It employs generative adversarial networks (GANs) to handle static and moving occlusions across all frames.
We further enhance emotional preservation through a customized facial expression recognition (FER) loss function, ensuring detailed inpainted outputs.
arXiv Detail & Related papers (2024-02-14T11:20:47Z)
- From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos [88.08209394979178]
Dynamic facial expression recognition (DFER) in the wild is still hindered by data limitations.
We introduce a novel Static-to-Dynamic model (S2D) that leverages existing SFER knowledge and dynamic information implicitly encoded in extracted facial landmark-aware features.
arXiv Detail & Related papers (2023-12-09T03:16:09Z)
- Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation [16.66038865012963]
Talking head video generation aims to animate a human face in a still image with dynamic poses and expressions using motion information.
A still source image cannot provide sufficient appearance information for occluded regions or delicate expression variations.
We propose a novel implicit identity representation conditioned memory compensation network, coined as MCNet, for high-fidelity talking head generation.
arXiv Detail & Related papers (2023-07-19T11:10:26Z)
- DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder [55.58582254514431]
We propose DAE-Talker to synthesize full video frames and produce natural head movements that align with the content of speech.
We also introduce pose modelling in speech2latent for pose controllability.
Our experiments show that DAE-Talker outperforms existing popular methods in lip-sync, video fidelity, and pose naturalness.
arXiv Detail & Related papers (2023-03-30T17:18:31Z)
- Attention based Occlusion Removal for Hybrid Telepresence Systems [5.006086647446482]
We propose a novel attention-enabled encoder-decoder architecture for HMD de-occlusion.
We report superior qualitative and quantitative results over state-of-the-art methods.
We also present applications of this approach to hybrid video teleconferencing using existing animation and 3D face reconstruction pipelines.
arXiv Detail & Related papers (2021-12-02T10:18:22Z)
- Unmasking Communication Partners: A Low-Cost AI Solution for Digitally Removing Head-Mounted Displays in VR-Based Telepresence [62.997667081978825]
Face-to-face conversation in Virtual Reality (VR) is a challenge when participants wear head-mounted displays (HMDs).
Past research has shown that high-fidelity face reconstruction with personal avatars in VR is possible under laboratory conditions with high-cost hardware.
We propose one of the first low-cost systems for this task which uses only open source, free software and affordable hardware.
arXiv Detail & Related papers (2020-11-06T23:17:12Z)
- Augment Yourself: Mixed Reality Self-Augmentation Using Optical See-through Head-mounted Displays and Physical Mirrors [49.49841698372575]
Optical see-through head-mounted displays (OST HMDs) are one of the key technologies for merging virtual objects and physical scenes to provide an immersive mixed reality (MR) environment to the user.
We propose a novel concept and prototype system that combines OST HMDs and physical mirrors to enable self-augmentation and provide an immersive MR environment centered around the user.
Our system, to the best of our knowledge the first of its kind, estimates the user's pose in the virtual image generated by the mirror using an RGBD camera attached to the HMD and anchors virtual objects to the reflection rather than to the user directly.
arXiv Detail & Related papers (2020-07-06T16:53:47Z)
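
As referenced in the ROSS entry above, here is a minimal sketch of supervising in latent space instead of regressing raw RGB values. The class and function names, tensor shapes, and the smooth-L1 choice are illustrative assumptions, not the paper's published interface.

```python
# Illustrative sketch of a vision-centric reconstruction objective in the
# spirit of ROSS: predict the latents that a frozen visual encoder assigns
# to the input image, rather than regressing exact raw RGB values.
# All names and interfaces here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentReconstructionHead(nn.Module):
    """Projects LMM hidden states into the frozen vision encoder's space."""
    def __init__(self, lmm_dim: int, vision_dim: int):
        super().__init__()
        self.proj = nn.Linear(lmm_dim, vision_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.proj(hidden_states)

def latent_reconstruction_loss(head, hidden_states,
                               frozen_vision_encoder, images):
    # Targets come from the frozen encoder; no gradient flows into it.
    with torch.no_grad():
        target_latents = frozen_vision_encoder(images)
    pred_latents = head(hidden_states)
    # Supervising in latent space sidesteps pixel-exact RGB regression.
    return F.smooth_l1_loss(pred_latents, target_latents)
```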