Expression-aware video inpainting for HMD removal in XR applications
- URL: http://arxiv.org/abs/2401.14136v1
- Date: Thu, 25 Jan 2024 12:32:21 GMT
- Title: Expression-aware video inpainting for HMD removal in XR applications
- Authors: Fatemeh Ghorbani Lohesara, Karen Egiazarian, Sebastian Knorr
- Abstract summary: Head-mounted displays (HMDs) serve as indispensable devices for observing extended reality (XR) environments and virtual content.
HMDs present an obstacle to external recording techniques as they block the upper face of the user.
We propose a new network for expression-aware video inpainting for HMD removal based on generative adversarial networks (GANs).
- Score: 0.27624021966289597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Head-mounted displays (HMDs) serve as indispensable devices for observing
extended reality (XR) environments and virtual content. However, HMDs present
an obstacle to external recording techniques as they block the upper face of
the user. This limitation significantly affects social XR applications,
specifically teleconferencing, where facial features and eye gaze information
play a vital role in creating an immersive user experience. In this study, we
propose a new network for expression-aware video inpainting for HMD removal
(EVI-HRnet) based on generative adversarial networks (GANs). Our model
effectively fills in the missing facial region, guided by facial landmarks
and a single occlusion-free reference image of the user. The framework and
its components preserve the user's identity across frames by drawing on
this reference frame. To further improve the realism of the inpainted
output, we introduce a novel facial expression recognition (FER) loss function
for emotion preservation. Our results demonstrate the remarkable capability of
the proposed framework to remove HMDs from facial videos while maintaining the
subject's facial expression and identity. Moreover, the outputs exhibit
temporal consistency along the inpainted frames. This lightweight framework
presents a practical approach for HMD occlusion removal, with the potential to
enhance various collaborative XR applications without the need for additional
hardware.
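
The paper does not publish an implementation, but the composite objective it describes (reconstruction plus adversarial plus FER terms) can be sketched. The following is a minimal, hypothetical PyTorch sketch: the `generator`, `discriminator`, and `fer_net` interfaces, the loss weights, and the KL-divergence form of the FER term are all assumptions, not the authors' published formulation.

```python
# Hypothetical sketch of an emotion-preserving inpainting objective in the
# spirit of EVI-HRnet. Module interfaces and loss weights are assumptions.
import torch
import torch.nn.functional as F

def evi_hr_loss(generator, discriminator, fer_net,
                masked_frames, mask, landmarks, reference, target,
                w_rec=1.0, w_adv=0.1, w_fer=0.5):
    """Reconstruction + adversarial + FER (emotion-preservation) terms."""
    # The generator fills the HMD-occluded region, conditioned on facial
    # landmarks and a single occlusion-free reference image of the user.
    fake = generator(masked_frames, mask, landmarks, reference)

    # Pixel-level reconstruction over the whole frame.
    l_rec = F.l1_loss(fake, target)

    # Non-saturating adversarial term for the generator.
    logits = discriminator(fake)
    l_adv = F.binary_cross_entropy_with_logits(logits,
                                               torch.ones_like(logits))

    # FER term: a frozen expression classifier should read the same emotion
    # from the inpainted frame as from the ground-truth frame.
    with torch.no_grad():
        target_expr = fer_net(target).softmax(dim=-1)
    l_fer = F.kl_div(fer_net(fake).log_softmax(dim=-1), target_expr,
                     reduction="batchmean")

    return w_rec * l_rec + w_adv * l_adv + w_fer * l_fer
```

In such a setup the FER network stays frozen during training, so it acts purely as a perceptual critic for expression rather than being optimized alongside the generator.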
Related papers
- AffectSRNet: Facial Emotion-Aware Super-Resolution Network [5.295131292624206]
We propose AffectSRNet, a novel emotion-aware super-resolution framework for facial expression recognition.
Our method bridges the gap between image resolution and expression accuracy by employing an expression-preserving loss function.
We show that AffectSRNet outperforms existing FSR approaches in both visual quality and emotion fidelity.
arXiv Detail & Related papers (2025-02-14T06:02:59Z)
- OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration.
We propose OSDFace, a novel one-step diffusion model for face restoration.
Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z)
- Reconstructive Visual Instruction Tuning [64.91373889600136]
Reconstructive visual instruction tuning (ROSS) is a family of Large Multimodal Models (LMMs) that exploits vision-centric supervision signals.
It reconstructs latent representations of input images rather than directly regressing exact raw RGB values (see the sketch after this list).
Empirically, ROSS consistently brings significant improvements across different visual encoders and language models.
arXiv Detail & Related papers (2024-10-12T15:54:29Z)
- Towards Realistic Landmark-Guided Facial Video Inpainting Based on GANs [0.27624021966289597]
This study introduces a network designed for expression-based video inpainting.
It employs generative adversarial networks (GANs) to handle static and moving occlusions across all frames.
We further enhance emotional preservation through a customized facial expression recognition (FER) loss function, ensuring detailed inpainted outputs.
arXiv Detail & Related papers (2024-02-14T11:20:47Z)
- From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos [88.08209394979178]
Dynamic facial expression recognition (DFER) in the wild is still hindered by data limitations.
We introduce a novel Static-to-Dynamic model (S2D) that leverages existing SFER knowledge and dynamic information implicitly encoded in extracted facial landmark-aware features.
arXiv Detail & Related papers (2023-12-09T03:16:09Z)
- Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation [16.66038865012963]
Talking head video generation aims to animate a human face in a still image with dynamic poses and expressions using motion information.
A still source image cannot provide sufficient appearance information for occluded regions or delicate expression variations.
We propose a novel implicit identity representation conditioned memory compensation network, coined as MCNet, for high-fidelity talking head generation.
arXiv Detail & Related papers (2023-07-19T11:10:26Z)
- DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder [55.58582254514431]
We propose DAE-Talker to synthesize full video frames and produce natural head movements that align with the content of speech.
We also introduce pose modelling in speech2latent for pose controllability.
Our experiments show that DAE-Talker outperforms existing popular methods in lip-sync, video fidelity, and pose naturalness.
arXiv Detail & Related papers (2023-03-30T17:18:31Z)
- Attention based Occlusion Removal for Hybrid Telepresence Systems [5.006086647446482]
We propose a novel attention-enabled encoder-decoder architecture for HMD de-occlusion.
We report superior qualitative and quantitative results over state-of-the-art methods.
We also present applications of this approach to hybrid video teleconferencing using existing animation and 3D face reconstruction pipelines.
arXiv Detail & Related papers (2021-12-02T10:18:22Z)
- Unmasking Communication Partners: A Low-Cost AI Solution for Digitally Removing Head-Mounted Displays in VR-Based Telepresence [62.997667081978825]
Face-to-face conversation in Virtual Reality (VR) is a challenge when participants wear head-mounted displays (HMDs).
Past research has shown that high-fidelity face reconstruction with personal avatars in VR is possible under laboratory conditions with high-cost hardware.
We propose one of the first low-cost systems for this task which uses only open source, free software and affordable hardware.
arXiv Detail & Related papers (2020-11-06T23:17:12Z)
- Augment Yourself: Mixed Reality Self-Augmentation Using Optical See-through Head-mounted Displays and Physical Mirrors [49.49841698372575]
Optical see-through head-mounted displays (OST HMDs) are one of the key technologies for merging virtual objects and physical scenes to provide an immersive mixed reality (MR) environment to the user.
We propose a novel concept and prototype system that combines OST HMDs and physical mirrors to enable self-augmentation and provide an immersive MR environment centered around the user.
Our system, to the best of our knowledge the first of its kind, estimates the user's pose in the virtual image generated by the mirror using an RGBD camera attached to the HMD and anchors virtual objects to the reflection rather than to the user directly.
arXiv Detail & Related papers (2020-07-06T16:53:47Z)
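
As referenced in the ROSS entry above, here is a minimal sketch of supervising in latent space instead of regressing raw RGB values. The class and function names, tensor shapes, and the smooth-L1 choice are illustrative assumptions, not the paper's published interface.

```python
# Illustrative sketch of a vision-centric reconstruction objective in the
# spirit of ROSS: predict the latents that a frozen visual encoder assigns
# to the input image, rather than regressing exact raw RGB values.
# All names and interfaces here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentReconstructionHead(nn.Module):
    """Projects LMM hidden states into the frozen vision encoder's space."""
    def __init__(self, lmm_dim: int, vision_dim: int):
        super().__init__()
        self.proj = nn.Linear(lmm_dim, vision_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.proj(hidden_states)

def latent_reconstruction_loss(head, hidden_states,
                               frozen_vision_encoder, images):
    # Targets come from the frozen encoder; no gradient flows into it.
    with torch.no_grad():
        target_latents = frozen_vision_encoder(images)
    pred_latents = head(hidden_states)
    # Supervising in latent space sidesteps pixel-exact RGB regression.
    return F.smooth_l1_loss(pred_latents, target_latents)
```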