Expression-aware video inpainting for HMD removal in XR applications
- URL: http://arxiv.org/abs/2401.14136v1
- Date: Thu, 25 Jan 2024 12:32:21 GMT
- Title: Expression-aware video inpainting for HMD removal in XR applications
- Authors: Fatemeh Ghorbani Lohesara, Karen Egiazarian, Sebastian Knorr
- Abstract summary: Head-mounted displays (HMDs) serve as indispensable devices for observing extended reality (XR) environments and virtual content.
HMDs present an obstacle to external recording techniques as they block the upper face of the user.
We propose a new network for expression-aware video inpainting for HMD removal based on generative adversarial networks (GANs).
- Score: 0.27624021966289597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Head-mounted displays (HMDs) serve as indispensable devices for observing
extended reality (XR) environments and virtual content. However, HMDs present
an obstacle to external recording techniques as they block the upper face of
the user. This limitation significantly affects social XR applications,
specifically teleconferencing, where facial features and eye gaze information
play a vital role in creating an immersive user experience. In this study, we
propose a new network for expression-aware video inpainting for HMD removal
(EVI-HRnet) based on generative adversarial networks (GANs). Our model
effectively fills in the missing facial regions, guided by facial landmarks and
a single occlusion-free reference image of the user. The framework and its
components ensure the preservation of the user's identity across frames using
the reference frame. To further improve the level of realism of the inpainted
output, we introduce a novel facial expression recognition (FER) loss function
for emotion preservation. Our results demonstrate the remarkable capability of
the proposed framework to remove HMDs from facial videos while maintaining the
subject's facial expression and identity. Moreover, the outputs exhibit
temporal consistency along the inpainted frames. This lightweight framework
presents a practical approach for HMD occlusion removal, with the potential to
enhance various collaborative XR applications without the need for additional
hardware.
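The abstract names two mechanisms without giving their exact formulation: filling the HMD-occluded region with generated content, and a facial expression recognition (FER) loss for emotion preservation. The sketch below is a hypothetical illustration of both, not the authors' implementation. It assumes that a pretrained FER classifier (not shown) produces expression logits for the inpainted frame and for the ground-truth frame, and it compares them with a KL divergence, which is our choice rather than the paper's. The names `composite`, `softmax`, and `fer_loss` are illustrative, and frames are plain nested lists of grayscale values instead of tensors.

```python
import math
from typing import List, Sequence

# Grayscale frame as rows of pixel values (illustrative assumption, not tensors).
Frame = List[List[float]]


def composite(frame: Frame, generated: Frame, mask: Frame) -> Frame:
    """Keep observed pixels and take generated pixels where mask == 1.0.

    Assumption: mask marks the HMD-occluded region with 1.0, visible pixels with 0.0.
    """
    return [
        [m * g + (1.0 - m) * f for f, g, m in zip(f_row, g_row, m_row)]
        for f_row, g_row, m_row in zip(frame, generated, mask)
    ]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over expression-class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fer_loss(inpainted_logits: Sequence[float],
             target_logits: Sequence[float],
             eps: float = 1e-12) -> float:
    """Hypothetical FER loss: KL(p_target || p_inpainted).

    Assumption: both logit vectors come from the same pretrained FER classifier,
    applied to the ground-truth frame (target) and the inpainted frame.
    """
    p = softmax(target_logits)      # expression distribution of the real face
    q = softmax(inpainted_logits)   # expression distribution of the inpainted face
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


if __name__ == "__main__":
    frame = [[0.2, 0.4], [0.6, 0.8]]
    generated = [[1.0, 1.0], [1.0, 1.0]]
    mask = [[1.0, 0.0], [0.0, 1.0]]
    print(composite(frame, generated, mask))  # [[1.0, 0.4], [0.6, 1.0]]
    # Different predicted expressions give a positive loss.
    print(fer_loss([0.1, 2.0, 0.3], [2.0, 0.1, 0.3]))
```

In a training setup of this kind, such a term would typically be scaled by a weight and added to the adversarial and reconstruction losses; a cross-entropy against the label predicted for the ground-truth frame would be an equally plausible design.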
Related papers
- OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration.
We propose OSDFace, a novel one-step diffusion model for face restoration.
Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
(2024-11-26T07:07:48Z)
- Reconstructive Visual Instruction Tuning [64.91373889600136]
Reconstructive visual instruction tuning (ROSS) is a family of Large Multimodal Models (LMMs) that exploit vision-centric supervision signals.
It reconstructs latent representations of input images, avoiding directly regressing exact raw RGB values.
Empirically, ROSS consistently brings significant improvements across different visual encoders and language models.
(2024-10-12T15:54:29Z)
- Towards Realistic Landmark-Guided Facial Video Inpainting Based on GANs [0.27624021966289597]
This study introduces a network designed for expression-based video inpainting.
It employs generative adversarial networks (GANs) to handle static and moving occlusions across all frames.
We further enhance emotional preservation through a customized facial expression recognition (FER) loss function, ensuring detailed inpainted outputs.
(2024-02-14T11:20:47Z)
- From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos [88.08209394979178]
Dynamic facial expression recognition (DFER) in the wild is still hindered by data limitations.
We introduce a novel Static-to-Dynamic model (S2D) that leverages existing static facial expression recognition (SFER) knowledge and dynamic information implicitly encoded in extracted facial landmark-aware features.
(2023-12-09T03:16:09Z)
- Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation [16.66038865012963]
Talking head video generation aims to animate a human face in a still image with dynamic poses and expressions using motion information.
A still source image cannot provide sufficient appearance information for occluded regions or delicate expression variations.
We propose a novel implicit identity representation conditioned memory compensation network, coined as MCNet, for high-fidelity talking head generation.
(2023-07-19T11:10:26Z)
- Interactive Face Video Coding: A Generative Compression Framework [18.26476468644723]
We propose a novel framework for Interactive Face Video Coding (IFVC), which allows humans to interact with the intrinsic visual representations instead of the signals.
The proposed solution enjoys several distinct advantages, including ultra-compact representation, low delay interaction, and vivid expression and headpose animation.
(2023-02-20T11:24:23Z)
- Context-Aware Video Reconstruction for Rolling Shutter Cameras [52.28710992548282]
In this paper, we propose a context-aware global shutter (GS) video reconstruction architecture.
We first estimate the bilateral motion field so that the pixels of the two rolling shutter (RS) frames are warped to a common GS frame.
Then, a refinement scheme is proposed to guide the GS frame synthesis along with bilateral occlusion masks to produce high-fidelity GS video frames.
(2022-05-25T17:05:47Z)
- Attention based Occlusion Removal for Hybrid Telepresence Systems [5.006086647446482]
We propose a novel attention-enabled encoder-decoder architecture for HMD de-occlusion.
We report superior qualitative and quantitative results over state-of-the-art methods.
We also present applications of this approach to hybrid video teleconferencing using existing animation and 3D face reconstruction pipelines.
(2021-12-02T10:18:22Z)
- Unmasking Communication Partners: A Low-Cost AI Solution for Digitally Removing Head-Mounted Displays in VR-Based Telepresence [62.997667081978825]
Face-to-face conversation in Virtual Reality (VR) is a challenge when participants wear head-mounted displays (HMDs).
Past research has shown that high-fidelity face reconstruction with personal avatars in VR is possible under laboratory conditions with high-cost hardware.
We propose one of the first low-cost systems for this task which uses only open source, free software and affordable hardware.
(2020-11-06T23:17:12Z)
- Augment Yourself: Mixed Reality Self-Augmentation Using Optical See-through Head-mounted Displays and Physical Mirrors [49.49841698372575]
Optical see-through head-mounted displays (OST HMDs) are one of the key technologies for merging virtual objects and physical scenes to provide an immersive mixed reality (MR) environment to the user.
We propose a novel concept and prototype system that combines OST HMDs and physical mirrors to enable self-augmentation and provide an immersive MR environment centered around the user.
Our system, to the best of our knowledge the first of its kind, estimates the user's pose in the virtual image generated by the mirror using an RGBD camera attached to the HMD and anchors virtual objects to the reflection rather
(2020-07-06T16:53:47Z)