Egocentric Videoconferencing
- URL: http://arxiv.org/abs/2107.03109v1
- Date: Wed, 7 Jul 2021 09:49:39 GMT
- Title: Egocentric Videoconferencing
- Authors: Mohamed Elgharib, Mohit Mendiratta, Justus Thies, Matthias
Nießner, Hans-Peter Seidel, Ayush Tewari, Vladislav Golyanik, Christian
Theobalt
- Abstract summary: Videoconferencing portrays valuable non-verbal communication and facial expression cues, but usually requires a front-facing camera.
We propose a low-cost wearable egocentric camera setup that can be integrated into smart glasses.
Our goal is to mimic a classical video call, and therefore, we transform the egocentric perspective of this camera into a front-facing video.
- Score: 86.88092499544706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a method for egocentric videoconferencing that enables
hands-free video calls, for instance by people wearing smart glasses or other
mixed-reality devices. Videoconferencing portrays valuable non-verbal
communication and facial expression cues, but usually requires a front-facing
camera. Using a frontal camera in a hands-free setting when a person is on the
move is impractical. Even holding a mobile phone camera in front of the face
while sitting for a long duration is not convenient. To overcome these
issues, we propose a low-cost wearable egocentric camera setup that can be
integrated into smart glasses. Our goal is to mimic a classical video call, and
therefore, we transform the egocentric perspective of this camera into a
front-facing video. To this end, we employ a conditional generative adversarial
neural network that learns a transition from the highly distorted egocentric
views to frontal views common in videoconferencing. Our approach learns to
transfer expression details directly from the egocentric view without using a
complex intermediate parametric expression model, as used by related
face reenactment methods. We successfully handle subtle expressions not easily
captured by parametric blendshape-based solutions, e.g., tongue movement, eye
movements, eye blinking, strong expressions, and depth-varying movements. To get
control over the rigid head movements in the target view, we condition the
generator on synthetic renderings of a moving neutral face. This allows us to
synthesize results at different head poses. Our technique produces temporally
smooth video-realistic renderings in real-time using a video-to-video
translation network in conjunction with a temporal discriminator. We
demonstrate the improved capabilities of our technique by comparing against
related state-of-the-art approaches.
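The abstract outlines the core pipeline: a conditional GAN generator that maps distorted egocentric frames to frontal frames, conditioned on synthetic renderings of a moving neutral face for head-pose control, and trained in a video-to-video translation setup with a temporal discriminator. The PyTorch sketch below illustrates only this conditioning and temporal-discrimination scheme; the module names (EgoToFrontalGenerator, TemporalDiscriminator), layer layout, channel sizes, and clip handling are illustrative assumptions, not the authors' actual architecture.

```python
# Minimal sketch of the conditioning scheme described in the abstract:
# the generator receives an egocentric frame stacked with a synthetic rendering
# of a moving neutral face, and a temporal discriminator scores short clips of
# frontal output frames. All module names, channel sizes, and the simple
# encoder-decoder layout are assumptions for illustration only.

import torch
import torch.nn as nn


class EgoToFrontalGenerator(nn.Module):
    """Maps (egocentric frame, neutral-face rendering) pairs to a frontal frame."""

    def __init__(self, in_channels: int = 6, out_channels: int = 3, base: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base, out_channels, 4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, ego_frame: torch.Tensor, neutral_render: torch.Tensor) -> torch.Tensor:
        # Concatenate the egocentric view with the pose-conditioning rendering
        # along the channel dimension, as the abstract's conditioning suggests.
        x = torch.cat([ego_frame, neutral_render], dim=1)
        return self.decoder(self.encoder(x))


class TemporalDiscriminator(nn.Module):
    """Scores short clips of consecutive frontal frames for temporal realism."""

    def __init__(self, frame_channels: int = 3, clip_len: int = 3, base: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(frame_channels, base, kernel_size=(clip_len, 4, 4),
                      stride=(1, 2, 2), padding=(0, 1, 1)),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv3d(base, 1, kernel_size=(1, 4, 4), stride=(1, 2, 2), padding=(0, 1, 1)),
        )

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip has shape (batch, channels, time, height, width)
        return self.net(clip)


if __name__ == "__main__":
    gen = EgoToFrontalGenerator()
    disc = TemporalDiscriminator()
    ego = torch.randn(2, 3, 128, 128)        # egocentric input frames
    neutral = torch.randn(2, 3, 128, 128)    # synthetic neutral-face renderings
    frontal = gen(ego, neutral)              # predicted front-facing frames
    clip = frontal.unsqueeze(2).repeat(1, 1, 3, 1, 1)  # toy 3-frame clip
    print(frontal.shape, disc(clip).shape)
```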
Related papers
- Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a
Short Video [91.92782707888618]
We present a decomposition-composition framework named Speech to Lip (Speech2Lip) that disentangles speech-sensitive and speech-insensitive motion/appearance.
We show that our model can be trained on a video of just a few minutes in length and achieves state-of-the-art performance in both visual quality and speech-visual synchronization.
arXiv Detail & Related papers (2023-09-09T14:52:39Z) - Continuously Controllable Facial Expression Editing in Talking Face
Videos [34.83353695337335]
Speech-related expressions and emotion-related expressions are often highly coupled.
Traditional image-to-image translation methods do not work well in this application.
We propose a high-quality facial expression editing method for talking face videos.
arXiv Detail & Related papers (2022-09-17T09:05:47Z) - StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation [47.06075725469252]
StyleTalker is an audio-driven talking head generation model.
It can synthesize a video of a talking person from a single reference image.
Our model is able to synthesize talking head videos with impressive perceptual quality.
arXiv Detail & Related papers (2022-08-23T12:49:01Z) - Watch Those Words: Video Falsification Detection Using Word-Conditioned
Facial Motion [82.06128362686445]
We propose a multi-modal semantic forensic approach to handle both cheapfakes and visually persuasive deepfakes.
We leverage the idea of attribution to learn person-specific biometric patterns that distinguish a given speaker from others.
Unlike existing person-specific approaches, our method is also effective against attacks that focus on lip manipulation.
arXiv Detail & Related papers (2021-12-21T01:57:04Z) - Pose-Controllable Talking Face Generation by Implicitly Modularized
Audio-Visual Representation [96.66010515343106]
We propose a clean yet effective framework to generate pose-controllable talking faces.
We operate on raw face images, using only a single photo as an identity reference.
Our model has multiple advanced capabilities including extreme view robustness and talking face frontalization.
arXiv Detail & Related papers (2021-04-22T15:10:26Z) - Audio- and Gaze-driven Facial Animation of Codec Avatars [149.0094713268313]
We describe the first approach to animate Codec Avatars in real-time using audio and/or eye tracking.
Our goal is to display expressive conversations between individuals that exhibit important social signals.
arXiv Detail & Related papers (2020-08-11T22:28:48Z) - Talking-head Generation with Rhythmic Head Motion [46.6897675583319]
We propose a 3D-aware generative network with a hybrid embedding module and a non-linear composition module.
Our approach achieves controllable, photo-realistic, and temporally coherent talking-head videos with natural head movements.
arXiv Detail & Related papers (2020-07-16T18:13:40Z) - Audio-driven Talking Face Video Generation with Learning-based
Personalized Head Pose [67.31838207805573]
We propose a deep neural network model that takes an audio signal A of a source person and a short video V of a target person as input.
It outputs a synthesized high-quality talking face video with personalized head pose.
Our method can generate high-quality talking face videos with more distinguishing head movement effects than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-24T10:02:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.