Related papers: MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices

URL: http://arxiv.org/abs/2407.05712v1
Date: Mon, 8 Jul 2024 08:12:57 GMT
Title: MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices
Authors: Jianwen Jiang, Gaojie Lin, Zhengkun Rong, Chao Liang, Yongming Zhu, Jiaqi Yang, Tianyun Zhong,
Abstract summary: MobilePortrait is a one-shot neural head avatar method that reduces learning complexity by integrating external knowledge into both motion modeling and image synthesis. It achieves state-of-the-art performance with less than one-tenth the computational demand. It has been validated to reach speeds of over 100 FPS on mobile devices and to support both video-driven and audio-driven inputs.
Score: 16.489105620313065
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing neural head avatars methods have achieved significant progress in the image quality and motion range of portrait animation. However, these methods neglect the computational overhead, and to the best of our knowledge, none is designed to run on mobile devices. This paper presents MobilePortrait, a lightweight one-shot neural head avatars method that reduces learning complexity by integrating external knowledge into both the motion modeling and image synthesis, enabling real-time inference on mobile devices. Specifically, we introduce a mixed representation of explicit and implicit keypoints for precise motion modeling and precomputed visual features for enhanced foreground and background synthesis. With these two key designs and using simple U-Nets as backbones, our method achieves state-of-the-art performance with less than one-tenth the computational demand. It has been validated to reach speeds of over 100 FPS on mobile devices and support both video and audio-driven inputs.

Related papers

GUAVA: Generalizable Upper Body 3D Gaussian Avatar [32.476282286315055]
3D human avatar reconstruction typically requires multi-view or monocular videos and training on individual IDs. We first introduce an expressive human model (EHM) to enhance facial expression capabilities. We propose GUAVA, the first framework for fast animatable upper-body 3D Gaussian avatar reconstruction.
arXiv Detail & Related papers (2025-05-06T09:19:16Z)
SqueezeMe: Mobile-Ready Distillation of Gaussian Full-Body Avatars [19.249226899376943]
We present SqueezeMe, a framework to convert high-fidelity 3D Gaussian full-body avatars into a lightweight representation. We achieve, for the first time, simultaneous animation and rendering of 3 Gaussian avatars in real-time (72 FPS) on a Meta Quest 3 VR headset.
arXiv Detail & Related papers (2024-12-19T18:46:55Z)
Universal Facial Encoding of Codec Avatars from VR Headsets [32.60236093340087]
We present a method that can animate a photorealistic avatar in realtime from head-mounted cameras (HMCs) on a consumer VR headset. We present a lightweight expression calibration mechanism that increases accuracy with minimal additional cost to run-time efficiency.
arXiv Detail & Related papers (2024-07-17T22:08:15Z)
Real-Time Simulated Avatar from Head-Mounted Sensors [70.41580295721525]
We present SimXR, a method for controlling a simulated avatar from information (headset pose and cameras) obtained from AR / VR headsets. To synergize headset poses with cameras, we control a humanoid to track headset movement while analyzing input images to decide body movement. When body parts are seen, the movements of hands and feet will be guided by the images; when unseen, the laws of physics guide the controller to generate plausible motion.
arXiv Detail & Related papers (2024-03-11T16:15:51Z)
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations [107.88375243135579]
Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands. We visualize the generated motion using highly photorealistic avatars that can express crucial nuances in gestures. Experiments show our model generates appropriate and diverse gestures, outperforming both diffusion- and VQ-only methods.
arXiv Detail & Related papers (2024-01-03T18:55:16Z)
Real-time volumetric rendering of dynamic humans [83.08068677139822]
We present a method for fast 3D reconstruction and real-time rendering of dynamic humans from monocular videos. Our method can reconstruct a dynamic human in less than 3h using a single GPU, compared to recent state-of-the-art alternatives that take up to 72h. A novel local ray marching rendering allows visualizing the neural human on a mobile VR device at 40 frames per second with minimal loss of visual quality.
arXiv Detail & Related papers (2023-03-21T14:41:25Z)
MegaPortraits: One-shot Megapixel Neural Head Avatars [7.05068904295608]
We propose a set of new neural architectures and training methods that can leverage both medium-resolution video data and high-resolution image data. We show how a trained high-resolution neural avatar model can be distilled into a lightweight student model which runs in real-time. Real-time operation and identity lock are essential for many practical applications of head avatar systems.
arXiv Detail & Related papers (2022-07-15T17:32:37Z)
Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation [61.8546794105462]
We propose Semantic-aware Speaking Portrait NeRF (SSP-NeRF), which creates delicate audio-driven portraits using one unified set of NeRF. We first propose a Semantic-Aware Dynamic Ray Sampling module with an additional parsing branch that facilitates audio-driven volume rendering. To enable portrait rendering in one unified neural radiance field, a Torso Deformation module is designed to stabilize the large-scale non-rigid torso motions.
arXiv Detail & Related papers (2022-01-19T18:54:41Z)
PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering [56.762094966235566]
A Portrait Image Neural Renderer is proposed to control the face motions with the parameters of three-dimensional morphable face models. The proposed model can generate photo-realistic portrait images with accurate movements according to intuitive modifications. Our model can generate coherent videos with convincing movements from only a single reference image and a driving audio stream.
arXiv Detail & Related papers (2021-09-17T07:24:16Z)
Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering [34.80975358673563]
We propose a novel approach that learns generalizable neural radiance fields based on a parametric human body model for robust performance capture. Experiments on the ZJU-MoCap and AIST datasets show that our method significantly outperforms recent generalizable NeRF methods on unseen identities and poses.
arXiv Detail & Related papers (2021-09-15T17:32:46Z)
Pixel Codec Avatars [99.36561532588831]
Pixel Codec Avatars (PiCA) is a deep generative model of 3D human faces. On a single Oculus Quest 2 mobile VR headset, 5 avatars are rendered in realtime in the same scene.
arXiv Detail & Related papers (2021-04-09T23:17:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.