Auto-CARD: Efficient and Robust Codec Avatar Driving for Real-time Mobile Telepresence
- URL: http://arxiv.org/abs/2304.11835v1
- Date: Mon, 24 Apr 2023 05:45:12 GMT
- Title: Auto-CARD: Efficient and Robust Codec Avatar Driving for Real-time Mobile Telepresence
- Authors: Yonggan Fu, Yuecheng Li, Chenghui Li, Jason Saragih, Peizhao Zhang, Xiaoliang Dai, Yingyan Lin
- Abstract summary: We propose a framework called Auto-CARD, which for the first time enables real-time and robust driving of Codec Avatars.
For evaluation, we demonstrate the efficacy of our Auto-CARD framework in real-time Codec Avatar driving settings.
- Score: 27.763047709846713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time and robust photorealistic avatars have long been desired
for enabling immersive telepresence in AR/VR.
However, there still exists one key bottleneck: the considerable computational
expense needed to accurately infer facial expressions captured from
headset-mounted cameras with a quality level that can match the realism of the
avatar's human appearance. To this end, we propose a framework called
Auto-CARD, which for the first time enables real-time and robust driving of
Codec Avatars using only on-device computing resources. This
is achieved by minimizing two sources of redundancy. First, we develop a
dedicated neural architecture search technique called AVE-NAS for avatar
encoding in AR/VR, which explicitly boosts both the searched architectures'
robustness in the presence of extreme facial expressions and hardware
friendliness on fast-evolving AR/VR headsets. Second, we leverage the temporal
redundancy in consecutively captured images during continuous rendering and
develop a mechanism dubbed LATEX to skip the computation of redundant frames.
Specifically, we first identify an opportunity from the linearity of the latent
space derived by the avatar decoder and then propose to perform adaptive latent
extrapolation for redundant frames. For evaluation, we demonstrate the efficacy
of our Auto-CARD framework in real-time Codec Avatar driving settings, where we
achieve a 5.05x speed-up on Meta Quest 2 while maintaining a comparable or even
better animation quality than state-of-the-art avatar encoder designs.
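The LATEX mechanism above rests on a simple observation: if the latent space derived by the avatar decoder behaves approximately linearly over short time windows, the latent code for a redundant frame can be extrapolated from recent codes instead of re-running the encoder. The sketch below illustrates this idea; it is a minimal, hypothetical rendering in Python/NumPy, where `drive_avatar`, `encoder`, the linear extrapolation rule, and the velocity threshold `tau` are all illustrative assumptions rather than the paper's actual implementation.

```python
import numpy as np

def drive_avatar(frames, encoder, tau=0.05, max_skips=2):
    """Yield one latent code per frame, skipping the encoder when
    linear extrapolation is expected to be accurate enough.

    Hypothetical sketch: assumes the latent trajectory is locally
    linear, i.e. z_t ~= z_{t-1} + (z_{t-1} - z_{t-2}).
    """
    z_prev2, z_prev1 = None, None
    skips = 0  # consecutive extrapolated frames
    for frame in frames:
        # Only extrapolate when two past codes exist, the skip budget
        # is not exhausted, and the latent velocity is small (i.e. the
        # expression is changing slowly, so the frame is "redundant").
        if (z_prev2 is not None and skips < max_skips
                and np.linalg.norm(z_prev1 - z_prev2) < tau):
            z = z_prev1 + (z_prev1 - z_prev2)  # linear extrapolation
            skips += 1
        else:
            z = encoder(frame)  # full encoder pass
            skips = 0
        z_prev2, z_prev1 = z_prev1, z
        yield z
```

Capping consecutive skips with `max_skips` bounds the drift that repeated extrapolation could accumulate; the paper's actual adaptive criterion for deciding which frames to skip may differ from this velocity test.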
Related papers
- Generalizable and Animatable Gaussian Head Avatar [50.34788590904843] (2024-10-10)
We propose Generalizable and Animatable Gaussian head Avatar (GAGAvatar) for one-shot animatable head avatar reconstruction.
We generate the parameters of 3D Gaussians from a single image in a single forward pass.
Our method exhibits superior performance compared to previous methods in terms of reconstruction quality and expression accuracy.
- LightAvatar: Efficient Head Avatar as Dynamic Neural Light Field [58.93692943064746] (2024-09-26)
We introduce LightAvatar, the first head avatar model based on neural light fields (NeLFs).
LightAvatar renders an image from 3DMM parameters and a camera pose via a single network forward pass, without using mesh or volume rendering.
- Universal Facial Encoding of Codec Avatars from VR Headsets [32.60236093340087] (2024-07-17)
We present a method that can animate a photorealistic avatar in real time from head-mounted cameras (HMCs) on a consumer VR headset.
We present a lightweight expression calibration mechanism that increases accuracy with minimal additional cost to run-time efficiency.
- MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices [16.489105620313065] (2024-07-08)
MobilePortrait is a one-shot neural head avatar method that reduces learning complexity by integrating external knowledge into both motion modeling and image synthesis.
It achieves state-of-the-art performance with less than one-tenth the computational demand.
It has been validated to reach speeds of over 100 FPS on mobile devices and supports both video- and audio-driven inputs.
- ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering [62.81677824868519] (2023-12-10)
We propose an animatable Gaussian splatting approach for photorealistic rendering of dynamic humans in real time.
We parameterize the clothed human as animatable 3D Gaussians, which can be efficiently splatted into image space to generate the final rendering.
We benchmark ASH against competing methods on pose-controllable avatars, demonstrating that our method outperforms existing real-time methods by a large margin and shows comparable or even better results than offline methods.
- Real-Time Radiance Fields for Single-Image Portrait View Synthesis [85.32826349697972] (2023-05-03)
We present a one-shot method to infer and render a 3D representation from a single unposed image in real time.
Given a single RGB input, our image encoder directly predicts a canonical triplane representation of a neural radiance field for 3D-aware novel view synthesis via volume rendering.
Our method is fast (24 fps) on consumer hardware and produces higher-quality results than strong GAN-inversion baselines that require test-time optimization.
- InstantAvatar: Learning Avatars from Monocular Video in 60 Seconds [43.41503529747328] (2022-12-20)
We propose a system that can reconstruct human avatars from a monocular video within seconds; these avatars can be animated and rendered at an interactive rate.
Compared to existing methods, InstantAvatar converges 130x faster and can be trained in minutes instead of hours.
- NeRFPlayer: A Streamable Dynamic Scene Representation with Decomposed Neural Radiance Fields [99.57774680640581] (2022-10-28)
We present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering.
We propose to decompose the 4D space according to temporal characteristics. Points in the 4D space are associated with probabilities of belonging to three categories: static, deforming, and new areas.
- Pixel Codec Avatars [99.36561532588831] (2021-04-09)
Pixel Codec Avatars (PiCA) is a deep generative model of 3D human faces.
On a single Oculus Quest 2 mobile VR headset, 5 avatars are rendered in real time in the same scene.
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.