InstantAvatar: Learning Avatars from Monocular Video in 60 Seconds
- URL: http://arxiv.org/abs/2212.10550v1
- Date: Tue, 20 Dec 2022 18:53:58 GMT
- Title: InstantAvatar: Learning Avatars from Monocular Video in 60 Seconds
- Authors: Tianjian Jiang, Xu Chen, Jie Song, Otmar Hilliges
- Abstract summary: We propose a system that can reconstruct human avatars from a monocular video within seconds, and these avatars can be animated and rendered at an interactive rate.
Compared to existing methods, InstantAvatar converges 130x faster and can be trained in minutes instead of hours.
- Score: 43.41503529747328
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we take a significant step towards real-world applicability of
monocular neural avatar reconstruction by contributing InstantAvatar, a system
that can reconstruct human avatars from a monocular video within seconds, and
these avatars can be animated and rendered at an interactive rate. To achieve
this efficiency we propose a carefully designed and engineered system that
leverages emerging acceleration structures for neural fields, in combination
with an efficient empty space-skipping strategy for dynamic scenes. We also
contribute an efficient implementation that we will make available for research
purposes. Compared to existing methods, InstantAvatar converges 130x faster and
can be trained in minutes instead of hours. It achieves comparable or even
better reconstruction quality and novel pose synthesis results. When given the
same time budget, our method significantly outperforms SoTA methods.
InstantAvatar can yield acceptable visual quality in as little as 10 seconds of
training time.
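The abstract names two efficiency ingredients: an acceleration structure for the neural field (hash-grid encodings in the spirit of Instant-NGP) and an empty space-skipping strategy for the dynamic, articulated body. A common way to realize space skipping is a coarse occupancy grid that gates which ray samples are passed to the network. The sketch below illustrates that general idea in plain NumPy; the class names, grid resolution, and threshold are illustrative assumptions and do not reproduce the authors' implementation.

```python
# Minimal sketch of occupancy-grid empty-space skipping for ray marching.
# All names and thresholds are illustrative; this is NOT the authors' code.
import numpy as np

class OccupancyGrid:
    """Coarse binary grid over the scene bounds: occupied cells are sampled, empty ones skipped."""
    def __init__(self, resolution=64, bound=1.5):
        self.res = resolution
        self.bound = bound                                   # scene assumed inside [-bound, bound]^3
        self.occupied = np.ones((resolution,) * 3, dtype=bool)

    def update(self, density_fn, threshold=0.01):
        # Re-evaluate the density field at cell centres; for an articulated human this
        # would be refreshed whenever the pose (and hence the occupied region) changes.
        centres = (np.arange(self.res) + 0.5) / self.res * 2 * self.bound - self.bound
        grid = np.stack(np.meshgrid(centres, centres, centres, indexing="ij"), axis=-1)
        sigma = density_fn(grid.reshape(-1, 3)).reshape(self.occupied.shape)
        self.occupied = sigma > threshold

    def query(self, pts):
        # Map world-space points to cell indices and look up occupancy.
        idx = np.clip(((pts + self.bound) / (2 * self.bound) * self.res).astype(int), 0, self.res - 1)
        return self.occupied[idx[:, 0], idx[:, 1], idx[:, 2]]

def march_ray(origin, direction, grid, density_fn, n_samples=128, near=0.1, far=4.0):
    """Sample along one ray, but evaluate the expensive field only inside occupied cells."""
    t = np.linspace(near, far, n_samples)
    pts = origin[None, :] + t[:, None] * direction[None, :]
    keep = grid.query(pts)                                   # boolean mask: skip empty space
    sigma = np.zeros(n_samples)
    if keep.any():
        sigma[keep] = density_fn(pts[keep])                  # field queried only where needed
    return t, sigma, keep

if __name__ == "__main__":
    # Toy density: a solid sphere of radius 0.5 standing in for the posed body.
    density_fn = lambda p: (np.linalg.norm(p, axis=-1) < 0.5).astype(float)
    grid = OccupancyGrid()
    grid.update(density_fn)
    t, sigma, keep = march_ray(np.array([0.0, 0.0, -2.0]), np.array([0.0, 0.0, 1.0]), grid, density_fn)
    print(f"evaluated {int(keep.sum())} of {len(t)} samples")
```

Skipping cells that contain no body surface means the field is queried on only a small fraction of samples per ray; the added difficulty for dynamic scenes is that the grid must be kept consistent with the current pose rather than computed once.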
Related papers
- InstantGeoAvatar: Effective Geometry and Appearance Modeling of Animatable Avatars from Monocular Video [19.14732665400654]
We present InstantGeoAvatar, a method for efficient and effective learning from monocular video of detailed 3D geometry.
We propose a principled geometry-aware SDF regularization scheme that seamlessly fits into the volume rendering pipeline.
We obtain competitive results in geometry reconstruction and novel view synthesis in as little as five minutes of training time.
arXiv Detail & Related papers (2024-11-03T10:26:33Z)
- Efficient Neural Implicit Representation for 3D Human Reconstruction [38.241511336562844]
Conventional methods for reconstructing 3D human motion frequently require the use of expensive hardware and have high processing costs.
This study presents HumanAvatar, an innovative approach that efficiently reconstructs precise human avatars from monocular video sources.
arXiv Detail & Related papers (2024-10-23T10:16:01Z)
- LightAvatar: Efficient Head Avatar as Dynamic Neural Light Field [58.93692943064746]
We introduce LightAvatar, the first head avatar model based on neural light fields (NeLFs).
LightAvatar renders an image from 3DMM parameters and a camera pose via a single network forward pass, without using mesh or volume rendering.
arXiv Detail & Related papers (2024-09-26T17:00:02Z)
- 3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting [32.63571465495127]
We introduce an approach that creates animatable human avatars from monocular videos using 3D Gaussian Splatting (3DGS).
We learn a non-rigid network to reconstruct animatable clothed human avatars that can be trained within 30 minutes and rendered at real-time frame rates (50+ FPS).
Experimental results show that our method achieves comparable and even better performance compared to state-of-the-art approaches on animatable avatar creation from a monocular input.
arXiv Detail & Related papers (2023-12-14T18:54:32Z)
- Animatable 3D Gaussian: Fast and High-Quality Reconstruction of Multiple Human Avatars [18.55354901614876]
We propose Animatable 3D Gaussian, which learns human avatars from input images and poses.
On both novel view synthesis and novel pose synthesis tasks, our method achieves higher reconstruction quality than InstantAvatar with less training time.
Our method can be easily extended to multi-human scenes and achieve comparable novel view synthesis results on a scene with ten people in only 25 seconds of training.
arXiv Detail & Related papers (2023-11-27T08:17:09Z)
- Auto-CARD: Efficient and Robust Codec Avatar Driving for Real-time Mobile Telepresence [27.763047709846713]
We propose a framework called Auto-CARD, which for the first time enables real-time and robust driving of Codec Avatars.
For evaluation, we demonstrate the efficacy of our Auto-CARD framework in real-time Codec Avatar driving settings.
arXiv Detail & Related papers (2023-04-24T05:45:12Z)
- Efficient Meshy Neural Fields for Animatable Human Avatars [87.68529918184494]
Efficiently digitizing high-fidelity animatable human avatars from videos is a challenging and active research topic.
Recent rendering-based neural representations open a new way for human digitization with their friendly usability and photo-realistic reconstruction quality.
We present EMA, a method that Efficiently learns Meshy neural fields to reconstruct animatable human Avatars.
arXiv Detail & Related papers (2023-03-23T00:15:34Z)
- Real-time volumetric rendering of dynamic humans [83.08068677139822]
We present a method for fast 3D reconstruction and real-time rendering of dynamic humans from monocular videos.
Our method can reconstruct a dynamic human in less than 3h using a single GPU, compared to recent state-of-the-art alternatives that take up to 72h.
A novel local ray marching rendering allows visualizing the neural human on a mobile VR device at 40 frames per second with minimal loss of visual quality.
arXiv Detail & Related papers (2023-03-21T14:41:25Z)
- PointAvatar: Deformable Point-based Head Avatars from Videos [103.43941945044294]
PointAvatar is a deformable point-based representation that disentangles the source color into intrinsic albedo and normal-dependent shading.
We show that our method is able to generate animatable 3D avatars using monocular videos from multiple sources.
arXiv Detail & Related papers (2022-12-16T10:05:31Z)
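The albedo/shading disentanglement described in the PointAvatar entry above amounts to writing the rendered color as a product of an intrinsic albedo and a normal-dependent shading term. The following is a minimal, hypothetical sketch of that factorization; the module names, layer sizes, and the choice of an MLP shading model are assumptions for illustration, not PointAvatar's architecture.

```python
# Hypothetical sketch of albedo / normal-dependent shading disentanglement.
# Layer sizes and module names are illustrative, not PointAvatar's architecture.
import torch
import torch.nn as nn

class DisentangledColor(nn.Module):
    def __init__(self, feat_dim=32):
        super().__init__()
        # Intrinsic albedo: depends only on the per-point feature (pose- and lighting-independent).
        self.albedo_net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                        nn.Linear(64, 3), nn.Sigmoid())
        # Shading: a non-negative scalar gain predicted from the deformed surface normal.
        self.shading_net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                         nn.Linear(64, 1), nn.Softplus())

    def forward(self, point_feat, normal):
        albedo = self.albedo_net(point_feat)       # (N, 3) in [0, 1]
        shading = self.shading_net(normal)         # (N, 1) shading gain
        return albedo * shading                    # final per-point color

feats = torch.randn(1024, 32)                      # toy per-point features
normals = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
print(DisentangledColor()(feats, normals).shape)   # torch.Size([1024, 3])
```

Keeping albedo independent of the normal is what lets such a model relight or re-pose the avatar without re-baking appearance.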
- NeRFPlayer: A Streamable Dynamic Scene Representation with Decomposed Neural Radiance Fields [99.57774680640581]
We present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering.
We propose to decompose the 4D space according to temporal characteristics. Points in the 4D space are associated with probabilities belonging to three categories: static, deforming, and new areas.
arXiv Detail & Related papers (2022-10-28T07:11:05Z)
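The NeRFPlayer entry above assigns each 4D point a probability of belonging to a static, deforming, or newly appearing region and blends per-category fields accordingly. Below is a minimal, hypothetical sketch of such a decomposition head; the field names, branch architectures, and the simple softmax blending are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of a 4D decomposition field: each (x, y, z, t) point gets
# probabilities for {static, deforming, new}, used to blend three density branches.
# Names and architectures are illustrative only.
import torch
import torch.nn as nn

class DecompositionField(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        # Probability head over the three temporal categories.
        self.prob_net = nn.Sequential(nn.Linear(4, hidden), nn.ReLU(),
                                      nn.Linear(hidden, 3))
        # One small density branch per category (the static branch ignores time).
        self.static_net = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.deform_net = nn.Sequential(nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.new_net = nn.Sequential(nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, xyzt):
        probs = torch.softmax(self.prob_net(xyzt), dim=-1)        # (N, 3) category probabilities
        sigma = torch.cat([self.static_net(xyzt[:, :3]),           # static branch sees only xyz
                           self.deform_net(xyzt),
                           self.new_net(xyzt)], dim=-1)            # (N, 3) per-branch densities
        return (probs * sigma).sum(dim=-1, keepdim=True)           # blended density, (N, 1)

pts = torch.rand(2048, 4)                   # random (x, y, z, t) samples in [0, 1]^4
print(DecompositionField()(pts).shape)      # torch.Size([2048, 1])
```

The point of the split is that the static portion can be stored and streamed once, while only the deforming and newly appearing portions need per-frame capacity.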
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.