WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation
- URL: http://arxiv.org/abs/2407.02165v3
- Date: Sun, 14 Jul 2024 08:15:12 GMT
- Title: WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation
- Authors: Zihao Huang, Shoukang Hu, Guangcong Wang, Tianqi Liu, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu
- Abstract summary: WildAvatar is a web-scale in-the-wild human avatar creation dataset extracted from YouTube.
We evaluate several state-of-the-art avatar creation methods on the dataset, highlighting unexplored challenges that arise when avatar creation is applied in the real world.
- Score: 55.85887047136534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing human datasets for avatar creation are typically limited to laboratory environments, wherein high-quality annotations (e.g., SMPL estimation from 3D scans or multi-view images) can be ideally provided. However, these annotation requirements are impractical for real-world images or videos, which poses challenges for applying current avatar creation methods in the wild. To this end, we propose the WildAvatar dataset, a web-scale in-the-wild human avatar creation dataset extracted from YouTube, with $10,000+$ different human subjects and scenes. WildAvatar is at least $10\times$ richer than previous datasets for 3D human avatar creation. We evaluate several state-of-the-art avatar creation methods on our dataset, highlighting unexplored challenges that arise when avatar creation is applied in the real world. We also demonstrate the potential generalizability of avatar creation methods when provided with data at scale. We publicly release our data source links and annotations to push forward 3D human avatar creation and related fields toward real-world applications.
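Since the release consists of data source links plus annotations rather than the videos themselves, a practical workflow is to fetch each source clip and then read its per-frame SMPL parameters. The sketch below illustrates this; the annotation schema (keys such as `youtube_id`, `frames`, `smpl`, `poses`, `betas`) is an assumption for illustration, not the dataset's documented format, and only the `yt-dlp` download call reflects a real API.

```python
# Hypothetical loader sketch for WildAvatar-style data: the JSON keys below
# ("youtube_id", "frames", "smpl", "poses", "betas") are illustrative
# assumptions, NOT the dataset's documented schema.
import json
from yt_dlp import YoutubeDL  # pip install yt-dlp


def fetch_subject(annotation_path: str, out_dir: str = ".") -> tuple[list, list]:
    """Download one subject's source video and collect its SMPL parameters."""
    with open(annotation_path) as f:
        ann = json.load(f)  # assumed: one JSON record per human subject

    # The dataset releases source *links* rather than videos; fetch the clip.
    url = f"https://www.youtube.com/watch?v={ann['youtube_id']}"
    with YoutubeDL({"format": "mp4", "outtmpl": f"{out_dir}/%(id)s.%(ext)s"}) as ydl:
        ydl.download([url])

    # Assumed per-frame SMPL annotation: pose vectors plus shared shape betas.
    poses = [frame["smpl"]["poses"] for frame in ann["frames"]]
    betas = ann["frames"][0]["smpl"]["betas"]
    return poses, betas
```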
Related papers
- PuzzleAvatar: Assembling 3D Avatars from Personal Albums [54.831084076478874]
We develop PuzzleAvatar, a novel model that generates a faithful 3D avatar from a personal OOTD (Outfit Of the Day) album.
We exploit the learned tokens as "puzzle pieces" from which we assemble a faithful, personalized 3D avatar.
arXiv Detail & Related papers (2024-05-23T17:59:56Z) - Text2Avatar: Text to 3D Human Avatar Generation with Codebook-Driven
Body Controllable Attribute [33.330629835556664]
We propose Text2Avatar, which can generate realistic-style 3D avatars from coupled text prompts.
To alleviate the scarcity of realistic-style 3D human avatar data, we utilize a pre-trained unconditional 3D human avatar generation model.
arXiv Detail & Related papers (2024-01-01T09:39:57Z) - AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text [71.09533176800707]
AvatarStudio is a coarse-to-fine generative model that generates explicit textured 3D meshes for animatable human avatars.
By effectively leveraging the synergy between the articulated mesh representation and the DensePose-conditional diffusion model, AvatarStudio can create high-quality avatars.
arXiv Detail & Related papers (2023-11-29T18:59:32Z) - DreamWaltz: Make a Scene with Complex 3D Animatable Avatars [68.49935994384047]
We present DreamWaltz, a novel framework for generating and animating complex 3D avatars given text guidance and a parametric human body prior.
For animation, our method learns an animatable 3D avatar representation from abundant image priors of a diffusion model conditioned on various poses.
arXiv Detail & Related papers (2023-05-21T17:59:39Z) - AvatarCraft: Transforming Text into Neural Human Avatars with
Parameterized Shape and Pose Control [38.959851274747145]
AvatarCraft is a method for creating a 3D human avatar with a specific identity and artistic style that can be easily animated.
We use diffusion models to guide the learning of geometry and texture for a neural avatar based on a single text prompt.
We make the human avatar animatable by deforming the neural implicit field with an explicit warping field.
arXiv Detail & Related papers (2023-03-30T17:59:59Z) - AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars [37.43588165101838]
AvatarCLIP is a zero-shot text-driven framework for 3D avatar generation and animation.
We take advantage of the powerful vision-language model CLIP for supervising neural human generation.
By leveraging the priors learned by a motion VAE, a CLIP-guided, reference-based motion synthesis method is proposed to animate the generated 3D avatar.
arXiv Detail & Related papers (2022-05-17T17:59:19Z) - MVP-Human Dataset for 3D Human Avatar Reconstruction from Unconstrained
Frames [59.37430649840777]
We present 3D Avatar Reconstruction in the wild (ARwild), which first reconstructs the implicit skinning fields in a multi-level manner.
We contribute a large-scale dataset, MVP-Human, which contains 400 subjects, each of which has 15 scans in different poses.
Overall, benefiting from the specific network architecture and the diverse data, the trained model enables 3D avatar reconstruction from unconstrained frames.
arXiv Detail & Related papers (2022-04-24T03:57:59Z) - StylePeople: A Generative Model of Fullbody Human Avatars [59.42166744151461]
We propose a new type of full-body human avatar, which combines a parametric mesh-based body model with a neural texture.
We show that such avatars can successfully model clothing and hair, which usually pose a problem for mesh-based approaches.
We then propose a generative model for such avatars that can be trained from datasets of images and videos of people.
arXiv Detail & Related papers (2021-04-16T20:43:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.