WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation
- URL: http://arxiv.org/abs/2407.02165v3
- Date: Sun, 14 Jul 2024 08:15:12 GMT
- Title: WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation
- Authors: Zihao Huang, Shoukang Hu, Guangcong Wang, Tianqi Liu, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu
- Abstract summary: WildAvatar is a web-scale in-the-wild human avatar creation dataset extracted from YouTube.
We evaluate several state-of-the-art avatar creation methods on our dataset, highlighting unexplored challenges for real-world avatar creation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing human datasets for avatar creation are typically limited to laboratory environments, wherein high-quality annotations (e.g., SMPL estimation from 3D scans or multi-view images) can be ideally provided. However, their annotation requirements are impractical for real-world images or videos, posing challenges for applying current avatar creation methods in real-world settings. To this end, we propose the WildAvatar dataset, a web-scale in-the-wild human avatar creation dataset extracted from YouTube, with $10,000+$ different human subjects and scenes. WildAvatar is at least $10\times$ richer than previous datasets for 3D human avatar creation. We evaluate several state-of-the-art avatar creation methods on our dataset, highlighting unexplored challenges in real-world avatar creation. We also demonstrate the potential generalizability of avatar creation methods when provided with data at scale. We publicly release our data source links and annotations, to push forward 3D human avatar creation and other related fields for real-world applications.
Related papers
- Multimodal Generation of Animatable 3D Human Models with AvatarForge [67.31920821192323]
AvatarForge is a framework for generating animatable 3D human avatars from text or image inputs using AI-driven procedural generation.
Our evaluations show that AvatarForge outperforms state-of-the-art methods in both text- and image-to-avatar generation.
arXiv Detail & Related papers (2025-03-11T08:29:18Z)
- Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior [31.780579293685797]
We present Vid2Avatar-Pro, a method to create photorealistic and animatable 3D human avatars from monocular in-the-wild videos.
arXiv Detail & Related papers (2025-03-03T14:45:35Z)
- PuzzleAvatar: Assembling 3D Avatars from Personal Albums [54.831084076478874]
We develop PuzzleAvatar, a novel model that generates a faithful 3D avatar from a personal OOTD album.
We exploit the learned tokens as "puzzle pieces" from which we assemble a faithful, personalized 3D avatar.
arXiv Detail & Related papers (2024-05-23T17:59:56Z)
- Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion [39.456643736018435]
We propose a novel agent-based approach named Motion Avatar, which enables the automatic generation of high-quality, customizable human and animal avatars.
Secondly, we introduce an LLM planner that coordinates both motion and avatar generation, transforming discriminative planning into a customizable Q&A format.
Finally, we present an animal motion dataset named Zoo-300K, comprising approximately 300,000 text-motion pairs across 65 animal categories.
arXiv Detail & Related papers (2024-05-18T13:21:14Z)
- Text2Avatar: Text to 3D Human Avatar Generation with Codebook-Driven Body Controllable Attribute [33.330629835556664]
We propose Text2Avatar, which can generate realistic-style 3D avatars from coupled text prompts.
To alleviate the scarcity of realistic style 3D human avatar data, we utilize a pre-trained unconditional 3D human avatar generation model.
arXiv Detail & Related papers (2024-01-01T09:39:57Z)
- AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text [71.09533176800707]
AvatarStudio is a coarse-to-fine generative model that generates explicit textured 3D meshes for animatable human avatars.
By effectively leveraging the synergy between the articulated mesh representation and the DensePose-conditional diffusion model, AvatarStudio can create high-quality avatars.
arXiv Detail & Related papers (2023-11-29T18:59:32Z)
- DreamWaltz: Make a Scene with Complex 3D Animatable Avatars [68.49935994384047]
We present DreamWaltz, a novel framework for generating and animating complex 3D avatars given text guidance and parametric human body prior.
For animation, our method learns an animatable 3D avatar representation from the abundant image priors of a diffusion model conditioned on various poses.
arXiv Detail & Related papers (2023-05-21T17:59:39Z)
- AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control [38.959851274747145]
AvatarCraft is a method for creating a 3D human avatar with a specific identity and artistic style that can be easily animated.
We use diffusion models to guide the learning of geometry and texture for a neural avatar based on a single text prompt.
We make the human avatar animatable by deforming the neural implicit field with an explicit warping field.
arXiv Detail & Related papers (2023-03-30T17:59:59Z)
- PointAvatar: Deformable Point-based Head Avatars from Videos [103.43941945044294]
PointAvatar is a deformable point-based representation that disentangles the source color into intrinsic albedo and normal-dependent shading.
We show that our method is able to generate animatable 3D avatars using monocular videos from multiple sources.
arXiv Detail & Related papers (2022-12-16T10:05:31Z)
- AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars [37.43588165101838]
AvatarCLIP is a zero-shot text-driven framework for 3D avatar generation and animation.
We take advantage of the powerful vision-language model CLIP for supervising neural human generation.
By leveraging the priors learned in the motion VAE, a CLIP-guided reference-based motion synthesis method is proposed for the animation of the generated 3D avatar.
arXiv Detail & Related papers (2022-05-17T17:59:19Z)
- MVP-Human Dataset for 3D Human Avatar Reconstruction from Unconstrained Frames [59.37430649840777]
We present 3D Avatar Reconstruction in the wild (ARwild), which first reconstructs the implicit skinning fields in a multi-level manner.
We contribute a large-scale dataset, MVP-Human, which contains 400 subjects, each of which has 15 scans in different poses.
Overall, benefiting from the specific network architecture and the diverse data, the trained model enables 3D avatar reconstruction from unconstrained frames.
arXiv Detail & Related papers (2022-04-24T03:57:59Z)
- StylePeople: A Generative Model of Fullbody Human Avatars [59.42166744151461]
We propose a new type of full-body human avatar, which combines a parametric mesh-based body model with a neural texture.
We show that such avatars can successfully model clothing and hair, which usually poses a problem for mesh-based approaches.
We then propose a generative model for such avatars that can be trained from datasets of images and videos of people.
arXiv Detail & Related papers (2021-04-16T20:43:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.