BAG: Body-Aligned 3D Wearable Asset Generation
- URL: http://arxiv.org/abs/2501.16177v1
- Date: Mon, 27 Jan 2025 16:23:45 GMT
- Title: BAG: Body-Aligned 3D Wearable Asset Generation
- Authors: Zhongjin Luo, Yang Li, Mingrui Zhang, Senbo Wang, Han Yan, Xibin Song, Taizhang Shang, Wei Mao, Hongdong Li, Xiaoguang Han, Pan Ji
- Abstract summary: BAG is a Body-aligned Asset Generation method that outputs 3D wearable assets which can be automatically dressed on given 3D human bodies.
Results demonstrate significant advantages over existing methods in terms of image prompt-following capability, shape diversity, and shape quality.
- Score: 59.7545477546307
- Abstract: While recent advancements have shown remarkable progress in general 3D shape generation models, the challenge of leveraging these approaches to automatically generate wearable 3D assets remains unexplored. To this end, we present BAG, a Body-aligned Asset Generation method that outputs 3D wearable assets which can be automatically dressed on given 3D human bodies. This is achieved by controlling the 3D generation process using human body shape and pose information. Specifically, we first build a general single-image-to-consistent-multiview diffusion model and train it on the large Objaverse dataset to achieve diversity and generalizability. Then we train a ControlNet to guide the multiview generator to produce body-aligned multiview images. The control signal uses the multiview 2D projections of the target human body, where pixel values represent the XYZ coordinates of the body surface in a canonical space. The body-conditioned multiview diffusion generates body-aligned multiview images, which are then fed into a native 3D diffusion model to produce the 3D shape of the asset. Finally, by recovering the similarity transformation using multiview silhouette supervision and addressing asset-body penetration with physics simulators, the 3D asset can be accurately fitted onto the target human body. Experimental results demonstrate significant advantages over existing methods in terms of image prompt-following capability, shape diversity, and shape quality. Our project page is available at https://bag-3d.github.io/.
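The most concrete technical detail in the abstract is the ControlNet conditioning signal: each control image is a 2D projection of the target body in which pixel values store the XYZ coordinates of the body surface in a canonical space. Below is a minimal sketch of rendering one such map, assuming a point-sampled body surface and a standard pinhole camera; the paper's exact rasterization and normalization are not specified here, so every choice in the sketch is an assumption.

```python
# Sketch only: renders an XYZ-coordinate control map from body surface points.
# Camera model, resolution, and splatting strategy are assumptions, not the
# authors' implementation.
import numpy as np

def xyz_control_map(body_pts, K, R, t, hw=(512, 512)):
    """Render a control image whose pixels hold canonical-space XYZ.

    body_pts: (N, 3) points sampled on the body surface, in canonical space.
    K: (3, 3) intrinsics; R, t: world-to-camera rotation and translation.
    Returns an (H, W, 3) float image; background pixels remain zero.
    """
    h, w = hw
    cam = body_pts @ R.T + t                  # canonical -> camera coordinates
    front = cam[:, 2] > 1e-8                  # discard points behind the camera
    cam, pts = cam[front], body_pts[front]
    z = cam[:, 2]
    u = np.round(cam[:, 0] / z * K[0, 0] + K[0, 2]).astype(int)
    v = np.round(cam[:, 1] / z * K[1, 1] + K[1, 2]).astype(int)
    keep = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, pts = u[keep], v[keep], z[keep], pts[keep]

    img = np.zeros((h, w, 3), np.float32)
    depth = np.full((h, w), np.inf, np.float32)
    for i in range(len(z)):                   # z-buffer: keep nearest surface point
        if z[i] < depth[v[i], u[i]]:
            depth[v[i], u[i]] = z[i]
            img[v[i], u[i]] = pts[i]          # pixel value = canonical XYZ
    return img
```

Rendering one map per target viewpoint yields the multiview conditioning stack; a real pipeline would likely rasterize the body mesh rather than splat sample points, and normalize the XYZ values into [0, 1] before encoding them as an image.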
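The final fitting step, recovering a similarity transformation under multiview silhouette supervision, can likewise be sketched. This illustration (not the released implementation) assumes the generated asset is already orientation-aligned, so only scale and translation are optimized, with a derivative-free solver on a 1 - IoU silhouette loss; the physics-based penetration handling is omitted.

```python
# Sketch only: fits scale and translation so the asset's projected silhouettes
# match target masks across views. Rotation is assumed already body-aligned.
import numpy as np
from scipy.optimize import minimize

def splat_silhouette(pts, K, R, t, hw=(256, 256)):
    """Project 3D points with a pinhole camera and splat a binary mask."""
    h, w = hw
    cam = pts @ R.T + t                       # world -> camera coordinates
    front = cam[:, 2] > 1e-8
    cam = cam[front]
    u = np.round(cam[:, 0] / cam[:, 2] * K[0, 0] + K[0, 2]).astype(int)
    v = np.round(cam[:, 1] / cam[:, 2] * K[1, 1] + K[1, 2]).astype(int)
    keep = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    mask = np.zeros(hw, bool)
    mask[v[keep], u[keep]] = True
    return mask

def fit_similarity(asset_pts, target_masks, cams):
    """Recover (scale, translation) matching target silhouettes in all views."""
    def loss(x):
        s, trans = x[0], x[1:4]
        total = 0.0
        for mask, (K, R, t) in zip(target_masks, cams):
            pred = splat_silhouette(s * asset_pts + trans, K, R, t, mask.shape)
            inter = np.logical_and(pred, mask).sum()
            union = np.logical_or(pred, mask).sum() + 1e-8
            total += 1.0 - inter / union      # 1 - IoU, averaged below
        return total / len(cams)
    x0 = np.array([1.0, 0.0, 0.0, 0.0])      # init: unit scale, zero offset
    return minimize(loss, x0, method="Nelder-Mead").x
```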
Related papers
- Sharp-It: A Multi-view to Multi-view Diffusion Model for 3D Synthesis and Manipulation [15.215597253086612]
We bridge the quality gap between methods that directly generate 3D representations and ones that reconstruct 3D objects from multi-view images.
We introduce a multi-view to multi-view diffusion model called Sharp-It, which operates on a 3D-consistent set of multi-view images.
We demonstrate that Sharp-It enables various 3D applications, such as fast synthesis, editing, and controlled generation, while attaining high-quality assets.
arXiv Detail & Related papers (2024-12-03T17:58:07Z) - ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance [76.7746870349809]
We present ComboVerse, a 3D generation framework that produces high-quality 3D assets with complex compositions by learning to combine multiple models.
Our proposed framework emphasizes spatial alignment of objects, compared with standard score distillation sampling.
arXiv Detail & Related papers (2024-03-19T03:39:43Z) - 3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models [52.96248836582542]
We propose an effective approach based on recent diffusion models, termed HumanWild, which can effortlessly generate human images and corresponding 3D mesh annotations.
By exclusively employing generative models, we generate large-scale in-the-wild human images and high-quality annotations, eliminating the need for real-world data collection.
arXiv Detail & Related papers (2024-03-17T06:31:16Z) - Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoint supervision through a sparse ray-sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z) - Get3DHuman: Lifting StyleGAN-Human into a 3D Generative Model using Pixel-aligned Reconstruction Priors [56.192682114114724]
Get3DHuman is a novel 3D human framework that can significantly boost the realism and diversity of the generated outcomes.
Our key observation is that the 3D generator can profit from human-related priors learned through 2D human generators and 3D reconstructors.
arXiv Detail & Related papers (2023-02-02T15:37:46Z) - 3DHumanGAN: 3D-Aware Human Image Generation with 3D Pose Mapping [37.14866512377012]
3DHumanGAN is a 3D-aware generative adversarial network that synthesizes photorealistic images of full-body humans.
We propose a novel generator architecture in which a 2D convolutional backbone is modulated by a 3D pose mapping network.
arXiv Detail & Related papers (2022-12-14T17:59:03Z) - EVA3D: Compositional 3D Human Generation from 2D Image Collections [27.70991135165909]
EVA3D is an unconditional 3D human generative model learned from 2D image collections only.
It can sample 3D humans with detailed geometry and render high-quality images (up to 512x256) without bells and whistles.
It achieves state-of-the-art 3D human generation performance regarding both geometry and texture quality.
arXiv Detail & Related papers (2022-10-10T17:59:31Z) - Human Performance Capture from Monocular Video in the Wild [50.34917313325813]
We propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses.
Our method outperforms state-of-the-art methods on the in-the-wild human video dataset 3DPW.
arXiv Detail & Related papers (2021-11-29T16:32:41Z) - Learning Temporal 3D Human Pose Estimation with Pseudo-Labels [3.0954251281114513]
We present a simple, yet effective, approach for self-supervised 3D human pose estimation.
We rely on triangulating 2D body pose estimates from a multi-view camera system (a generic triangulation sketch follows this list).
Our method achieves state-of-the-art performance on the Human3.6M and MPI-INF-3DHP benchmarks.
arXiv Detail & Related papers (2021-10-14T17:40:45Z)
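The triangulation step in the last entry is a standard construction, so a generic sketch may help (an illustration, not that paper's code): with calibrated cameras, each 2D joint detection contributes two linear constraints on the 3D joint, and the homogeneous least-squares solution comes from an SVD (direct linear transform).

```python
# Sketch only: DLT triangulation of one joint from calibrated multi-view
# detections. Projection matrices are assumed known.
import numpy as np

def triangulate_joint(points_2d, proj_mats):
    """points_2d: list of (u, v) detections, one per camera.
    proj_mats: list of 3x4 projection matrices P = K [R | t].
    Returns the 3D joint minimizing the algebraic DLT error."""
    rows = []
    for (u, v), P in zip(points_2d, proj_mats):
        rows.append(u * P[2] - P[0])   # each view contributes two
        rows.append(v * P[2] - P[1])   # linear constraints on X
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                         # null-space (homogeneous) solution
    return X[:3] / X[3]
```

The triangulated joints then act as 3D pseudo-labels for training a monocular pose estimator.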
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.