IPVTON: Image-based 3D Virtual Try-on with Image Prompt Adapter
- URL: http://arxiv.org/abs/2501.15616v1
- Date: Sun, 26 Jan 2025 17:51:03 GMT
- Title: IPVTON: Image-based 3D Virtual Try-on with Image Prompt Adapter
- Authors: Xiaojing Zhong, Zhonghua Wu, Xiaofeng Yang, Guosheng Lin, Qingyao Wu,
- Abstract summary: Given a pair of images depicting a person and a garment separately, image-based 3D virtual try-on methods aim to reconstruct a 3D human model.
We present IPVTON, a novel image-based 3D virtual try-on framework.
- Score: 64.03091978606952
- License:
- Abstract: Given a pair of images depicting a person and a garment separately, image-based 3D virtual try-on methods aim to reconstruct a 3D human model that realistically portrays the person wearing the desired garment. In this paper, we present IPVTON, a novel image-based 3D virtual try-on framework. IPVTON employs score distillation sampling with image prompts to optimize a hybrid 3D human representation, integrating target garment features into diffusion priors through an image prompt adapter. To avoid interference with non-target areas, we leverage mask-guided image prompt embeddings to focus the image features on the try-on regions. Moreover, we impose geometric constraints on the 3D model with a pseudo silhouette generated by ControlNet, ensuring that the clothed 3D human model retains the shape of the source identity while accurately wearing the target garments. Extensive qualitative and quantitative experiments demonstrate that IPVTON outperforms previous methods in image-based 3D virtual try-on tasks, excelling in both geometry and texture.
Related papers
- DreamVTON: Customizing 3D Virtual Try-on with Personalized Diffusion Models [56.55549019625362]
Image-based 3D Virtual Try-ON (VTON) aims to sculpt the 3D human according to person and clothes images.
Recent text-to-3D methods achieve remarkable improvement in high-fidelity 3D human generation.
We propose a novel customizing 3D human try-on model, named textbfDreamVTON, to separately optimize the geometry and texture of the 3D human.
arXiv Detail & Related papers (2024-07-23T14:25:28Z) - ID-to-3D: Expressive ID-guided 3D Heads via Score Distillation Sampling [96.87575334960258]
ID-to-3D is a method to generate identity- and text-guided 3D human heads with disentangled expressions.
Results achieve an unprecedented level of identity-consistent and high-quality texture and geometry generation.
arXiv Detail & Related papers (2024-05-26T13:36:45Z) - En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D
Synthetic Data [36.51674664590734]
We present En3D, an enhanced izable scheme for high-qualityd 3D human avatars.
Unlike previous works that rely on scarce 3D datasets or limited 2D collections with imbalance viewing angles and pose priors, our approach aims to develop a zero-shot 3D capable of producing 3D humans.
arXiv Detail & Related papers (2024-01-02T12:06:31Z) - Structured 3D Features for Reconstructing Controllable Avatars [43.36074729431982]
We introduce Structured 3D Features, a model based on a novel implicit 3D representation that pools pixel-aligned image features onto dense 3D points sampled from a parametric, statistical human mesh surface.
We show that our S3F model surpasses the previous state-of-the-art on various tasks, including monocular 3D reconstruction, as well as albedo and shading estimation.
arXiv Detail & Related papers (2022-12-13T18:57:33Z) - Next3D: Generative Neural Texture Rasterization for 3D-Aware Head
Avatars [36.4402388864691]
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and multi-view-consistent facial images using only collections of single-view 2D imagery.
Recent efforts incorporate 3D Morphable Face Model (3DMM) to describe deformation in generative radiance fields either explicitly or implicitly.
We propose a novel 3D GAN framework for unsupervised learning of generative, high-quality and 3D-consistent facial avatars from unstructured 2D images.
arXiv Detail & Related papers (2022-11-21T06:40:46Z) - TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting
Decomposition [39.312567993736025]
We propose TANGO, which transfers the appearance style of a given 3D shape according to a text prompt in a photorealistic manner.
We show that TANGO outperforms existing methods of text-driven 3D style transfer in terms of photorealistic quality, consistency of 3D geometry, and robustness when stylizing low-quality meshes.
arXiv Detail & Related papers (2022-10-20T13:52:18Z) - DRaCoN -- Differentiable Rasterization Conditioned Neural Radiance
Fields for Articulated Avatars [92.37436369781692]
We present DRaCoN, a framework for learning full-body volumetric avatars.
It exploits the advantages of both the 2D and 3D neural rendering techniques.
Experiments on the challenging ZJU-MoCap and Human3.6M datasets indicate that DRaCoN outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-03-29T17:59:15Z) - 3DStyleNet: Creating 3D Shapes with Geometric and Texture Style
Variations [81.45521258652734]
We propose a method to create plausible geometric and texture style variations of 3D objects.
Our method can create many novel stylized shapes, resulting in effortless 3D content creation and style-ware data augmentation.
arXiv Detail & Related papers (2021-08-30T02:28:31Z) - Learning to Transfer Texture from Clothing Images to 3D Humans [50.838970996234465]
We present a method to automatically transfer textures of clothing images to 3D garments worn on top SMPL, in real time.
We first compute training pairs of images with aligned 3D garments using a custom non-rigid 3D to 2D registration method, which is accurate but slow.
Our model opens the door to applications such as virtual try-on, and allows for generation of 3D humans with varied textures which is necessary for learning.
arXiv Detail & Related papers (2020-03-04T12:53:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.