VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction
- URL: http://arxiv.org/abs/2503.12165v2
- Date: Fri, 11 Apr 2025 05:47:28 GMT
- Title: VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction
- Authors: Zijian He, Yuwei Ning, Yipeng Qin, Guangrun Wang, Sibei Yang, Liang Lin, Guanbin Li
- Abstract summary: Virtual Try-On (VTON) is a transformative technology in e-commerce and fashion design, enabling realistic digital visualization of clothing on individuals. We propose VTON 360, a novel 3D VTON method that addresses the open challenge of achieving high-fidelity VTON that supports any-view rendering.
- Score: 103.0918705283309
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Virtual Try-On (VTON) is a transformative technology in e-commerce and fashion design, enabling realistic digital visualization of clothing on individuals. In this work, we propose VTON 360, a novel 3D VTON method that addresses the open challenge of achieving high-fidelity VTON that supports any-view rendering. Specifically, we leverage the equivalence between a 3D model and its rendered multi-view 2D images, and reformulate 3D VTON as an extension of 2D VTON that ensures 3D consistent results across multiple views. To achieve this, we extend 2D VTON models to include multi-view garments and clothing-agnostic human body images as input, and propose several novel techniques to enhance them, including: i) a pseudo-3D pose representation using normal maps derived from the SMPL-X 3D human model, ii) a multi-view spatial attention mechanism that models the correlations between features from different viewing angles, and iii) a multi-view CLIP embedding that enhances the garment CLIP features used in 2D VTON with camera information. Extensive experiments on large-scale real datasets and clothing images from e-commerce platforms demonstrate the effectiveness of our approach. Project page: https://scnuhealthy.github.io/VTON360.
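The abstract names the three techniques only at a high level. As a rough illustration of what technique (ii) might look like, here is a minimal PyTorch sketch of a multi-view spatial attention layer that lets tokens from one view attend to tokens from all other views. The class name, tensor layout, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of multi-view spatial attention (hypothetical
# reconstruction from the abstract, NOT the VTON 360 code):
# per-view feature maps are flattened and jointly attended so that
# tokens from one view can attend to tokens from all other views.
import torch
import torch.nn as nn

class MultiViewSpatialAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, V, C, H, W) -- B batches, V views, C channels
        B, V, C, H, W = feats.shape
        # Flatten all views into one token sequence of length V*H*W so
        # attention can model correlations across viewing angles.
        tokens = feats.permute(0, 1, 3, 4, 2).reshape(B, V * H * W, C)
        x = self.norm(tokens)
        out, _ = self.attn(x, x, x)   # cross-view self-attention
        tokens = tokens + out         # residual connection
        return tokens.reshape(B, V, H, W, C).permute(0, 1, 4, 2, 3)

# Toy usage: 2 views of 16x16 feature maps with 64 channels.
layer = MultiViewSpatialAttention(dim=64)
y = layer(torch.randn(1, 2, 64, 16, 16))
print(y.shape)  # torch.Size([1, 2, 64, 16, 16])
```

Flattening all views into one token sequence is the simplest way to let attention model cross-view correlations; the paper's actual mechanism may differ in details such as where these layers sit in the network.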
Related papers
- IPVTON: Image-based 3D Virtual Try-on with Image Prompt Adapter [64.03091978606952]
Given a pair of images depicting a person and a garment separately, image-based 3D virtual try-on methods aim to reconstruct a 3D human model. We present IPVTON, a novel image-based 3D virtual try-on framework.
arXiv Detail & Related papers (2025-01-26T17:51:03Z)
- Sharp-It: A Multi-view to Multi-view Diffusion Model for 3D Synthesis and Manipulation [15.215597253086612]
We bridge the quality gap between methods that directly generate 3D representations and ones that reconstruct 3D objects from multi-view images. We introduce Sharp-It, a multi-view to multi-view diffusion model that takes a 3D-consistent set of multi-view images as input. We demonstrate that Sharp-It enables various 3D applications, such as fast synthesis, editing, and controlled generation, while attaining high-quality assets.
arXiv Detail & Related papers (2024-12-03T17:58:07Z)
- Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models [112.2625368640425]
High-resolution Image-to-3D model (Hi3D) is a new video-diffusion-based paradigm that recasts single-image-to-multi-view synthesis as 3D-aware sequential image generation. Hi3D first empowers a pre-trained video diffusion model with a 3D-aware prior, yielding multi-view images with low-resolution texture details.
arXiv Detail & Related papers (2024-09-11T17:58:57Z)
- DreamVTON: Customizing 3D Virtual Try-on with Personalized Diffusion Models [56.55549019625362]
Image-based 3D Virtual Try-On (VTON) aims to sculpt a 3D human according to person and clothing images.
Recent text-to-3D methods achieve remarkable improvements in high-fidelity 3D human generation.
We propose a novel customized 3D human try-on model, named DreamVTON, that separately optimizes the geometry and texture of the 3D human.
arXiv Detail & Related papers (2024-07-23T14:25:28Z)
- Structured 3D Features for Reconstructing Controllable Avatars [43.36074729431982]
We introduce Structured 3D Features, a model based on a novel implicit 3D representation that pools pixel-aligned image features onto dense 3D points sampled from a parametric, statistical human mesh surface.
We show that our S3F model surpasses the previous state-of-the-art on various tasks, including monocular 3D reconstruction, as well as albedo and shading estimation. (A minimal sketch of this pixel-aligned pooling idea appears after this list.)
arXiv Detail & Related papers (2022-12-13T18:57:33Z)
- M3D-VTON: A Monocular-to-3D Virtual Try-On Network [62.77413639627565]
Existing 3D virtual try-on methods mainly rely on annotated 3D human shapes and garment templates.
We propose a novel Monocular-to-3D Virtual Try-On Network (M3D-VTON) that builds on the merits of both 2D and 3D approaches.
arXiv Detail & Related papers (2021-08-11T10:05:17Z)
- Towards Realistic 3D Embedding via View Alignment [53.89445873577063]
This paper presents an innovative View Alignment GAN (VA-GAN) that composes new images by embedding 3D models into 2D background images realistically and automatically.
VA-GAN consists of a texture generator and a differential discriminator that are interconnected and end-to-end trainable.
arXiv Detail & Related papers (2020-07-14T14:45:00Z)
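As referenced in the Structured 3D Features entry above, pixel-aligned pooling projects each 3D surface point into the image and samples the feature map at that pixel. The following is a minimal sketch assuming a pinhole camera with known intrinsics; the function name, tensor shapes, and camera model are hypothetical and not taken from the S3F code.

```python
# Minimal sketch of pixel-aligned feature pooling, the core idea
# described in the Structured 3D Features entry; all names and the
# pinhole-camera assumption are illustrative, not the authors' code.
import torch
import torch.nn.functional as F

def pool_pixel_aligned(feats, points, K):
    """feats: (B, C, H, W) image features; points: (B, N, 3) 3D surface
    points in camera coordinates; K: (B, 3, 3) camera intrinsics."""
    proj = torch.einsum('bij,bnj->bni', K, points)      # project to image
    uv = proj[..., :2] / proj[..., 2:].clamp(min=1e-6)  # perspective divide
    B, C, H, W = feats.shape
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([uv[..., 0] / (W - 1), uv[..., 1] / (H - 1)], -1)
    grid = grid * 2 - 1
    sampled = F.grid_sample(feats, grid.unsqueeze(1), align_corners=True)
    return sampled.squeeze(2).permute(0, 2, 1)          # (B, N, C)

# Toy usage: pool features for 100 points from a 32x32 feature map.
f = torch.randn(1, 64, 32, 32)
p = torch.rand(1, 100, 3) + torch.tensor([0., 0., 2.])  # points in front of camera
K = torch.tensor([[[32., 0., 16.], [0., 32., 16.], [0., 0., 1.]]])
print(pool_pixel_aligned(f, p, K).shape)  # torch.Size([1, 100, 64])
```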