AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
- URL: http://arxiv.org/abs/2412.04146v2
- Date: Mon, 06 Jan 2025 06:56:03 GMT
- Title: AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
- Authors: Xinghui Li, Qichao Sun, Pengze Zhang, Fulong Ye, Zhichao Liao, Wanquan Feng, Songtao Zhao, Qian He
- Abstract summary: We propose a novel AnyDressing method for customizing characters conditioned on any combination of garments and personalized text prompts.
AnyDressing comprises two primary networks named GarmentsNet and DressingNet, which are dedicated to extracting detailed clothing features and generating customized images, respectively.
We introduce a Garment-Enhanced Texture Learning strategy to improve the fine-grained texture details of garments.
- Score: 7.534556848810697
- License:
- Abstract: Recent advances in garment-centric image generation from text and image prompts based on diffusion models are impressive. However, existing methods lack support for various combinations of attire, and struggle to preserve the garment details while maintaining faithfulness to the text prompts, limiting their performance across diverse scenarios. In this paper, we focus on a new task, i.e., Multi-Garment Virtual Dressing, and we propose a novel AnyDressing method for customizing characters conditioned on any combination of garments and any personalized text prompts. AnyDressing comprises two primary networks named GarmentsNet and DressingNet, which are respectively dedicated to extracting detailed clothing features and generating customized images. Specifically, we propose an efficient and scalable module called Garment-Specific Feature Extractor in GarmentsNet to individually encode garment textures in parallel. This design prevents garment confusion while ensuring network efficiency. Meanwhile, we design an adaptive Dressing-Attention mechanism and a novel Instance-Level Garment Localization Learning strategy in DressingNet to accurately inject multi-garment features into their corresponding regions. This approach efficiently integrates multi-garment texture cues into generated images and further enhances text-image consistency. Additionally, we introduce a Garment-Enhanced Texture Learning strategy to improve the fine-grained texture details of garments. Thanks to our well-crafted design, AnyDressing can serve as a plug-in module to easily integrate with any community control extensions for diffusion models, improving the diversity and controllability of synthesized images. Extensive experiments show that AnyDressing achieves state-of-the-art results.
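To make the described architecture more concrete, below is a minimal, illustrative PyTorch sketch of the two core ideas in the abstract: encoding each garment independently (in the spirit of the Garment-Specific Feature Extractor) and injecting the concatenated garment tokens into the image branch through cross-attention (in the spirit of Dressing-Attention). All module names, dimensions, and the exact attention formulation are assumptions for illustration, not the paper's actual implementation.

```python
# Illustrative sketch only: per-garment encoding + cross-attention injection.
import torch
import torch.nn as nn


class GarmentEncoder(nn.Module):
    """Encodes one garment latent into a short sequence of feature tokens (assumed design)."""

    def __init__(self, in_channels=4, dim=320):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, dim, kernel_size=8, stride=8)
        self.pool = nn.AdaptiveAvgPool2d(8)  # 8x8 = 64 tokens per garment

    def forward(self, garment_latent):            # (B, 4, H, W) VAE latent
        feat = self.proj(garment_latent)           # (B, dim, H/8, W/8)
        feat = self.pool(feat)                     # (B, dim, 8, 8)
        return feat.flatten(2).transpose(1, 2)     # (B, 64, dim) garment tokens


class DressingAttention(nn.Module):
    """Cross-attention from image tokens to the concatenated multi-garment tokens."""

    def __init__(self, dim=320, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, image_tokens, garment_tokens):
        # image_tokens: (B, N, dim); garment_tokens: (B, M, dim)
        out, _ = self.attn(image_tokens, garment_tokens, garment_tokens)
        return image_tokens + out                  # residual injection


if __name__ == "__main__":
    encoder = GarmentEncoder()
    dressing = DressingAttention()
    # Two garments (e.g. a top and a skirt), encoded independently and in parallel.
    garments = [torch.randn(1, 4, 64, 64) for _ in range(2)]
    garment_tokens = torch.cat([encoder(g) for g in garments], dim=1)  # (1, 128, 320)
    image_tokens = torch.randn(1, 1024, 320)       # hidden states of one UNet block
    fused = dressing(image_tokens, garment_tokens)
    print(fused.shape)                             # torch.Size([1, 1024, 320])
```

In the full method, the Instance-Level Garment Localization Learning strategy would additionally restrict each garment's attention to its corresponding image region; the sketch omits that step for brevity.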
Related papers
- Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation [52.13927859375693]
We propose SewingLDM, a multi-modal generative model that generates sewing patterns controlled by text prompts, body shapes, and garment sketches.
To learn the sewing pattern distribution in the latent space, we design a two-step training strategy.
Comprehensive qualitative and quantitative experiments show the effectiveness of our proposed method.
arXiv Detail & Related papers (2024-12-19T02:05:28Z)
- AIpparel: A Large Multimodal Generative Model for Digital Garments [71.12933771326279]
We introduce AIpparel, a large multimodal model for generating and editing sewing patterns.
Our model fine-tunes state-of-the-art large multimodal models on a custom-curated large-scale dataset of over 120,000 unique garments.
We propose a novel tokenization scheme that concisely encodes these complex sewing patterns so that LLMs can learn to predict them efficiently.
arXiv Detail & Related papers (2024-12-05T07:35:19Z)
- FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on [73.13242624924814]
FitDiT is a garment perception enhancement technique designed for high-fidelity virtual try-on using Diffusion Transformers (DiT).
We introduce a garment texture extractor that incorporates garment priors evolution to fine-tune garment features, facilitating better capture of rich details such as stripes, patterns, and text.
We also employ a dilated-relaxed mask strategy that adapts to the correct length of garments, preventing the generation of garments that fill the entire mask area during cross-category try-on.
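As a rough illustration of the dilated-relaxed mask idea, the sketch below dilates a tight garment-region mask so that the try-on model is not forced to fill the exact silhouette; the kernel size and the use of max pooling for dilation are assumptions, not FitDiT's actual implementation.

```python
# Illustrative sketch of relaxing an inpainting mask by dilation.
import torch
import torch.nn.functional as F

def dilate_mask(mask: torch.Tensor, kernel_size: int = 15) -> torch.Tensor:
    """Dilate a binary mask (B, 1, H, W) via max pooling."""
    pad = kernel_size // 2
    return F.max_pool2d(mask, kernel_size, stride=1, padding=pad)

mask = torch.zeros(1, 1, 64, 64)
mask[:, :, 20:40, 24:40] = 1.0           # tight garment-region mask
relaxed = dilate_mask(mask)               # looser region for cross-category try-on
print(mask.sum().item(), relaxed.sum().item())
```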
arXiv Detail & Related papers (2024-11-15T11:02:23Z)
- Gaussian Garments: Reconstructing Simulation-Ready Clothing with Photorealistic Appearance from Multi-View Video [66.98046635045685]
We introduce a novel approach for reconstructing realistic simulation-ready garment assets from multi-view videos.
Our method represents garments with a combination of a 3D mesh and a Gaussian texture that encodes both the color and high-frequency surface details.
This representation enables accurate registration of garment geometries to multi-view videos and helps disentangle albedo textures from lighting effects.
arXiv Detail & Related papers (2024-09-12T16:26:47Z)
- Multi-Garment Customized Model Generation [3.1679243514285194]
Multi-Garment Customized Model Generation is a unified framework based on Latent Diffusion Models (LDMs).
Our framework supports the conditional generation of multiple garments through decoupled multi-garment feature fusion.
The proposed garment encoder is a plug-and-play module that can be combined with other extension modules.
arXiv Detail & Related papers (2024-08-09T17:57:33Z)
- IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 addresses the virtual dressing task of generating freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet.
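A minimal sketch of such a hybrid attention block is shown below, assuming a frozen self-attention path and a trainable cross-attention path over garment tokens; the dimensions and residual formulation are illustrative, not IMAGDressing-v1's actual code.

```python
# Illustrative sketch: frozen self-attention plus trainable garment cross-attention.
import torch
import torch.nn as nn


class HybridAttention(nn.Module):
    def __init__(self, dim=320, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Freeze the (pretrained) self-attention; only the cross-attention trains.
        for p in self.self_attn.parameters():
            p.requires_grad = False

    def forward(self, hidden_states, garment_features, scale=1.0):
        sa, _ = self.self_attn(hidden_states, hidden_states, hidden_states)
        ca, _ = self.cross_attn(hidden_states, garment_features, garment_features)
        return hidden_states + sa + scale * ca


block = HybridAttention()
x = torch.randn(1, 256, 320)   # denoising-UNet hidden states
g = torch.randn(1, 64, 320)    # garment tokens from the garment UNet
print(block(x, g).shape)        # torch.Size([1, 256, 320])
```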
arXiv Detail & Related papers (2024-07-17T16:26:30Z)
- AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario [50.62711489896909]
AnyFit surpasses all baselines on high-resolution benchmarks and real-world data by a large margin.
AnyFit's impressive performance on high-fidelity virtual try-on in any scenario from any image paves a new path for future research within the fashion community.
arXiv Detail & Related papers (2024-05-28T13:33:08Z)
- Magic Clothing: Controllable Garment-Driven Image Synthesis [7.46772222515689]
We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for an unexplored garment-driven image synthesis task.
When generating customized characters wearing the target garments under diverse text prompts, image controllability is the most critical issue.
We introduce a garment extractor to capture the detailed garment features, and employ self-attention fusion to incorporate them into the pretrained LDMs.
arXiv Detail & Related papers (2024-04-15T07:15:39Z)
- Per Garment Capture and Synthesis for Real-time Virtual Try-on [15.128477359632262]
Existing image-based works try to synthesize a try-on image from a single image of a target garment.
It is difficult to reproduce changes in wrinkles caused by pose and body-size variation, as well as the pulling and stretching of the garment by hand.
We propose an alternative per garment capture and synthesis workflow to handle such rich interactions by training the model with many systematically captured images.
arXiv Detail & Related papers (2021-09-10T03:49:37Z)