StableGarment: Garment-Centric Generation via Stable Diffusion
- URL: http://arxiv.org/abs/2403.10783v1
- Date: Sat, 16 Mar 2024 03:05:07 GMT
- Title: StableGarment: Garment-Centric Generation via Stable Diffusion
- Authors: Rui Wang, Hailong Guo, Jiaming Liu, Huaxia Li, Haibo Zhao, Xu Tang, Yao Hu, Hao Tang, Peipei Li
- Abstract summary: We introduce StableGarment, a unified framework to tackle garment-centric (GC) generation tasks.
Our solution involves the development of a garment encoder, a trainable copy of the denoising UNet equipped with additive self-attention layers.
The incorporation of a dedicated try-on ControlNet enables StableGarment to execute virtual try-on tasks with precision.
- Score: 29.5112874761836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce StableGarment, a unified framework to tackle garment-centric (GC) generation tasks, including GC text-to-image, controllable GC text-to-image, stylized GC text-to-image, and robust virtual try-on. The main challenge lies in retaining the intricate textures of the garment while maintaining the flexibility of pre-trained Stable Diffusion. Our solution involves the development of a garment encoder, a trainable copy of the denoising UNet equipped with additive self-attention (ASA) layers. These ASA layers are specifically devised to transfer detailed garment textures, also facilitating the integration of stylized base models for the creation of stylized images. Furthermore, the incorporation of a dedicated try-on ControlNet enables StableGarment to execute virtual try-on tasks with precision. We also build a novel data engine that produces high-quality synthesized data to preserve the model's ability to follow prompts. Extensive experiments demonstrate that our approach delivers state-of-the-art (SOTA) results among existing virtual try-on methods and exhibits high flexibility with broad potential applications in various garment-centric image generation tasks.
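The abstract describes additive self-attention (ASA) layers that transfer garment-encoder features into the denoising UNet. The NumPy toy below sketches one plausible reading of that idea: image queries attend both to image tokens (ordinary self-attention) and to garment tokens, with the two branches summed. All shapes, weight sharing, and the summation fusion are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def additive_self_attention(x, garment, w_q, w_k, w_v):
    """x: (n, d) image tokens; garment: (m, d) garment-encoder tokens."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    self_out = attention(q, k, v)            # ordinary self-attention over image tokens
    gk, gv = garment @ w_k, garment @ w_v    # garment tokens as extra keys/values
    garment_out = attention(q, gk, gv)       # image queries attend to garment tokens
    return self_out + garment_out            # additive fusion of the two branches

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((4, d))              # 4 image tokens
g = rng.standard_normal((6, d))              # 6 garment tokens
w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
out = additive_self_attention(x, g, w_q, w_k, w_v)
print(out.shape)  # (4, 8): same shape as the image tokens
```

The output keeps the image-token shape, so such a layer can be dropped into an existing UNet block without changing downstream shapes; only the garment branch needs new training.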
Related papers
- TryOffAnyone: Tiled Cloth Generation from a Dressed Person [1.4732811715354452]
High-fidelity tiled garment images are essential for personalized recommendations, outfit composition, and virtual try-on systems.
We propose a novel approach utilizing a fine-tuned Stable Diffusion model.
Our method features a streamlined single-stage network design, which integrates garment-specific masks to isolate and process target clothing items effectively.
arXiv Detail & Related papers (2024-12-11T17:41:53Z)
- AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models [7.534556848810697]
We propose a novel AnyDressing method for customizing characters conditioned on any combination of garments and personalized text prompts.
AnyDressing comprises two primary networks named GarmentsNet and DressingNet, which are respectively dedicated to extracting detailed clothing features.
We introduce a Garment-Enhanced Texture Learning strategy to improve the fine-grained texture details of garments.
arXiv Detail & Related papers (2024-12-05T13:16:47Z)
- Improving Virtual Try-On with Garment-focused Diffusion Models [91.95830983115474]
Diffusion models have revolutionized generative modeling across numerous image synthesis tasks.
We design a new diffusion model, GarDiff, which triggers a garment-focused diffusion process.
Experiments on VITON-HD and DressCode datasets demonstrate the superiority of our GarDiff when compared to state-of-the-art VTON approaches.
arXiv Detail & Related papers (2024-09-12T17:55:11Z)
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 addresses the virtual dressing task: generating freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet.
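The hybrid attention module summarized above combines a frozen self-attention branch with a trainable cross-attention branch that injects garment-UNet features into a frozen denoising UNet. The NumPy sketch below illustrates that frozen/trainable split in miniature; weight names, dimensions, and the summation fusion are assumptions for illustration, not IMAGDressing-v1's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

class HybridAttention:
    """Frozen self-attention branch plus trainable cross-attention branch."""
    def __init__(self, d, rng):
        # weights copied from the pre-trained denoising UNet; kept frozen
        self.sq, self.sk, self.sv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
        # newly added cross-attention weights; only these would be trained
        self.cq, self.ck, self.cv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

    def __call__(self, x, garment):
        # frozen branch: self-attention over the human-image tokens
        s = attention(x @ self.sq, x @ self.sk, x @ self.sv)
        # trainable branch: image queries attend to garment-UNet features
        c = attention(x @ self.cq, garment @ self.ck, garment @ self.cv)
        return s + c

rng = np.random.default_rng(1)
block = HybridAttention(8, rng)
y = block(rng.standard_normal((5, 8)), rng.standard_normal((7, 8)))
print(y.shape)  # (5, 8): image-token shape is preserved
```

Freezing the self-attention branch preserves the base model's generative prior, while the small trainable cross-attention branch is all that must be learned to condition on garment features.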
arXiv Detail & Related papers (2024-07-17T16:26:30Z)
- BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion [61.90969199199739]
BrushNet is a novel plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained DM.
BrushNet demonstrates superior performance over existing models across seven key metrics, including image quality, masked region preservation, and textual coherence.
arXiv Detail & Related papers (2024-03-11T17:59:31Z)
- OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on [7.46772222515689]
OOTDiffusion is a novel network architecture for realistic and controllable image-based virtual try-on.
We leverage the power of pretrained latent diffusion models, designing an outfitting UNet to learn the garment detail features.
Our experiments on the VITON-HD and Dress Code datasets demonstrate that OOTDiffusion efficiently generates high-quality try-on results.
arXiv Detail & Related papers (2024-03-04T07:17:44Z)
- LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On [35.4056826207203]
This work introduces LaDI-VTON, the first Latent Diffusion Textual-Inversion enhanced model for the virtual try-on task.
The proposed architecture relies on a latent diffusion model extended with a novel additional autoencoder module.
We show that our approach outperforms competitors by a consistent margin, achieving a significant milestone for the task.
arXiv Detail & Related papers (2023-05-22T21:38:06Z)
- GLIGEN: Open-Set Grounded Text-to-Image Generation [97.72536364118024]
Grounded-Language-to-Image Generation is a novel approach that builds upon and extends the functionality of existing text-to-image diffusion models.
Our model achieves open-world grounded text2img generation with caption and bounding box condition inputs.
GLIGEN's zero-shot performance on COCO and LVIS outperforms that of existing supervised layout-to-image baselines by a large margin.
arXiv Detail & Related papers (2023-01-17T18:58:58Z)
- PASTA-GAN++: A Versatile Framework for High-Resolution Unpaired Virtual Try-on [70.12285433529998]
PASTA-GAN++ is a versatile system for high-resolution unpaired virtual try-on.
It supports unsupervised training, arbitrary garment categories, and controllable garment editing.
arXiv Detail & Related papers (2022-07-27T11:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.