Controllable Human Image Generation with Personalized Multi-Garments
- URL: http://arxiv.org/abs/2411.16801v1
- Date: Mon, 25 Nov 2024 12:37:13 GMT
- Title: Controllable Human Image Generation with Personalized Multi-Garments
- Authors: Yisol Choi, Sangkyung Kwak, Sihyun Yu, Hyungwon Choi, Jinwoo Shin,
- Abstract summary: BootComp is a novel framework based on text-to-image diffusion models for controllable human image generation with multiple reference garments.
We propose a data generation pipeline to construct a large synthetic dataset, consisting of human and multiple-garment pairs.
We show the wide-applicability of our framework by adapting it to different types of reference-based generation in the fashion domain.
- Score: 46.042383679103125
- License:
- Abstract: We present BootComp, a novel framework based on text-to-image diffusion models for controllable human image generation with multiple reference garments. Here, the main bottleneck is data acquisition for training: collecting a large-scale dataset of high-quality reference garment images per human subject is quite challenging, i.e., ideally, one needs to manually gather every single garment photograph worn by each human. To address this, we propose a data generation pipeline to construct a large synthetic dataset, consisting of human and multiple-garment pairs, by introducing a model to extract any reference garment images from each human image. To ensure data quality, we also propose a filtering strategy to remove undesirable generated data based on measuring perceptual similarities between the garment presented in human image and extracted garment. Finally, by utilizing the constructed synthetic dataset, we train a diffusion model having two parallel denoising paths that use multiple garment images as conditions to generate human images while preserving their fine-grained details. We further show the wide-applicability of our framework by adapting it to different types of reference-based generation in the fashion domain, including virtual try-on, and controllable human image generation with other conditions, e.g., pose, face, etc.
Related papers
- Controlling Human Shape and Pose in Text-to-Image Diffusion Models via Domain Adaptation [1.3654846342364308]
We present a methodology for conditional control of human shape and pose in pretrained text-to-image diffusion models.
Fine-tuning these diffusion models to adhere to new conditions requires large datasets and high-quality annotations.
We propose a domain-adaptation technique that maintains image quality by isolating synthetically trained conditional information.
arXiv Detail & Related papers (2024-11-07T14:02:41Z) - From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation [19.096741614175524]
Parts2Whole is a novel framework designed for generating customized portraits from multiple reference images.
We first develop a semantic-aware appearance encoder to retain details of different human parts.
Second, our framework supports multi-image conditioned generation through a shared self-attention mechanism.
arXiv Detail & Related papers (2024-04-23T17:56:08Z) - 3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models [52.96248836582542]
We propose an effective approach based on recent diffusion models, termed HumanWild, which can effortlessly generate human images and corresponding 3D mesh annotations.
By exclusively employing generative models, we generate large-scale in-the-wild human images and high-quality annotations, eliminating the need for real-world data collection.
arXiv Detail & Related papers (2024-03-17T06:31:16Z) - UniHuman: A Unified Model for Editing Human Images in the Wild [49.896715833075106]
We propose UniHuman, a unified model that addresses multiple facets of human image editing in real-world settings.
To enhance the model's generation quality and generalization capacity, we leverage guidance from human visual encoders.
In user studies, UniHuman is preferred by the users in an average of 77% of cases.
arXiv Detail & Related papers (2023-12-22T05:00:30Z) - Exploring the Robustness of Human Parsers Towards Common Corruptions [99.89886010550836]
We construct three corruption robustness benchmarks, termed LIP-C, ATR-C, and Pascal-Person-Part-C, to assist us in evaluating the risk tolerance of human parsing models.
Inspired by the data augmentation strategy, we propose a novel heterogeneous augmentation-enhanced mechanism to bolster robustness under commonly corrupted conditions.
arXiv Detail & Related papers (2023-09-02T13:32:14Z) - HumanGAN: A Generative Model of Humans Images [78.6284090004218]
We present a generative model for images of dressed humans offering control over pose, local body part appearance and garment style.
Our model encodes part-based latent appearance vectors in a normalized pose-independent space and warps them to different poses, it preserves body and clothing appearance under varying posture.
arXiv Detail & Related papers (2021-03-11T19:00:38Z) - Pose-Guided High-Resolution Appearance Transfer via Progressive Training [65.92031716146865]
We propose a pose-guided appearance transfer network for transferring a given reference appearance to a target pose in unprecedented image resolution.
Our network utilizes dense local descriptors including local perceptual loss and local discriminators to refine details.
Our model produces high-quality images, which can be further utilized in useful applications such as garment transfer between people.
arXiv Detail & Related papers (2020-08-27T03:18:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.