From Wardrobe to Canvas: Wardrobe Polyptych LoRA for Part-level Controllable Human Image Generation
- URL: http://arxiv.org/abs/2507.10217v2
- Date: Mon, 21 Jul 2025 02:49:59 GMT
- Title: From Wardrobe to Canvas: Wardrobe Polyptych LoRA for Part-level Controllable Human Image Generation
- Authors: Jeongho Kim, Sunghyun Park, Hyoungwoo Park, Sungrack Yun, Jaegul Choo, Seokeon Choi
- Abstract summary: We present Wardrobe Polyptych LoRA, a part-level controllable model for personalized human image generation. By training only LoRA layers, our method removes the computational burden at inference while ensuring high-fidelity synthesis of unseen subjects. Our approach significantly outperforms existing techniques in fidelity and consistency, enabling realistic and identity-preserving full-body synthesis.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent diffusion models achieve personalization by learning specific subjects, allowing learned attributes to be integrated into generated images. However, personalized human image generation remains challenging due to the need for precise and consistent attribute preservation (e.g., identity, clothing details). Existing subject-driven image generation methods often require either (1) inference-time fine-tuning with few images for each new subject or (2) large-scale dataset training for generalization. Both approaches are computationally expensive and impractical for real-time applications. To address these limitations, we present Wardrobe Polyptych LoRA, a novel part-level controllable model for personalized human image generation. By training only LoRA layers, our method removes the computational burden at inference while ensuring high-fidelity synthesis of unseen subjects. Our key idea is to condition the generation on the subject's wardrobe and leverage spatial references to reduce information loss, thereby improving fidelity and consistency. Additionally, we introduce a selective subject region loss, which encourages the model to disregard some of the reference images during training. Our loss ensures that generated images better align with text prompts while maintaining subject integrity. Notably, our Wardrobe Polyptych LoRA requires no additional parameters at the inference stage and performs generation using a single model trained on a few training samples. We construct a new dataset and benchmark tailored for personalized human image generation. Extensive experiments show that our approach significantly outperforms existing techniques in fidelity and consistency, enabling realistic and identity-preserving full-body synthesis.
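The two mechanisms the abstract highlights, training only LoRA layers and supervising only the subject region of a spatially tiled reference canvas, can be sketched compactly. The PyTorch sketch below is an illustrative reconstruction from the abstract alone: the class and function names, tensor shapes, and exact masking scheme are assumptions, not the authors' released code.

```python
# Minimal sketch (assumptions, not the paper's implementation):
# (1) a LoRA adapter so only low-rank factors are trained, and
# (2) a subject-region-masked diffusion loss over a "polyptych" canvas
#     that tiles reference parts beside the generation target.
import torch
import torch.nn.functional as F


class LoRALinear(torch.nn.Module):
    """Frozen base linear layer plus a trainable low-rank update B @ A."""

    def __init__(self, base: torch.nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only the LoRA factors receive gradients
        self.lora_a = torch.nn.Parameter(0.01 * torch.randn(r, base.in_features))
        self.lora_b = torch.nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # W x + (alpha / r) * B (A x); B starts at zero, so training begins
        # from the unmodified base model.
        return self.base(x) + self.scale * F.linear(F.linear(x, self.lora_a), self.lora_b)


def build_polyptych(references: list, canvas: torch.Tensor) -> torch.Tensor:
    """Tile reference part images (identity crop, garments, ...) beside the
    target canvas along the width axis, so the diffusion model can attend to
    them spatially. Each tensor is (C, H, W) with a shared height H."""
    return torch.cat(list(references) + [canvas], dim=-1)


def selective_subject_region_loss(pred_noise, true_noise, subject_mask):
    """Diffusion MSE restricted to the subject region, so reference panels act
    as conditioning rather than reconstruction targets. subject_mask is
    (B, 1, H, W) with 1 where supervision applies; randomly excluding some
    reference panels from the mask during training approximates the paper's
    selective subject region loss."""
    mask = subject_mask.expand_as(pred_noise).float()
    return ((pred_noise - true_noise) ** 2 * mask).sum() / mask.sum().clamp(min=1.0)
```

Under this reading, prompt alignment improves because the loss never forces the model to reproduce reference panels verbatim; gradients flow only from the subject region the model is asked to generate.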
Related papers
- Single Image Iterative Subject-driven Generation and Editing
We present SISO, a training-free approach for personalizing image generation and editing from a single subject image.
SISO iteratively generates images and optimizes the model based on a similarity loss with the given subject image.
We demonstrate significant improvements over existing methods in image quality, subject fidelity, and background preservation.
arXiv Detail & Related papers (2025-03-20T10:45:04Z)
- Controllable Human Image Generation with Personalized Multi-Garments
BootComp is a novel framework based on text-to-image diffusion models for controllable human image generation with multiple reference garments.
We propose a data generation pipeline to construct a large synthetic dataset consisting of human and multiple-garment pairs.
We show the wide applicability of our framework by adapting it to different types of reference-based generation in the fashion domain.
arXiv Detail & Related papers (2024-11-25T12:37:13Z)
- Towards Reliable Verification of Unauthorized Data Usage in Personalized Text-to-Image Diffusion Models
New personalization techniques have been proposed to customize pre-trained base models for crafting images with specific themes or styles.
Such a lightweight solution raises a new concern: whether the personalized models are trained on unauthorized data.
We introduce SIREN, a novel methodology to proactively trace unauthorized data usage in black-box personalized text-to-image diffusion models.
arXiv Detail & Related papers (2024-10-14T12:29:23Z)
- JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset.
We propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model.
Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z)
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
- Active Generation for Image Classification
We propose to improve the efficiency of image generation by focusing on the specific needs and characteristics of the model.
Following the central tenet of active learning, our method, named ActGen, takes a training-aware approach to image generation.
arXiv Detail & Related papers (2024-03-11T08:45:31Z)
- Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z)
- Person Image Synthesis via Denoising Diffusion Model
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z)
- More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval
We introduce a novel semi-supervised framework for cross-modal retrieval.
At the centre of our design is a sequential photo-to-sketch generation model.
We also introduce a discriminator-guided mechanism to guard against unfaithful generation.
arXiv Detail & Related papers (2021-03-25T17:27:08Z)
- Learning a Deep Reinforcement Learning Policy Over the Latent Space of a Pre-trained GAN for Semantic Age Manipulation
We learn a conditional policy for semantic manipulation along specific attributes under defined identity bounds.
Results show that our learned policy samples high-fidelity images with the required age alterations.
arXiv Detail & Related papers (2020-11-02T13:15:18Z)