Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models
- URL: http://arxiv.org/abs/2404.04243v3
- Date: Mon, 28 Oct 2024 08:22:36 GMT
- Title: Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models
- Authors: Sangwon Jang, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang
- Abstract summary: We present MuDI, a novel framework that enables multi-subject personalization.
Our main idea is to utilize segmented subjects generated by a foundation model for segmentation.
Experimental results show that our MuDI can produce high-quality personalized images without identity mixing.
- Score: 66.05234562835136
- License:
- Abstract: Text-to-image diffusion models have shown remarkable success in generating personalized subjects based on a few reference images. However, current methods often fail when generating multiple subjects simultaneously, resulting in mixed identities with combined attributes from different subjects. In this work, we present MuDI, a novel framework that enables multi-subject personalization by effectively decoupling identities from multiple subjects. Our main idea is to utilize segmented subjects generated by a foundation model for segmentation (Segment Anything) for both training and inference, as a form of data augmentation for training and initialization for the generation process. Moreover, we introduce a new metric to better evaluate the performance of our method on multi-subject personalization. Experimental results show that our MuDI can produce high-quality personalized images without identity mixing, even for highly similar subjects as shown in Figure 1. Specifically, in human evaluation, MuDI obtains twice the success rate of existing baselines for personalizing multiple subjects without identity mixing and is preferred in over 70% of comparisons against the strongest baseline.
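The abstract names two uses for the segmented subjects: data augmentation during training and initialization of the generation process at inference. The PyTorch sketch below is only a hypothetical illustration of that idea; the helper names, the random-placement policy, and the noisy-initialization blend are assumptions made for illustration, not MuDI's actual procedure.

```python
# Illustrative sketch (not MuDI's implementation): composite segmented subjects
# (e.g., cut out with Segment Anything) for augmentation, and build a noisy
# starting latent from an encoded composite for generation.
import torch

def composite_subjects(subjects, masks, canvas_size=(512, 512), seed=None):
    """Paste segmented subjects at random positions onto a blank canvas.

    subjects: list of (3, h, w) float tensors in [0, 1], already cropped
    masks:    list of (h, w) binary tensors from a segmentation model
    Returns the composited image (3, H, W) and the combined foreground mask (H, W).
    """
    g = torch.Generator().manual_seed(seed) if seed is not None else None
    H, W = canvas_size
    canvas = torch.zeros(3, H, W)
    foreground = torch.zeros(H, W)
    for image, mask in zip(subjects, masks):
        _, h, w = image.shape
        top = torch.randint(0, H - h + 1, (1,), generator=g).item()
        left = torch.randint(0, W - w + 1, (1,), generator=g).item()
        patch = canvas[:, top:top + h, left:left + w]
        patch[:] = torch.where(mask.bool(), image, patch)  # keep only subject pixels
        foreground[top:top + h, left:left + w] = torch.maximum(
            foreground[top:top + h, left:left + w], mask.float()
        )
    return canvas, foreground

def init_latent_from_subjects(subject_latent, noise_strength=1.0, seed=None):
    """Blend an encoded subject composite with Gaussian noise as a starting latent.

    subject_latent: (C, H, W) tensor, e.g., a VAE encoding of the composited image.
    The plain additive blend here is an illustrative choice, not MuDI's formula.
    """
    g = torch.Generator().manual_seed(seed) if seed is not None else None
    noise = torch.randn(subject_latent.shape, generator=g)
    return subject_latent + noise_strength * noise
```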
Related papers
- JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [49.997839600988875]
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset.
We propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model.
Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both the prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z)
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models [31.762112403595612]
IDAdapter is a tuning-free approach that enhances the diversity and identity preservation in personalized image generation from a single face image.
During the training phase, we incorporate mixed features from multiple reference images of a specific identity to enrich identity-related content details.
arXiv Detail & Related papers (2024-03-20T12:13:04Z)
- Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization [56.12990759116612]
Pick-and-Draw is a training-free semantic guidance approach to boost identity consistency and generative diversity for personalization methods.
The proposed approach can be applied to any personalized diffusion models and requires as few as a single reference image.
arXiv Detail & Related papers (2024-01-30T05:56:12Z)
- FaceStudio: Put Your Face Everywhere in Seconds [23.381791316305332]
Identity-preserving image synthesis seeks to maintain a subject's identity while adding a personalized, stylistic touch.
Traditional methods, such as Textual Inversion and DreamBooth, have made strides in custom image creation.
Our research introduces a novel approach to identity-preserving synthesis, with a particular focus on human images.
arXiv Detail & Related papers (2023-12-05T11:02:45Z)
- PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models [19.519789922033034]
PhotoVerse is an innovative methodology that incorporates a dual-branch conditioning mechanism in both text and image domains.
After a single training phase, our approach enables generating high-quality images within only a few seconds.
arXiv Detail & Related papers (2023-09-11T19:59:43Z)
- Cones 2: Customizable Image Synthesis with Multiple Subjects [50.54010141032032]
We study how to efficiently represent a particular subject as well as how to appropriately compose different subjects.
By rectifying the activations in the cross-attention map, the layout assigns and separates the locations of different subjects in the image (a minimal sketch of this kind of attention masking appears after this list).
arXiv Detail & Related papers (2023-05-30T18:00:06Z)
- Identity Encoder for Personalized Diffusion [57.1198884486401]
We propose an encoder-based approach for personalization.
We learn an identity encoder which can extract an identity representation from a set of reference images of a subject.
We show that our approach consistently outperforms existing fine-tuning-based approaches in both image generation and reconstruction (an illustrative encoder sketch also appears after this list).
arXiv Detail & Related papers (2023-04-14T23:32:24Z)
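The Cones 2 entry above describes rectifying cross-attention activations so that each subject's text token only attends to its assigned layout region. The snippet below is a rough, hypothetical illustration of that kind of attention masking; the tensor layout and the hard-masking-plus-renormalization policy are assumptions, not that paper's implementation.

```python
# Illustrative cross-attention masking (not the Cones 2 implementation).
import torch

def mask_cross_attention(attn, token_regions):
    """Zero out a subject token's cross-attention outside its layout region.

    attn:          (batch, query_pixels, text_tokens) attention probabilities,
                   where query_pixels = H * W for a square latent.
    token_regions: dict mapping a text-token index -> (H, W) binary mask of the
                   region where that subject is allowed to appear.
    """
    out = attn.clone()
    for token_idx, region in token_regions.items():
        keep = region.float().reshape(-1)                    # (H*W,) == query_pixels
        out[:, :, token_idx] = out[:, :, token_idx] * keep   # zero outside the region
    # renormalize so each query position still sums to 1 over text tokens
    return out / out.sum(dim=-1, keepdim=True).clamp_min(1e-8)
```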
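The last entry describes learning an identity encoder that maps a set of reference images of a subject to an identity representation used to condition the diffusion model. The toy module below only illustrates that shape of solution (a shared image backbone followed by pooling over the reference set); the backbone, dimensions, and pooling choice are assumptions, not the paper's architecture.

```python
# Toy identity encoder sketch: embed each reference image, then pool over the set.
import torch
import torch.nn as nn

class IdentityEncoder(nn.Module):
    def __init__(self, embed_dim=768):
        super().__init__()
        # small CNN stand-in for a pretrained image backbone
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=4), nn.SiLU(),
            nn.Conv2d(64, 256, kernel_size=4, stride=4), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proj = nn.Linear(256, embed_dim)

    def forward(self, refs):
        # refs: (num_refs, 3, H, W) reference images of one subject
        feats = self.proj(self.backbone(refs))   # (num_refs, embed_dim)
        return feats.mean(dim=0, keepdim=True)   # (1, embed_dim) identity embedding

# usage: the pooled embedding could be injected as an extra conditioning token
refs = torch.randn(4, 3, 256, 256)
identity = IdentityEncoder()(refs)               # shape (1, 768)
```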