Related papers: IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models

IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models

URL: http://arxiv.org/abs/2403.13535v2
Date: Thu, 21 Mar 2024 02:31:58 GMT
Title: IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models
Authors: Siying Cui, Jia Guo, Xiang An, Jiankang Deng, Yongle Zhao, Xinyu Wei, Ziyong Feng,
Abstract summary: IDAdapter is a tuning-free approach that enhances the diversity and identity preservation in personalized image generation from a single face image. During the training phase, we incorporate mixed features from multiple reference images of a specific identity to enrich identity-related content details.
Score: 31.762112403595612
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Leveraging Stable Diffusion for the generation of personalized portraits has emerged as a powerful and noteworthy tool, enabling users to create high-fidelity, custom character avatars based on their specific prompts. However, existing personalization methods face challenges, including test-time fine-tuning, the requirement of multiple input images, low preservation of identity, and limited diversity in generated outcomes. To overcome these challenges, we introduce IDAdapter, a tuning-free approach that enhances the diversity and identity preservation in personalized image generation from a single face image. IDAdapter integrates a personalized concept into the generation process through a combination of textual and visual injections and a face identity loss. During the training phase, we incorporate mixed features from multiple reference images of a specific identity to enrich identity-related content details, guiding the model to generate images with more diverse styles, expressions, and angles compared to previous works. Extensive evaluations demonstrate the effectiveness of our method, achieving both diversity and identity fidelity in generated images.

Related papers

ID-EA: Identity-driven Text Enhancement and Adaptation with Textual Inversion for Personalized Text-to-Image Generation [33.84646269805187]
ID-EA is a novel framework that guides text embeddings to align with visual identity embeddings.<n> ID-EA substantially outperforms state-of-the-art methods in identity preservation metrics.<n>It generates personalized portraits 15 times faster than existing approaches.
arXiv Detail & Related papers (2025-07-16T07:42:02Z)
PIDiff: Image Customization for Personalized Identities with Diffusion Models [13.726194815227464]
We propose a novel fine-tuning-based diffusion model for personalized identities text-to-image generation, named PIDiff.<n>PIDiff avoids semantic entanglement and achieves accurate feature extraction and localization.
arXiv Detail & Related papers (2025-05-08T09:26:28Z)
DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability [12.692129257068085]
DynamicID is a tuning-free framework supported by a dual-stage training paradigm. We have developed a curated VariFace-10k facial dataset, comprising 10k unique individuals, each represented by 35 distinct facial images.
arXiv Detail & Related papers (2025-03-09T08:16:19Z)
Omni-ID: Holistic Identity Representation Designed for Generative Tasks [75.29174595706533]
Omni-ID encodes holistic information about an individual's appearance across diverse expressions. It consolidates information from a varied number of unstructured input images into a structured representation. It demonstrates substantial improvements over conventional representations across various generative tasks.
arXiv Detail & Related papers (2024-12-12T19:21:20Z)
Foundation Cures Personalization: Recovering Facial Personalized Models' Prompt Consistency [33.35678923549471]
FreeCure is a training-free framework that harnesses the intrinsic knowledge from the foundation models themselves to improve the prompt consistency of personalization models. We enhance multiple attributes in the outputs of personalization models through a novel noise-blending strategy and an inversion-based process.
arXiv Detail & Related papers (2024-11-22T15:21:38Z)
Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models [66.05234562835136]
We present MuDI, a novel framework that enables multi-subject personalization. Our main idea is to utilize segmented subjects generated by a foundation model for segmentation. Experimental results show that our MuDI can produce high-quality personalized images without identity mixing.
arXiv Detail & Related papers (2024-04-05T17:45:22Z)
Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm [31.06269858216316]
We propose Infinite-ID, an ID-semantics decoupling paradigm for identity-preserved personalization. We introduce an identity-enhanced training, incorporating an additional image cross-attention module to capture sufficient ID information. We also introduce a feature interaction mechanism that combines a mixed attention module with an AdaIN-mean operation to seamlessly merge the two streams.
arXiv Detail & Related papers (2024-03-18T13:39:53Z)
Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization [56.12990759116612]
Pick-and-Draw is a training-free semantic guidance approach to boost identity consistency and generative diversity for personalization methods. The proposed approach can be applied to any personalized diffusion models and requires as few as a single reference image.
arXiv Detail & Related papers (2024-01-30T05:56:12Z)
StableIdentity: Inserting Anybody into Anywhere at First Sight [57.99693188913382]
We propose StableIdentity, which allows identity-consistent recontextualization with just one face image. We are the first to directly inject the identity learned from a single image into video/3D generation without finetuning.
arXiv Detail & Related papers (2024-01-29T09:06:15Z)
PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization [92.90392834835751]
PortraitBooth is designed for high efficiency, robust identity preservation, and expression-editable text-to-image generation. PortraitBooth eliminates computational overhead and mitigates identity distortion. It incorporates emotion-aware cross-attention control for diverse facial expressions in generated images.
arXiv Detail & Related papers (2023-12-11T13:03:29Z)
When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images. We present a novel use of the extended StyleGAN embedding space $mathcalW_+$ to achieve enhanced identity preservation and disentanglement for diffusion models. Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z)
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models [19.519789922033034]
PhotoVerse is an innovative methodology that incorporates a dual-branch conditioning mechanism in both text and image domains. After a single training phase, our approach enables generating high-quality images within only a few seconds.
arXiv Detail & Related papers (2023-09-11T19:59:43Z)
Identity Encoder for Personalized Diffusion [57.1198884486401]
We propose an encoder-based approach for personalization. We learn an identity encoder which can extract an identity representation from a set of reference images of a subject. We show that our approach consistently outperforms existing fine-tuning based approach in both image generation and reconstruction.
arXiv Detail & Related papers (2023-04-14T23:32:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.