Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter
- URL: http://arxiv.org/abs/2406.02881v2
- Date: Thu, 6 Jun 2024 06:59:46 GMT
- Title: Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter
- Authors: Peng Xing, Ning Wang, Jianbo Ouyang, Zechao Li,
- Abstract summary: We propose a lightweight Inv-Adapter, which first extracts diffusion-domain representations of ID images utilizing a pre-trained text-to-image model via DDIM image inversion.
Benefiting from the high alignment of the extracted ID prompt features and the intermediate features of the text-to-image model, we then embed them efficiently into the base text-to-image model.
We conduct extensive experiments to assess ID fidelity, generation loyalty, speed, and training parameters, all of which show that the proposed Inv-Adapter is highly competitive in ID customization generation and model scale.
- Score: 23.690420512911146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The remarkable advancement in text-to-image generation models significantly boosts the research in ID customization generation. However, existing personalization methods cannot simultaneously satisfy high fidelity and high-efficiency requirements. Their main bottleneck lies in the prompt image encoder, which produces weak alignment signals with the text-to-image model and significantly increased model size. Towards this end, we propose a lightweight Inv-Adapter, which first extracts diffusion-domain representations of ID images utilizing a pre-trained text-to-image model via DDIM image inversion, without additional image encoder. Benefiting from the high alignment of the extracted ID prompt features and the intermediate features of the text-to-image model, we then embed them efficiently into the base text-to-image model by carefully designing a lightweight attention adapter. We conduct extensive experiments to assess ID fidelity, generation loyalty, speed, and training parameters, all of which show that the proposed Inv-Adapter is highly competitive in ID customization generation and model scale.
Related papers
- ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios like AI portrait and advertising.
We present textbfID-Aligner, a general feedback learning framework to enhance ID-T2I performance.
arXiv Detail & Related papers (2024-04-23T18:41:56Z) - Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm [31.06269858216316]
We propose Infinite-ID, an ID-semantics decoupling paradigm for identity-preserved personalization.
We introduce an identity-enhanced training, incorporating an additional image cross-attention module to capture sufficient ID information.
We also introduce a feature interaction mechanism that combines a mixed attention module with an AdaIN-mean operation to seamlessly merge the two streams.
arXiv Detail & Related papers (2024-03-18T13:39:53Z) - InstantID: Zero-shot Identity-Preserving Generation in Seconds [21.04236321562671]
We introduce InstantID, a powerful diffusion model-based solution for ID embedding.
Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image.
Our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL.
arXiv Detail & Related papers (2024-01-15T07:50:18Z) - I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models [80.32562822058924]
Text-guided image-to-video (I2V) generation aims to generate a coherent video that preserves the identity of the input image.
I2V-Adapter adeptly propagates the unnoised input image to subsequent noised frames through a cross-frame attention mechanism.
Our experimental results demonstrate that I2V-Adapter is capable of producing high-quality videos.
arXiv Detail & Related papers (2023-12-27T19:11:50Z) - PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding [102.07914175196817]
PhotoMaker is an efficient personalized text-to-image generation method.
It encodes an arbitrary number of input ID images into a stack ID embedding for preserving ID information.
arXiv Detail & Related papers (2023-12-07T17:32:29Z) - IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image
Diffusion Models [11.105763635691641]
An alternative to text prompt is image prompt, as the saying goes: "an image is worth a thousand words"
We present IP-Adapter, an effective and lightweight adapter to achieve image prompt capability for the pretrained text-to-image diffusion models.
arXiv Detail & Related papers (2023-08-13T08:34:51Z) - SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with
Large Language Models [56.88192537044364]
We propose a simple-yet-effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models.
Our approach can make text-to-image diffusion models easier to use with better user experience.
arXiv Detail & Related papers (2023-05-09T05:48:38Z) - LAFITE: Towards Language-Free Training for Text-to-Image Generation [83.2935513540494]
We propose the first work to train text-to-image generation models without any text data.
Our method leverages the well-aligned multi-modal semantic space of the powerful pre-trained CLIP model.
We obtain state-of-the-art results in the standard text-to-image generation tasks.
arXiv Detail & Related papers (2021-11-27T01:54:45Z) - Fine-grained Image-to-Image Transformation towards Visual Recognition [102.51124181873101]
We aim at transforming an image with a fine-grained category to synthesize new images that preserve the identity of the input image.
We adopt a model based on generative adversarial networks to disentangle the identity related and unrelated factors of an image.
Experiments on the CompCars and Multi-PIE datasets demonstrate that our model preserves the identity of the generated images much better than the state-of-the-art image-to-image transformation models.
arXiv Detail & Related papers (2020-01-12T05:26:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.