Identity Encoder for Personalized Diffusion
- URL: http://arxiv.org/abs/2304.07429v1
- Date: Fri, 14 Apr 2023 23:32:24 GMT
- Title: Identity Encoder for Personalized Diffusion
- Authors: Yu-Chuan Su, Kelvin C.K. Chan, Yandong Li, Yang Zhao, Han Zhang,
Boqing Gong, Huisheng Wang, Xuhui Jia
- Abstract summary: We propose an encoder-based approach for personalization.
We learn an identity encoder which can extract an identity representation from a set of reference images of a subject.
We show that our approach consistently outperforms existing fine-tuning-based approaches in both image generation and reconstruction.
- Score: 57.1198884486401
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many applications can benefit from personalized image generation
models, including image enhancement and video conferencing, to name a few.
Existing works achieve personalization by fine-tuning one model for each
person. While successful, this approach incurs additional computation and
storage overhead for every new identity. Furthermore, it usually requires tens
or hundreds of examples per identity to achieve the best performance. To
overcome these challenges, we propose an encoder-based approach to
personalization. We learn an identity encoder that extracts an identity
representation from a set of reference images of a subject, together with a
diffusion generator that generates new images of the subject conditioned on
that identity representation. Once trained, the model can generate images of
arbitrary identities given only a few examples, even for identities it was
never trained on. Our approach greatly reduces the overhead of personalized
image generation and is therefore applicable to a wider range of potential
applications. Empirical results show that our approach consistently
outperforms existing fine-tuning-based approaches in both image generation and
reconstruction, and its outputs are preferred by users more than 95% of the
time compared with the best-performing baseline.
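The pipeline described in the abstract (a shared identity encoder plus an identity-conditioned diffusion generator) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the module names, feature sizes, and conditioning mechanism (mean-pooled reference features broadcast to the denoiser) are all assumptions.

```python
import torch
import torch.nn as nn

class IdentityEncoder(nn.Module):
    """Maps a set of reference images to a single identity embedding."""
    def __init__(self, dim=768):
        super().__init__()
        # Stand-in for a real image backbone (e.g., a face-recognition network).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )

    def forward(self, refs):          # refs: (num_refs, 3, H, W)
        feats = self.backbone(refs)   # per-image features: (num_refs, dim)
        return feats.mean(dim=0)      # pool over the reference set: (dim,)

class DiffusionGenerator(nn.Module):
    """Placeholder denoiser conditioned on the identity embedding."""
    def __init__(self, dim=768):
        super().__init__()
        self.cond_proj = nn.Linear(dim, 64)
        self.denoise = nn.Conv2d(3 + 64, 3, kernel_size=3, padding=1)

    def forward(self, noisy, identity):
        # Broadcast the projected identity embedding over spatial positions.
        cond = self.cond_proj(identity).view(1, -1, 1, 1)
        cond = cond.expand(noisy.shape[0], -1, noisy.shape[2], noisy.shape[3])
        return self.denoise(torch.cat([noisy, cond], dim=1))

# Training pairs a subject's reference set with target images of the same
# subject; at test time the frozen model handles unseen identities directly.
encoder, generator = IdentityEncoder(), DiffusionGenerator()
identity = encoder(torch.randn(4, 3, 64, 64))     # a few reference images
noise_pred = generator(torch.randn(2, 3, 64, 64), identity)
```

The key property is that only the lightweight embedding is identity-specific: one trained encoder/generator pair serves every identity, which is what removes the per-identity fine-tuning and storage cost described in the abstract.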
Related papers
- JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [49.997839600988875]
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset.
We propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model.
Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both the prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z)
- Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training [51.87027943520492]
We present Diffusion-ReID, a novel paradigm for efficiently augmenting and generating diverse images based on known identities.
Benefiting from our proposed paradigm, we first create a new large-scale person Re-ID dataset Diff-Person, which consists of over 777K images from 5,183 identities.
arXiv Detail & Related papers (2024-06-10T06:26:03Z)
- Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models [66.05234562835136]
We present MuDI, a novel framework that enables multi-subject personalization.
Our main idea is to utilize subject segments produced by a segmentation foundation model.
Experimental results show that our MuDI can produce high-quality personalized images without identity mixing.
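One plausible reading of the segmented-subject idea, offered only as an illustrative sketch (how MuDI actually uses the segments is not described in this summary), is to composite the masked subjects onto one training canvas so each identity stays spatially separated:

```python
import torch

def composite_subjects(canvas, subjects, masks, offsets):
    # canvas: (3, H, W); subjects/masks: lists of (3, h, w) and (1, h, w)
    # tensors; offsets: list of (y, x) paste positions (all hypothetical).
    out = canvas.clone()
    for img, m, (y, x) in zip(subjects, masks, offsets):
        h, w = img.shape[1:]
        region = out[:, y:y + h, x:x + w]
        # Alpha-composite the segmented subject over the canvas region.
        out[:, y:y + h, x:x + w] = m * img + (1 - m) * region
    return out
```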
arXiv Detail & Related papers (2024-04-05T17:45:22Z)
- LCM-Lookahead for Encoder-based Text-to-Image Personalization [82.56471486184252]
We explore the potential of using shortcut mechanisms to guide the personalization of text-to-image models.
We focus on encoder-based personalization approaches, and demonstrate that by tuning them with a lookahead identity loss, we can achieve higher identity fidelity.
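A minimal sketch of such a lookahead identity loss, assuming a one-step x_0 estimate stands in for the paper's LCM shortcut and that `face_embedder` is some pretrained face-identity network (both are assumptions, not the paper's API):

```python
import torch.nn.functional as F

def lookahead_identity_loss(x_t, eps_pred, alpha_bar_t, ref_image, face_embedder):
    # One-step estimate of the clean image x_0 from the noisy sample and the
    # predicted noise (standard epsilon-parameterization identity).
    x0_est = (x_t - (1 - alpha_bar_t) ** 0.5 * eps_pred) / alpha_bar_t ** 0.5
    # Pull the estimate's face embedding toward the reference subject's.
    sim = F.cosine_similarity(face_embedder(x0_est),
                              face_embedder(ref_image), dim=-1)
    return (1 - sim).mean()
```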
arXiv Detail & Related papers (2024-04-04T17:43:06Z)
- IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models [31.762112403595612]
IDAdapter is a tuning-free approach that enhances the diversity and identity preservation in personalized image generation from a single face image.
During the training phase, we incorporate mixed features from multiple reference images of a specific identity to enrich identity-related content details.
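The mixing step might look something like the following convex blend of per-reference features; IDAdapter's actual mixing rule may well differ, so read this as an assumption:

```python
import torch

def mix_identity_features(ref_feats):      # ref_feats: (num_refs, dim)
    # Random convex combination of per-reference features, so the conditioning
    # reflects the identity rather than any single photo's pose or lighting.
    w = torch.rand(ref_feats.shape[0])
    w = w / w.sum()
    return (w.unsqueeze(-1) * ref_feats).sum(dim=0)   # (dim,)
```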
arXiv Detail & Related papers (2024-03-20T12:13:04Z)
- PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization [92.90392834835751]
PortraitBooth is designed for high efficiency, robust identity preservation, and expression-editable text-to-image generation.
PortraitBooth eliminates computational overhead and mitigates identity distortion.
It incorporates emotion-aware cross-attention control for diverse facial expressions in generated images.
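"Emotion-aware cross-attention control" is only named here; one common way such control is implemented, given purely as a sketch rather than PortraitBooth's method, is to bias the cross-attention logits on the prompt's emotion tokens:

```python
import torch

def emotion_biased_attention(q, k, v, emotion_token_idx, bias=2.0):
    # q: (heads, img_tokens, d); k, v: (heads, txt_tokens, d).
    logits = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    # Additive bias on the emotion-describing text tokens before softmax,
    # steering the generated expression toward the prompt (values assumed).
    logits[..., emotion_token_idx] += bias
    return logits.softmax(dim=-1) @ v
```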
arXiv Detail & Related papers (2023-12-11T13:03:29Z)