Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image
Models
- URL: http://arxiv.org/abs/2307.06925v1
- Date: Thu, 13 Jul 2023 17:46:42 GMT
- Title: Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image
Models
- Authors: Moab Arar, Rinon Gal, Yuval Atzmon, Gal Chechik, Daniel Cohen-Or,
Ariel Shamir, Amit H. Bermano
- Abstract summary: Text-to-image (T2I) personalization allows users to combine their own visual concepts in natural language prompts.
Most existing encoders are limited to a single-class domain, which hinders their ability to handle diverse concepts.
We propose a domain-agnostic method that does not require any specialized dataset or prior information about the personalized concepts.
- Score: 59.094601993993535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image (T2I) personalization allows users to guide the creative image
generation process by combining their own visual concepts in natural language
prompts. Recently, encoder-based techniques have emerged as a new effective
approach for T2I personalization, reducing the need for multiple images and
long training times. However, most existing encoders are limited to a
single-class domain, which hinders their ability to handle diverse concepts. In
this work, we propose a domain-agnostic method that does not require any
specialized dataset or prior information about the personalized concepts. We
introduce a novel contrastive-based regularization technique to maintain high
fidelity to the target concept characteristics while keeping the predicted
embeddings close to editable regions of the latent space, by pushing the
predicted tokens toward their nearest existing CLIP tokens. Our experimental
results demonstrate the effectiveness of our approach and show how the learned
tokens are more semantic than tokens predicted by unregularized models. This
leads to a better representation that achieves state-of-the-art performance
while being more flexible than previous methods.
Related papers
- Tuning-Free Image Customization with Image and Text Guidance [65.9504243633169]
We introduce a tuning-free framework for simultaneous text-image-guided image customization.
Our approach preserves the semantic features of the reference image subject while allowing modification of detailed attributes based on text descriptions.
Our approach outperforms previous methods in both human and quantitative evaluations.
arXiv Detail & Related papers (2024-03-19T11:48:35Z) - Visual Concept-driven Image Generation with Text-to-Image Diffusion Model [65.96212844602866]
Text-to-image (TTI) models have demonstrated impressive results in generating high-resolution images of complex scenes.
Recent approaches have extended these methods with personalization techniques that allow them to integrate user-illustrated concepts.
However, the ability to generate images with multiple interacting concepts, such as human subjects, as well as concepts that may be entangled in one, or across multiple, image illustrations remains illusive.
We propose a concept-driven TTI personalization framework that addresses these core challenges.
arXiv Detail & Related papers (2024-02-18T07:28:37Z) - Textual Localization: Decomposing Multi-concept Images for
Subject-Driven Text-to-Image Generation [5.107886283951882]
We introduce a localized text-to-image model to handle multi-concept input images.
Our method incorporates a novel cross-attention guidance to decompose multiple concepts.
Notably, our method generates cross-attention maps consistent with the target concept in the generated images.
arXiv Detail & Related papers (2024-02-15T14:19:42Z) - Taming Encoder for Zero Fine-tuning Image Customization with
Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z) - ELITE: Encoding Visual Concepts into Textual Embeddings for Customized
Text-to-Image Generation [59.44301617306483]
We propose a learning-based encoder for fast and accurate customized text-to-image generation.
Our method enables high-fidelity inversion and more robust editability with a significantly faster encoding process.
arXiv Detail & Related papers (2023-02-27T14:49:53Z) - Designing an Encoder for Fast Personalization of Text-to-Image Models [57.62449900121022]
We propose an encoder-based domain-tuning approach for text-to-image personalization.
We employ two components: First, an encoder that takes as an input a single image of a target concept from a given domain.
Second, a set of regularized weight-offsets for the text-to-image model that learn how to effectively ingest additional concepts.
arXiv Detail & Related papers (2023-02-23T18:46:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.