An Image is Worth One Word: Personalizing Text-to-Image Generation using
Textual Inversion
- URL: http://arxiv.org/abs/2208.01618v1
- Date: Tue, 2 Aug 2022 17:50:36 GMT
- Title: An Image is Worth One Word: Personalizing Text-to-Image Generation using
Textual Inversion
- Authors: Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano,
Gal Chechik, Daniel Cohen-Or
- Abstract summary: Text-to-image models offer unprecedented freedom to guide creation through natural language.
Here we present a simple approach that allows such creative freedom.
We find evidence that a single word embedding is sufficient for capturing unique and varied concepts.
- Score: 60.05823240540769
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image models offer unprecedented freedom to guide creation through
natural language. Yet, it is unclear how such freedom can be exercised to
generate images of specific unique concepts, modify their appearance, or
compose them in new roles and novel scenes. In other words, we ask: how can we
use language-guided models to turn our cat into a painting, or imagine a new
product based on our favorite toy? Here we present a simple approach that
allows such creative freedom. Using only 3-5 images of a user-provided concept,
like an object or a style, we learn to represent it through new "words" in the
embedding space of a frozen text-to-image model. These "words" can be composed
into natural language sentences, guiding personalized creation in an intuitive
way. Notably, we find evidence that a single word embedding is sufficient for
capturing unique and varied concepts. We compare our approach to a wide range
of baselines, and demonstrate that it can more faithfully portray the concepts
across a range of applications and tasks.
Our code, data and new words will be available at:
https://textual-inversion.github.io
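The abstract describes the core mechanism: a single new word embedding is optimized against a frozen text-to-image model while every pretrained weight stays fixed. Below is a minimal PyTorch sketch of that idea; the stand-in reconstruction loss, token ids, dimensions, and hyperparameters are placeholders for illustration, not the paper's actual implementation (which optimizes the embedding through the diffusion model's denoising objective).

```python
import torch
import torch.nn as nn

EMB_DIM = 768        # embedding width of the frozen text encoder (assumed)
VOCAB_SIZE = 49408   # vocabulary size of the frozen tokenizer (assumed)

# Frozen token-embedding table of the pretrained text encoder; its weights are never updated.
token_embeddings = nn.Embedding(VOCAB_SIZE, EMB_DIM)
token_embeddings.requires_grad_(False)

# The single new "word" S*, the only trainable parameter, initialized here from the
# embedding of a coarse descriptor token (placeholder id).
init_token_id = 1000
s_star = nn.Parameter(token_embeddings.weight[init_token_id].clone())

def frozen_model_loss(prompt_embeddings: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for the frozen text-to-image model's reconstruction loss.
    In the real method this would be the diffusion model's denoising objective."""
    return (prompt_embeddings.mean() - image.mean()) ** 2

optimizer = torch.optim.AdamW([s_star], lr=5e-3)

# 3-5 images of the user-provided concept (random tensors stand in for real images).
concept_images = [torch.rand(3, 512, 512) for _ in range(4)]
# Placeholder token ids for a prompt template such as "a photo of".
template_ids = torch.tensor([10, 20, 30])

for step in range(100):
    for image in concept_images:
        context = token_embeddings(template_ids)                # frozen prompt embeddings
        prompt = torch.cat([context, s_star.unsqueeze(0)], 0)   # splice the learned word in
        loss = frozen_model_loss(prompt, image)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Only s_star receives gradients, so the frozen model and its vocabulary are untouched; the learned vector can then be placed wherever the new "word" appears in a natural-language prompt.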
Related papers
- Training-free Editioning of Text-to-Image Models [47.32550822603952]
We propose a novel task, namely, training-free editioning, for text-to-image models.
We aim to create variations of a base text-to-image model without retraining.
Our proposed editioning paradigm enables a service provider to customize the base model into its "cat edition".
arXiv Detail & Related papers (2024-05-27T11:40:50Z)
- Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models [60.80960965051388]
Adjectives and verbs are entangled with their associated nouns (the subject).
Lego disentangles concepts from their associated subjects using a simple yet effective Subject Separation step.
Lego-generated concepts were preferred over 70% of the time when compared to the baseline.
arXiv Detail & Related papers (2023-11-23T07:33:38Z)
- An Image is Worth Multiple Words: Discovering Object Level Concepts using Multi-Concept Prompt Learning [8.985668637331335]
Textual Inversion learns a single text embedding for a new "word" to represent image style and appearance.
We introduce Multi-Concept Prompt Learning (MCPL), where multiple unknown "words" are simultaneously learned from a single sentence-image pair.
Our approach emphasises learning solely from textual embeddings, using less than 10% of the storage space required by alternative methods.
arXiv Detail & Related papers (2023-10-18T19:18:19Z)
- Create Your World: Lifelong Text-to-Image Diffusion [75.14353789007902]
We propose the Lifelong text-to-image Diffusion Model (L2DM) to overcome catastrophic forgetting of previously encountered concepts.
To address this forgetting, our L2DM framework devises a task-aware memory enhancement module and an elastic-concept distillation module.
Our model generates more faithful images across a range of continual text prompts, in terms of both qualitative and quantitative metrics.
arXiv Detail & Related papers (2023-09-08T16:45:56Z)
- Backdooring Textual Inversion for Concept Censorship [34.84218971929207]
This paper focuses on the personalization technique dubbed Textual Inversion (TI).
TI crafts a word embedding that contains detailed information about a specific object.
To achieve concept censorship for a TI model, we propose injecting backdoors into the TI embeddings.
arXiv Detail & Related papers (2023-08-21T13:39:04Z)
- Visually-Aware Context Modeling for News Image Captioning [54.31708859631821]
News Image Captioning aims to create captions from news articles and images.
We propose a face-naming module for learning better name embeddings.
We use CLIP to retrieve sentences that are semantically close to the image.
arXiv Detail & Related papers (2023-08-16T12:39:39Z)
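As a rough illustration of the CLIP retrieval step mentioned in the entry above, the sketch below embeds the news image and the article's candidate sentences with CLIP and returns the closest matches. The checkpoint name, helper function, and top-k value are illustrative assumptions, not details from the paper.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Public CLIP checkpoint used purely for illustration.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def retrieve_sentences(image: Image.Image, sentences: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k article sentences most similar to the image in CLIP space."""
    inputs = processor(text=sentences, images=image, return_tensors="pt",
                       padding=True, truncation=True)
    with torch.no_grad():
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                           attention_mask=inputs["attention_mask"])
    # Cosine similarity between each sentence embedding and the image embedding.
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    scores = (text_emb @ image_emb.T).squeeze(-1)
    best = scores.topk(min(top_k, len(sentences))).indices
    return [sentences[i] for i in best.tolist()]
```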
- Learning to Imagine: Visually-Augmented Natural Language Generation [73.65760028876943]
We propose a method to make pre-trained language models (PLMs) Learn to Imagine for Visually-augmented natural language gEneration.
We use a diffusion model to synthesize high-quality images conditioned on the input texts.
We synthesize an image for each sentence rather than a single image for an entire paragraph.
arXiv Detail & Related papers (2023-05-26T13:59:45Z)
- Multimodal Few-Shot Learning with Frozen Language Models [36.75551859968596]
We train a vision encoder to represent each image as a sequence of continuous embeddings, such that a pre-trained, frozen language model prompted with this prefix generates the appropriate caption.
The resulting system is a multimodal few-shot learner, with the surprising ability to learn a variety of new tasks when conditioned on examples.
arXiv Detail & Related papers (2021-06-25T21:07:09Z)
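The entry above describes a trainable vision encoder that maps each image to a short sequence of continuous embeddings serving as a prefix for a frozen language model. The sketch below shows that wiring under stated assumptions: a toy convolutional encoder and GPT-2 as the frozen backbone, neither of which matches the paper's actual architecture.

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

PREFIX_LEN = 2  # number of visual "tokens" prepended to the caption

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")
lm.requires_grad_(False)            # the language model stays frozen
emb_dim = lm.config.n_embd

class VisionPrefixEncoder(nn.Module):
    """Trainable image encoder producing PREFIX_LEN embeddings of width emb_dim."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, PREFIX_LEN * emb_dim),
        )
    def forward(self, images):      # images: (B, 3, H, W)
        return self.backbone(images).view(-1, PREFIX_LEN, emb_dim)

encoder = VisionPrefixEncoder()
images = torch.rand(1, 3, 64, 64)
caption_ids = tokenizer("a photo of my cat", return_tensors="pt").input_ids

prefix = encoder(images)                                   # (1, PREFIX_LEN, emb_dim)
caption_embs = lm.transformer.wte(caption_ids)             # frozen token embeddings
inputs_embeds = torch.cat([prefix, caption_embs], dim=1)   # visual prefix + caption

# Standard LM loss on the caption tokens only; prefix positions are ignored (-100).
labels = torch.cat([torch.full((1, PREFIX_LEN), -100), caption_ids], dim=1)
loss = lm(inputs_embeds=inputs_embeds, labels=labels).loss
loss.backward()
```

Gradients from the captioning loss update only the vision encoder; the language model's weights stay frozen throughout.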
- Describe What to Change: A Text-guided Unsupervised Image-to-Image Translation Approach [84.22327278486846]
We propose a novel unsupervised approach, based on image-to-image translation, that alters the attributes of a given image through a command-like sentence.
Our model disentangles the image content from the visual attributes, and it learns to modify the latter using the textual description.
Experiments show that the proposed model achieves promising performances on two large-scale public datasets.
arXiv Detail & Related papers (2020-08-10T15:40:05Z)