Create Your World: Lifelong Text-to-Image Diffusion
- URL: http://arxiv.org/abs/2309.04430v1
- Date: Fri, 8 Sep 2023 16:45:56 GMT
- Title: Create Your World: Lifelong Text-to-Image Diffusion
- Authors: Gan Sun, Wenqi Liang, Jiahua Dong, Jun Li, Zhengming Ding, Yang Cong
- Abstract summary: We propose a Lifelong text-to-image Diffusion Model (L2DM) to overcome knowledge "catastrophic forgetting" for the past encountered concepts.
With respect to knowledge "catastrophic forgetting", our L2DM framework devises a task-aware memory enhancement module and an elastic-concept distillation module.
Our model can generate more faithful images across a range of continual text prompts in terms of both qualitative and quantitative metrics.
- Score: 75.14353789007902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image generative models can produce diverse high-quality images of
concepts with a text prompt, which have demonstrated excellent ability in image
generation, image translation, etc. In this work, we study the problem of
synthesizing instantiations of a user's own concepts in a never-ending manner,
i.e., create your world, where the new concepts from the user are quickly learned
with a few examples. To achieve this goal, we propose a Lifelong text-to-image
Diffusion Model (L2DM), which intends to overcome knowledge "catastrophic
forgetting" for the past encountered concepts, and semantic "catastrophic
neglecting" for one or more concepts in the text prompt. In respect of
knowledge "catastrophic forgetting", our L2DM framework devises a task-aware
memory enhancement module and an elastic-concept distillation module, which
could respectively safeguard the knowledge of both prior concepts and each past
personalized concept. When generating images with a user text prompt, the
solution to semantic "catastrophic neglecting" is that a concept attention
artist module can alleviate the semantic neglecting from concept aspect, and an
orthogonal attention module can reduce the semantic binding from attribute
aspect. To this end, our model can generate more faithful images across a range
of continual text prompts in terms of both qualitative and quantitative
metrics, when compared with the related state-of-the-art models. The code will
be released at https://wenqiliang.github.io/.
Related papers
- Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning [0.0]
We propose a novel concept-erasure method that updates the text encoder using few-shot unlearning.
Our method can erase a concept within 10 s, making concept erasure more accessible than ever before.
arXiv Detail & Related papers (2024-05-12T14:01:05Z)
- Attention Calibration for Disentangled Text-to-Image Personalization [12.339742346826403]
We propose an attention calibration mechanism to improve the concept-level understanding of the T2I model.
We demonstrate that our method outperforms the current state of the art in both qualitative and quantitative evaluations.
arXiv Detail & Related papers (2024-03-27T13:31:39Z)
- Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models [66.43013001061477]
We introduce Lego, a method to invert subject entangled concepts from a few example images.
Lego disentangles concepts from their associated subjects using a simple yet effective Subject Separation step.
In a thorough user study, Lego-generated concepts were preferred over 70% of the time when compared to the baseline.
arXiv Detail & Related papers (2023-11-23T07:33:38Z)
- An Image is Worth Multiple Words: Discovering Object Level Concepts using Multi-Concept Prompt Learning [8.985668637331335]
Textual Inversion learns a singular text embedding for a new "word" to represent image style and appearance.
We introduce Multi-Concept Prompt Learning (MCPL), where multiple unknown "words" are simultaneously learned from a single sentence-image pair.
Our approach emphasises learning solely from textual embeddings, using less than 10% of the storage space compared to others.
arXiv Detail & Related papers (2023-10-18T19:18:19Z)
- Hypernymy Understanding Evaluation of Text-to-Image Models via WordNet Hierarchy [12.82992353036576]
We measure the capability of popular text-to-image models to understand hypernymy, or the "is-a" relation between words.
We show how our metrics can provide a better understanding of the individual strengths and weaknesses of popular text-to-image models.
arXiv Detail & Related papers (2023-10-13T16:53:25Z)
- Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
arXiv Detail & Related papers (2023-05-25T17:59:04Z)
- Ablating Concepts in Text-to-Image Diffusion Models [57.9371041022838]
Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability.
These models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos.
We propose an efficient method of ablating concepts in the pretrained model, preventing the generation of a target concept.
arXiv Detail & Related papers (2023-03-23T17:59:42Z)
- Designing an Encoder for Fast Personalization of Text-to-Image Models [57.62449900121022]
We propose an encoder-based domain-tuning approach for text-to-image personalization.
We employ two components: First, an encoder that takes as an input a single image of a target concept from a given domain.
Second, a set of regularized weight-offsets for the text-to-image model that learn how to effectively ingest additional concepts.
arXiv Detail & Related papers (2023-02-23T18:46:41Z)
- FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations [99.54048050189971]
We present a framework for learning new visual concepts quickly, guided by multiple naturally occurring data streams.
The learned concepts support downstream applications, such as answering questions by reasoning about unseen images.
We demonstrate the effectiveness of our model on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-03-30T19:45:00Z)