Create Your World: Lifelong Text-to-Image Diffusion
- URL: http://arxiv.org/abs/2309.04430v1
- Date: Fri, 8 Sep 2023 16:45:56 GMT
- Title: Create Your World: Lifelong Text-to-Image Diffusion
- Authors: Gan Sun, Wenqi Liang, Jiahua Dong, Jun Li, Zhengming Ding, Yang Cong
- Abstract summary: We propose a Lifelong text-to-image Diffusion Model (L2DM) to overcome knowledge "catastrophic forgetting" for previously encountered concepts.
To address knowledge "catastrophic forgetting", our L2DM framework devises a task-aware memory enhancement module and an elastic-concept distillation module.
Our model generates more faithful images across a range of continual text prompts in terms of both qualitative and quantitative metrics.
- Score: 75.14353789007902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image generative models can produce diverse, high-quality
images of concepts from a text prompt, and have demonstrated excellent ability
in image generation, image translation, etc. In this work, we study the problem
of synthesizing instantiations of a user's own concepts in a never-ending
manner, i.e., create your world, where new concepts from the user are quickly
learned with a few examples. To achieve this goal, we propose a Lifelong
text-to-image Diffusion Model (L2DM), which aims to overcome knowledge
"catastrophic forgetting" for previously encountered concepts and semantic
"catastrophic neglecting" for one or more concepts in the text prompt. To
address knowledge "catastrophic forgetting", our L2DM framework devises a
task-aware memory enhancement module and an elastic-concept distillation
module, which respectively safeguard the knowledge of prior concepts and of
each past personalized concept. When generating images from a user text prompt,
semantic "catastrophic neglecting" is addressed by a concept attention artist
module, which alleviates neglecting at the concept level, and an orthogonal
attention module, which reduces semantic binding at the attribute level. As a
result, our model generates more faithful images across a range of continual
text prompts, in terms of both qualitative and quantitative metrics, when
compared with related state-of-the-art models. The code will be released at
https://wenqiliang.github.io/.
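The abstract describes the anti-forgetting side of L2DM only at a high level (a task-aware memory enhancement module and an elastic-concept distillation module). Below is a minimal, illustrative sketch of the general idea behind distillation-based continual customization: a frozen snapshot of the previous model supplies targets on prior-concept prompts while a new concept is learned. This is not the paper's implementation, and all identifiers (denoiser, frozen_denoiser, new_cond, prior_cond, lambda_distill) are assumptions introduced here for illustration only.

```python
# Illustrative sketch only: a generic distillation-style regularizer against
# catastrophic forgetting when fine-tuning a text-to-image denoiser on a new
# personalized concept. Not the paper's L2DM modules; names are hypothetical.
import copy
import torch
import torch.nn.functional as F


def continual_finetune_loss(denoiser, frozen_denoiser, noisy_latents, timesteps,
                            new_cond, prior_cond, target_noise,
                            lambda_distill: float = 1.0) -> torch.Tensor:
    """Denoising loss on the new concept plus a distillation term that keeps
    predictions on prior-concept prompts close to a frozen pre-task snapshot."""
    # Standard noise-prediction loss on the new concept's few examples.
    loss_new = F.mse_loss(denoiser(noisy_latents, timesteps, new_cond),
                          target_noise)

    # Distillation: the frozen snapshot supplies targets on prior-concept
    # prompts, so the updated model does not drift on what it already knows.
    with torch.no_grad():
        prior_target = frozen_denoiser(noisy_latents, timesteps, prior_cond)
    loss_distill = F.mse_loss(denoiser(noisy_latents, timesteps, prior_cond),
                              prior_target)

    return loss_new + lambda_distill * loss_distill


# Before each new personalization task, snapshot and freeze the current model:
# frozen_denoiser = copy.deepcopy(denoiser).eval().requires_grad_(False)
```

In such a setup, lambda_distill trades plasticity on the new concept against stability on previously learned ones; the paper's own modules additionally address semantic neglecting at the attention level, which this sketch does not cover.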
Related papers
- Scaling Concept With Text-Guided Diffusion Models [53.80799139331966]
Instead of replacing a concept, can we enhance or suppress the concept itself?
We introduce ScalingConcept, a simple yet effective method to scale decomposed concepts up or down in real input without introducing new elements.
More importantly, ScalingConcept enables a variety of novel zero-shot applications across image and audio domains.
arXiv Detail & Related papers (2024-10-31T17:09:55Z)
- How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization? [91.49559116493414]
We propose a novel Concept-Incremental text-to-image Diffusion Model (CIDM).
It can resolve catastrophic forgetting and concept neglect to learn new customization tasks in a concept-incremental manner.
Experiments validate that our CIDM surpasses existing custom diffusion models.
arXiv Detail & Related papers (2024-10-23T06:47:29Z)
- Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning [0.0]
We propose a novel concept-erasure method that updates the text encoder using few-shot unlearning.
Our method can erase a concept within 10 s, making concept erasure more accessible than ever before.
arXiv Detail & Related papers (2024-05-12T14:01:05Z)
- Attention Calibration for Disentangled Text-to-Image Personalization [12.339742346826403]
We propose an attention calibration mechanism to improve the concept-level understanding of the T2I model.
We demonstrate that our method outperforms the current state of the art in both qualitative and quantitative evaluations.
arXiv Detail & Related papers (2024-03-27T13:31:39Z)
- An Image is Worth Multiple Words: Discovering Object Level Concepts using Multi-Concept Prompt Learning [8.985668637331335]
Textual Inversion learns a singular text embedding for a new "word" to represent image style and appearance.
We introduce Multi-Concept Prompt Learning (MCPL), where multiple unknown "words" are simultaneously learned from a single sentence-image pair.
Our approach emphasises learning solely from textual embeddings, using less than 10% of the storage space compared to others.
arXiv Detail & Related papers (2023-10-18T19:18:19Z)
- Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
arXiv Detail & Related papers (2023-05-25T17:59:04Z)
- Ablating Concepts in Text-to-Image Diffusion Models [57.9371041022838]
Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability.
These models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos.
We propose an efficient method of ablating concepts in the pretrained model, preventing the generation of a target concept.
arXiv Detail & Related papers (2023-03-23T17:59:42Z)
- FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations [99.54048050189971]
We present a framework for learning new visual concepts quickly, guided by multiple naturally occurring data streams.
The learned concepts support downstream applications, such as answering questions by reasoning about unseen images.
We demonstrate the effectiveness of our model on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-03-30T19:45:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.