ELODIN: Naming Concepts in Embedding Spaces
- URL: http://arxiv.org/abs/2303.04001v2
- Date: Thu, 9 Mar 2023 17:10:27 GMT
- Title: ELODIN: Naming Concepts in Embedding Spaces
- Authors: Rodrigo Mello, Filipe Calegario, Geber Ramalho
- Abstract summary: We propose a method to enhance control by generating specific concepts that can be reused throughout multiple images.
We perform a set of comparisons that finds our method to be a significant improvement over text-only prompts.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite recent advancements, the field of text-to-image synthesis still
suffers from a lack of fine-grained control. Using only text, it remains
challenging to deal with issues such as concept coherence and concept
contamination. We propose a method to enhance control by generating specific
concepts that can be reused throughout multiple images, effectively expanding
natural language with new words that can be combined much like a painter's
palette. Unlike previous contributions, our method does not copy visuals from
input data and can generate concepts through text alone. We perform a set of
comparisons that finds our method to be a significant improvement over
text-only prompts.
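To make the "new words" idea concrete, here is a minimal sketch of how a learned concept vector could be registered as an extra token in a CLIP text encoder and then reused in ordinary prompts. This is a generic pseudo-token illustration, not ELODIN's training procedure; the token name and `concept_vec` are placeholders for whatever the method actually learns.

```python
# Sketch only: register a learned concept vector as a new "word" in a CLIP text
# encoder so it can be reused across prompts. `concept_vec` stands in for an
# embedding produced by the method's own optimization.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

tokenizer.add_tokens(["<my-concept>"])                      # the new "word"
text_encoder.resize_token_embeddings(len(tokenizer))
token_id = tokenizer.convert_tokens_to_ids("<my-concept>")

concept_vec = torch.randn(text_encoder.config.hidden_size)  # placeholder concept embedding
with torch.no_grad():
    text_encoder.get_input_embeddings().weight[token_id] = concept_vec

# The concept now combines with ordinary words like entries in a palette.
inputs = tokenizer("a photo of <my-concept> on a beach", return_tensors="pt")
prompt_embeddings = text_encoder(**inputs).last_hidden_state
```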
Related papers
- One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework [127.07102988701092]
We introduce the first text-image Collaborative Concept Erasing (Co-Erasing) framework.
Co-Erasing describes the concept jointly by text prompts and the corresponding undesirable images induced by the prompts.
We design a text-guided image concept refinement strategy that directs the model to focus on visual features most relevant to the specified text concept.
arXiv Detail & Related papers (2025-05-16T11:25:50Z)
- Concept Lancet: Image Editing with Compositional Representation Transplant [58.9421919837084]
Concept Lancet is a zero-shot plug-and-play framework for principled representation manipulation in image editing.
We decompose the source input in the latent (text embedding or diffusion score) space as a sparse linear combination of the representations of the collected visual concepts.
We perform a customized concept transplant process to impose the corresponding editing direction.
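A hedged sketch of the sparse-decomposition-and-transplant idea described above: a source embedding is approximated as a sparse combination of known concept embeddings, and weight is moved from one concept to another. The dictionary, indices, and editing step are illustrative assumptions, not Concept Lancet's exact procedure.

```python
# Approximate a source embedding as a sparse combination of concept embeddings,
# then "transplant" the source concept's weight onto a target concept.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
embed_dim, n_concepts = 768, 50
D = rng.normal(size=(embed_dim, n_concepts))   # columns = collected concept representations
x = rng.normal(size=embed_dim)                 # source embedding to edit

coef = Lasso(alpha=0.1, fit_intercept=False).fit(D, x).coef_   # sparse coefficients

src_idx, tgt_idx = 3, 7                        # e.g. "cat" -> "dog"; indices assumed
edited_coef = coef.copy()
edited_coef[tgt_idx] += edited_coef[src_idx]   # move the source concept's weight
edited_coef[src_idx] = 0.0
x_edited = D @ edited_coef                     # embedding with the concept transplanted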
arXiv Detail & Related papers (2025-04-03T17:59:58Z)
- IP-Composer: Semantic Composition of Visual Concepts [49.18472621931207]
We present IP-Composer, a training-free approach for compositional image generation.
Our method builds on IP-Adapter, which synthesizes novel images conditioned on an input image's CLIP embedding.
We extend this approach to multiple visual inputs by crafting composite embeddings, stitched from the projections of multiple input images onto concept-specific CLIP-subspaces identified through text.
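A minimal numpy sketch of the "stitched" composite embedding: each concept's subspace is estimated from text-embedding variations (random stand-ins here), each input image's CLIP embedding is projected onto its concept subspace, and the projections are summed into one conditioning vector. Shapes and the combination rule are assumptions for illustration, not IP-Composer's code.

```python
import numpy as np

def concept_subspace(text_embs: np.ndarray, rank: int = 8) -> np.ndarray:
    """Top principal directions of text embeddings describing one concept."""
    centered = text_embs - text_embs.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank]                            # (rank, dim) orthonormal basis

def project(image_emb: np.ndarray, basis: np.ndarray) -> np.ndarray:
    return basis.T @ (basis @ image_emb)        # projection onto the concept subspace

rng = np.random.default_rng(0)
dim = 768
hair_basis = concept_subspace(rng.normal(size=(32, dim)))   # "hairstyle" variations
bg_basis = concept_subspace(rng.normal(size=(32, dim)))     # "background" variations

# Stitch projections of two input images' CLIP embeddings into one conditioning vector.
composite = project(rng.normal(size=dim), hair_basis) + project(rng.normal(size=dim), bg_basis)
```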
arXiv Detail & Related papers (2025-02-19T18:49:31Z)
- Scaling Concept With Text-Guided Diffusion Models [53.80799139331966]
Instead of replacing a concept, can we enhance or suppress the concept itself?
We introduce ScalingConcept, a simple yet effective method to scale decomposed concepts up or down in real input without introducing new elements.
More importantly, ScalingConcept enables a variety of novel zero-shot applications across image and audio domains.
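The method itself operates on decomposed diffusion scores recovered from the real input; the following is only a toy embedding-space analogy of "scaling a concept up or down": the component of a representation along an estimated concept direction is amplified or attenuated while everything else is left untouched.

```python
import numpy as np

def scale_concept(x: np.ndarray, concept_dir: np.ndarray, factor: float) -> np.ndarray:
    d = concept_dir / np.linalg.norm(concept_dir)
    concept_part = np.dot(x, d) * d           # component along the concept direction
    return x + (factor - 1.0) * concept_part  # factor > 1 enhances, < 1 suppresses

rng = np.random.default_rng(0)
x = rng.normal(size=768)                      # some input representation (stand-in)
concept_dir = rng.normal(size=768)            # estimated concept direction (stand-in)
x_enhanced = scale_concept(x, concept_dir, 2.0)
x_suppressed = scale_concept(x, concept_dir, 0.0)
```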
arXiv Detail & Related papers (2024-10-31T17:09:55Z)
- CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization [14.01847471143144]
We introduce Context Regularization (CoRe), which enhances the learning of the new concept's text embedding by regularizing its context tokens in the prompt.
CoRe can be applied to arbitrary prompts without requiring the generation of corresponding images.
Comprehensive experiments demonstrate that our method outperforms several baseline methods in both identity preservation and text alignment.
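A hedged sketch of a context-regularization term in the spirit of CoRe: while the new concept embedding is optimized, the text-encoder outputs at the context token positions (everything except the new token) are pulled toward the outputs obtained with a reference prompt. The tensors are placeholders; the actual method also regularizes attention maps inside a full personalization pipeline.

```python
import torch
import torch.nn.functional as F

seq_len, dim = 77, 768
concept_pos = 2                                   # position of the new "word" in the prompt

# Stand-ins for text-encoder outputs of a prompt with the new token vs a
# reference prompt that uses a super-category word in its place.
out_with_concept = torch.randn(seq_len, dim, requires_grad=True)
out_reference = torch.randn(seq_len, dim)

context_mask = torch.ones(seq_len, dtype=torch.bool)
context_mask[concept_pos] = False                 # leave the new token unconstrained

core_loss = F.mse_loss(out_with_concept[context_mask], out_reference[context_mask])
core_loss.backward()                              # would be added to the usual diffusion loss
```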
arXiv Detail & Related papers (2024-08-28T16:27:58Z)
- Non-confusing Generation of Customized Concepts in Diffusion Models [135.4385383284657]
We tackle the common challenge of inter-concept visual confusion in compositional concept generation using text-guided diffusion models (TGDMs), which first encode the prompt with a text encoder and then run the diffusion model on that encoding.
Existing customized generation methods only focus on fine-tuning the second (diffusion) stage while overlooking the first (text-encoding) one.
We propose a simple yet effective solution called CLIF: contrastive image-language fine-tuning.
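For context, a contrastive image-language objective generally takes the CLIP-style form sketched below: matched image/text pairs are pulled together and mismatched pairs pushed apart. The embeddings are random stand-ins; CLIF's specific sampling of confusable concept pairs is not reproduced.

```python
import torch
import torch.nn.functional as F

batch, dim = 8, 512
image_emb = F.normalize(torch.randn(batch, dim, requires_grad=True), dim=-1)
text_emb = F.normalize(torch.randn(batch, dim, requires_grad=True), dim=-1)

temperature = 0.07
logits = image_emb @ text_emb.t() / temperature   # pairwise similarities
labels = torch.arange(batch)                      # i-th image matches i-th caption
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
loss.backward()
```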
arXiv Detail & Related papers (2024-05-11T05:01:53Z)
- An Image is Worth Multiple Words: Discovering Object Level Concepts using Multi-Concept Prompt Learning [8.985668637331335]
Textual Inversion learns a singular text embedding for a new "word" to represent image style and appearance.
We introduce Multi-Concept Prompt Learning (MCPL), where multiple unknown "words" are simultaneously learned from a single sentence-image pair.
Our approach emphasises learning solely from textual embeddings, using less than 10% of the storage space required by comparable methods.
arXiv Detail & Related papers (2023-10-18T19:18:19Z)
- Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else [75.6806649860538]
We consider a more ambitious goal: natural multi-concept generation using a pre-trained diffusion model.
We observe concept dominance and non-localized contribution that severely degrade multi-concept generation performance.
We design a minimal low-cost solution that overcomes the above issues by tweaking the text embeddings for more realistic multi-concept text-to-image generation.
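As a toy illustration of "tweaking the text embeddings" (an assumption, not the paper's exact correction), the embeddings of the concept tokens could be rescaled to a common norm so that no single concept dominates the prompt representation fed to the frozen diffusion model.

```python
import torch

prompt_emb = torch.randn(77, 768)          # per-token text embeddings (stand-in)
concept_positions = [2, 6]                 # token indices of the two concepts (assumed)

norms = prompt_emb[concept_positions].norm(dim=-1, keepdim=True)
target = norms.mean()                      # equalize toward the average norm
prompt_emb[concept_positions] = prompt_emb[concept_positions] / norms * target
```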
arXiv Detail & Related papers (2023-10-11T12:05:44Z)
- Create Your World: Lifelong Text-to-Image Diffusion [75.14353789007902]
We propose Lifelong text-to-image Diffusion Model (L2DM) to overcome knowledge "catastrophic forgetting" for the past encountered concepts.
To address knowledge "catastrophic forgetting", our L2DM framework devises a task-aware memory enhancement module and an elastic-concept distillation module.
Our model can generate more faithful images across a range of continual text prompts in terms of both qualitative and quantitative metrics.
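A rough sketch of a concept-distillation term for lifelong learning, assuming a frozen copy of the previous model is kept: on prompts containing previously learned concepts, the current model's prediction is regularized toward the old model's, discouraging forgetting. The module internals below are placeholders, not L2DM's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDenoiser(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Linear(dim, dim)
    def forward(self, latent, cond):
        return self.net(latent + cond)

old_model = TinyDenoiser().eval()           # frozen snapshot from the previous task
new_model = TinyDenoiser()                  # model being trained on the new concept

latent = torch.randn(4, 64)
past_concept_cond = torch.randn(4, 64)      # conditioning for an old concept's prompt

with torch.no_grad():
    teacher_pred = old_model(latent, past_concept_cond)
distill_loss = F.mse_loss(new_model(latent, past_concept_cond), teacher_pred)
distill_loss.backward()
```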
arXiv Detail & Related papers (2023-09-08T16:45:56Z)
- General Image-to-Image Translation with One-Shot Image Guidance [5.89808526053682]
We propose a novel framework named visual concept translator (VCT).
It has the ability to preserve content in the source image and translate the visual concepts guided by a single reference image.
Given only one reference image, the proposed VCT can complete a wide range of general image-to-image translation tasks with excellent results.
arXiv Detail & Related papers (2023-07-20T16:37:49Z)
- Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
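A minimal sketch of using per-concept masks during customization: the reconstruction loss is restricted to the regions covered by a concept's mask, tying each learned token to its own part of the single input image. The tensors are stand-ins; the full two-phase schedule and attention-based losses are not shown.

```python
import torch

pred_noise = torch.randn(1, 4, 64, 64, requires_grad=True)   # model output (stand-in)
true_noise = torch.randn(1, 4, 64, 64)
concept_mask = (torch.rand(1, 1, 64, 64) > 0.5).float()      # 1 where the concept appears

masked_loss = ((pred_noise - true_noise) ** 2 * concept_mask).sum() / concept_mask.sum().clamp(min=1)
masked_loss.backward()
```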
arXiv Detail & Related papers (2023-05-25T17:59:04Z)
- ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation [59.44301617306483]
We propose a learning-based encoder for fast and accurate customized text-to-image generation.
Our method enables high-fidelity inversion and more robust editability with a significantly faster encoding process.
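A schematic version of the encoder idea: a small learned network maps CLIP image features of a concept to a vector in the text-encoder's token-embedding space, so customization takes one forward pass instead of per-concept optimization. Dimensions and the two-layer design are assumptions; ELITE's actual global/local mapping networks are more elaborate.

```python
import torch
import torch.nn as nn

class ConceptEncoder(nn.Module):
    def __init__(self, image_dim: int = 1024, token_dim: int = 768):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(image_dim, token_dim), nn.GELU(),
                                  nn.Linear(token_dim, token_dim))
    def forward(self, clip_image_features: torch.Tensor) -> torch.Tensor:
        return self.proj(clip_image_features)   # pseudo-word embedding for the concept

encoder = ConceptEncoder()
pseudo_word = encoder(torch.randn(1, 1024))     # would be inserted into the prompt embedding
```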
arXiv Detail & Related papers (2023-02-27T14:49:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.