The Hidden Language of Diffusion Models
- URL: http://arxiv.org/abs/2306.00966v3
- Date: Thu, 5 Oct 2023 12:55:12 GMT
- Title: The Hidden Language of Diffusion Models
- Authors: Hila Chefer, Oran Lang, Mor Geva, Volodymyr Polosukhin, Assaf Shocher,
Michal Irani, Inbar Mosseri, Lior Wolf
- Abstract summary: We present Conceptor, a novel method to interpret the internal representation of a textual concept by a diffusion model.
We find surprising visual connections between concepts that transcend their textual semantics.
We additionally discover concepts that rely on mixtures of exemplars, biases, renowned artistic styles, or a simultaneous fusion of multiple meanings.
- Score: 70.03691458189604
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image diffusion models have demonstrated an unparalleled ability to
generate high-quality, diverse images from a textual prompt. However, the
internal representations learned by these models remain an enigma. In this
work, we present Conceptor, a novel method to interpret the internal
representation of a textual concept by a diffusion model. This interpretation
is obtained by decomposing the concept into a small set of human-interpretable
textual elements. Applied to the state-of-the-art Stable Diffusion model,
Conceptor reveals non-trivial structures in the representations of concepts.
For example, we find surprising visual connections between concepts that
transcend their textual semantics. We additionally discover concepts that rely
on mixtures of exemplars, biases, renowned artistic styles, or a simultaneous
fusion of multiple meanings of the concept. Through a large battery of
experiments, we demonstrate Conceptor's ability to provide meaningful, robust,
and faithful decompositions for a wide variety of abstract, concrete, and
complex textual concepts, while naturally connecting each
decomposition element to its corresponding visual impact on the generated
images. Our code will be available at: https://hila-chefer.github.io/Conceptor/
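The abstract describes decomposing a concept's internal representation into a small, weighted set of human-interpretable textual elements. The toy sketch below illustrates that general idea only; it is not the authors' implementation, and the vocabulary, dimensions, and surrogate objective are all stand-ins. It learns non-negative weights over a vocabulary of word embeddings so that their weighted sum reproduces a target concept embedding, then reads off the top-weighted words as the decomposition.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-ins for a text encoder's vocabulary embeddings and for the embedding
# of the concept to be interpreted (here a hidden mixture of two "words").
vocab = torch.randn(1000, 128)                 # 1000 hypothetical words, 128 dims
concept = 0.6 * vocab[3] + 0.4 * vocab[7]      # ground-truth mixture to recover

logits = torch.nn.Parameter(torch.zeros(1000)) # learnable per-word coefficients
opt = torch.optim.Adam([logits], lr=0.1)

for step in range(500):
    weights = F.softmax(logits, dim=0)         # non-negative, sums to 1
    pseudo_token = weights @ vocab             # w* = sum_i alpha_i * e_i
    # Surrogate objective for illustration only: make the pseudo-token match
    # the concept embedding directly. Conceptor instead compares the frozen
    # diffusion model's denoising predictions when conditioned on w* versus
    # on the original concept prompt, with a sparsity term on the weights.
    entropy = -(weights * (weights + 1e-12).log()).sum()
    loss = F.mse_loss(pseudo_token, concept) + 1e-3 * entropy
    opt.zero_grad(); loss.backward(); opt.step()

top_w, top_idx = F.softmax(logits, dim=0).topk(5)
print(top_idx.tolist())                        # should rank words 3 and 7 highest
print([round(w, 3) for w in top_w.tolist()])
```

In the paper's setting the weights are optimized through the diffusion model itself, which is what allows each retained word to be traced back to its visual impact on the generated images.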
Related papers
- Scaling Concept With Text-Guided Diffusion Models [53.80799139331966]
Instead of replacing a concept, can we enhance or suppress the concept itself?
We introduce ScalingConcept, a simple yet effective method to scale decomposed concepts up or down in real input without introducing new elements.
More importantly, ScalingConcept enables a variety of novel zero-shot applications across image and audio domains.
arXiv Detail & Related papers (2024-10-31T17:09:55Z)
- Compositional Entailment Learning for Hyperbolic Vision-Language Models [54.41927525264365]
We show how to fully leverage the innate hierarchical nature of hyperbolic embeddings by looking beyond individual image-text pairs.
We propose Compositional Entailment Learning for hyperbolic vision-language models.
Empirical evaluation on a hyperbolic vision-language model trained with millions of image-text pairs shows that the proposed compositional learning approach outperforms conventional Euclidean CLIP learning.
arXiv Detail & Related papers (2024-10-09T14:12:50Z)
- CusConcept: Customized Visual Concept Decomposition with Diffusion Models [13.95568624067449]
We propose a two-stage framework, CusConcept, to extract customized visual concept embedding vectors.
In the first stage, CusConcept employs a vocabulary-guided concept decomposition mechanism.
In the second stage, joint concept refinement is performed to enhance the fidelity and quality of generated images.
arXiv Detail & Related papers (2024-10-01T04:41:44Z)
- How to Blend Concepts in Diffusion Models [48.68800153838679]
Recent methods exploit multiple latent representations and their connection, making this research question even more entangled.
Our goal is to understand how operations in the latent space affect the underlying concepts.
Our conclusion is that concept blending through space manipulation is possible, although the best strategy depends on the context of the blend (a small prompt-embedding interpolation sketch follows the related-papers list below).
arXiv Detail & Related papers (2024-07-19T13:05:57Z)
- Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models [60.80960965051388]
Adjectives and verbs are entangled with their subject nouns.
Lego disentangles concepts from their associated subjects using a simple yet effective Subject Separation step.
Lego-generated concepts were preferred over 70% of the time when compared to the baseline.
arXiv Detail & Related papers (2023-11-23T07:33:38Z)
- FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic Descriptions, and Conceptual Relations [99.54048050189971]
We present a framework for learning new visual concepts quickly, guided by multiple naturally occurring data streams.
The learned concepts support downstream applications, such as answering questions by reasoning about unseen images.
We demonstrate the effectiveness of our model on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-03-30T19:45:00Z)
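"How to Blend Concepts in Diffusion Models" (above) studies how latent-space operations affect the underlying concepts. The sketch below shows one such operation in a generic way: linearly interpolating the CLIP text embeddings of two prompts before sampling with the Hugging Face diffusers API. This is an illustration, not the paper's method; the model checkpoint, prompts, and blend weights are arbitrary choices.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def embed(prompt: str) -> torch.Tensor:
    # Encode a prompt with the pipeline's CLIP text encoder.
    tokens = pipe.tokenizer(
        prompt, padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True, return_tensors="pt",
    ).input_ids.to(pipe.device)
    with torch.no_grad():
        return pipe.text_encoder(tokens)[0]

e_a = embed("a photo of a lion")
e_b = embed("a photo of an eagle")

# Linear interpolation in text-embedding space; alpha sweeps the blend.
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    blended = (1 - alpha) * e_a + alpha * e_b
    image = pipe(prompt_embeds=blended, num_inference_steps=30).images[0]
    image.save(f"blend_{alpha:.2f}.png")
```

Whether interpolating at the embedding level, rewriting the prompt, or intervening in the denoising trajectory blends concepts best is, per the abstract, context-dependent.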
This list is automatically generated from the titles and abstracts of the papers in this site.