Related papers: The Hidden Language of Diffusion Models

URL: http://arxiv.org/abs/2306.00966v3
Date: Thu, 5 Oct 2023 12:55:12 GMT
Title: The Hidden Language of Diffusion Models
Authors: Hila Chefer, Oran Lang, Mor Geva, Volodymyr Polosukhin, Assaf Shocher, Michal Irani, Inbar Mosseri, Lior Wolf
Abstract summary: We present Conceptor, a novel method to interpret the internal representation of a textual concept by a diffusion model. We find surprising visual connections between concepts, that transcend their textual semantics. We additionally discover concepts that rely on mixtures of exemplars, biases, renowned artistic styles, or a simultaneous fusion of multiple meanings.
Score: 70.03691458189604
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text-to-image diffusion models have demonstrated an unparalleled ability to generate high-quality, diverse images from a textual prompt. However, the internal representations learned by these models remain an enigma. In this work, we present Conceptor, a novel method to interpret the internal representation of a textual concept by a diffusion model. This interpretation is obtained by decomposing the concept into a small set of human-interpretable textual elements. Applied over the state-of-the-art Stable Diffusion model, Conceptor reveals non-trivial structures in the representations of concepts. For example, we find surprising visual connections between concepts, that transcend their textual semantics. We additionally discover concepts that rely on mixtures of exemplars, biases, renowned artistic styles, or a simultaneous fusion of multiple meanings of the concept. Through a large battery of experiments, we demonstrate Conceptor's ability to provide meaningful, robust, and faithful decompositions for a wide variety of abstract, concrete, and complex textual concepts, while allowing to naturally connect each decomposition element to its corresponding visual impact on the generated images. Our code will be available at: https://hila-chefer.github.io/Conceptor/

Related papers

Blending Concepts with Text-to-Image Diffusion Models [48.68800153838679]
Diffusion models have advanced text-to-image generation in recent years, translating abstract concepts into high-fidelity images with remarkable ease. In this work, we examine whether they can also blend distinct concepts, ranging from concrete objects to intangible ideas, into coherent new visual entities under a zero-shot framework. We show that modern diffusion models indeed exhibit creative blending capabilities without further training or fine-tuning.
arXiv Detail & Related papers (2025-06-30T08:53:30Z)
OmniPrism: Learning Disentangled Visual Concept for Image Generation [57.21097864811521]
Creative visual concept generation often draws inspiration from specific concepts in a reference image to produce relevant outcomes. We propose OmniPrism, a visual concept disentangling approach for creative image generation. Our method learns disentangled concept representations guided by natural language and trains a diffusion model to incorporate these concepts.
arXiv Detail & Related papers (2024-12-16T18:59:52Z)
Scaling Concept With Text-Guided Diffusion Models [53.80799139331966]
Instead of replacing a concept, can we enhance or suppress the concept itself? We introduce ScalingConcept, a simple yet effective method to scale decomposed concepts up or down in real input without introducing new elements. More importantly, ScalingConcept enables a variety of novel zero-shot applications across image and audio domains.
arXiv Detail & Related papers (2024-10-31T17:09:55Z)
Compositional Entailment Learning for Hyperbolic Vision-Language Models [54.41927525264365]
We show how to fully leverage the innate hierarchical nature of hyperbolic embeddings by looking beyond individual image-text pairs. We propose Compositional Entailment Learning for hyperbolic vision-language models. Empirical evaluation on a hyperbolic vision-language model trained with millions of image-text pairs shows that the proposed compositional learning approach outperforms conventional Euclidean CLIP learning.
arXiv Detail & Related papers (2024-10-09T14:12:50Z)
CusConcept: Customized Visual Concept Decomposition with Diffusion Models [13.95568624067449]
We propose a two-stage framework, CusConcept, to extract customized visual concept embedding vectors. In the first stage, CusConcept employs a vocabularies-guided concept decomposition mechanism. In the second stage, joint concept refinement is performed to enhance the fidelity and quality of generated images.
arXiv Detail & Related papers (2024-10-01T04:41:44Z)
How to Blend Concepts in Diffusion Models [48.68800153838679]
Recent methods exploit multiple latent representations and their connection, making this research question even more entangled. Our goal is to understand how operations in the latent space affect the underlying concepts. Our conclusion is that concept blending through space manipulation is possible, although the best strategy depends on the context of the blend.
arXiv Detail & Related papers (2024-07-19T13:05:57Z)
Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models [60.80960965051388]
Adjectives and verbs are entangled with nouns (subject). Lego disentangles concepts from their associated subjects using a simple yet effective Subject Separation step. Lego-generated concepts were preferred over 70% of the time when compared to the baseline.
arXiv Detail & Related papers (2023-11-23T07:33:38Z)
FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations [99.54048050189971]
We present a framework for learning new visual concepts quickly, guided by multiple naturally occurring data streams. The learned concepts support downstream applications, such as answering questions by reasoning about unseen images. We demonstrate the effectiveness of our model on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-03-30T19:45:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.