Redefining <Creative> in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation
- URL: http://arxiv.org/abs/2410.24160v2
- Date: Wed, 20 Nov 2024 10:22:59 GMT
- Title: Redefining <Creative> in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation
- Authors: Fu Feng, Yucheng Xie, Xu Yang, Jing Wang, Xin Geng
- Abstract summary: Current methods rely heavily on reference prompts or images to achieve a creative effect.
We introduce CreTok, which brings meta-creativity to diffusion models by redefining ``creative'' as a new token.
CreTok achieves such redefinition by iteratively sampling diverse text pairs.
- Score: 39.93527514513576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   ``Creative'' remains an inherently abstract concept for both humans and diffusion models. While text-to-image (T2I) diffusion models can easily generate out-of-domain concepts like ``a blue banana'', they struggle with generating combinatorial objects such as ``a creative mixture that resembles a lettuce and a mantis'', due to difficulties in understanding the semantic depth of ``creative''. Current methods rely heavily on synthesizing reference prompts or images to achieve a creative effect, typically requiring retraining for each unique creative output -- a process that is computationally intensive and limits practical applications. To address this, we introduce CreTok, which brings meta-creativity to diffusion models by redefining ``creative'' as a new token, \texttt{<CreTok>}, thus enhancing models' semantic understanding for combinatorial creativity. CreTok achieves such redefinition by iteratively sampling diverse text pairs from our proposed CangJie dataset to form adaptive prompts and restrictive prompts, and then optimizing the similarity between their respective text embeddings. Extensive experiments demonstrate that \texttt{<CreTok>} enables the universal and direct generation of combinatorial creativity across diverse concepts without additional training (4s vs. BASS's 2400s per image), achieving state-of-the-art performance with improved text-image alignment ($\uparrow$0.03 in VQAScore) and higher human preference ratings ($\uparrow$0.009 in PickScore and $\uparrow$0.169 in ImageReward). Further evaluations with GPT-4o and user studies underscore CreTok's strengths in advancing creative generation. 
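The optimization described in the abstract (sample a text pair, build an adaptive prompt containing the new token and a restrictive prompt, then raise the similarity of their text embeddings) can be illustrated with a toy sketch. This is not the authors' implementation: the mean-pooled random embeddings are a stand-in for a frozen text encoder, the prompt templates and the `PAIRS` pool are hypothetical placeholders rather than the CangJie dataset, and plain gradient ascent on cosine similarity is an assumed simplification of the training objective.

```python
import math
import random

DIM = 16
_rng = random.Random(0)

# Toy vocabulary with fixed random embeddings (assumed stand-in for a frozen text encoder).
WORDS = ["a", "mixture", "of", "and", "lettuce", "mantis", "banana", "owl", "cactus", "crab"]
EMB = {w: [_rng.gauss(0, 1) for _ in range(DIM)] for w in WORDS}

# Hypothetical pool of concept pairs (placeholder, not the CangJie dataset).
PAIRS = [("lettuce", "mantis"), ("banana", "owl"), ("cactus", "crab")]


def mean_pool(vectors):
    """Average a list of vectors; the toy 'text encoder'."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(DIM)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def encode_adaptive(token, a, b):
    """Embed a hypothetical adaptive prompt: 'a <token> mixture of A and B'."""
    words = ["a", "mixture", "of", a, "and", b]
    vectors = [EMB[w] for w in words] + [token]
    return mean_pool(vectors), len(vectors)


def encode_restrictive(a, b):
    """Embed a hypothetical restrictive prompt: 'A B'."""
    return mean_pool([EMB[a], EMB[b]])


def grad_cosine_wrt_u(u, v):
    """Gradient of cos(u, v) with respect to u."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return [b / (nu * nv) - dot * a / (nu ** 3 * nv) for a, b in zip(u, v)]


def average_similarity(token):
    """Mean adaptive/restrictive cosine similarity over all pairs."""
    sims = [cosine(encode_adaptive(token, a, b)[0], encode_restrictive(a, b)) for a, b in PAIRS]
    return sum(sims) / len(sims)


def train_token(steps=300, lr=0.5, seed=0):
    """Gradient ascent on the token embedding, one sampled pair per step."""
    sampler = random.Random(seed)
    token = [0.0] * DIM
    for _ in range(steps):
        a, b = sampler.choice(PAIRS)
        u, n = encode_adaptive(token, a, b)
        v = encode_restrictive(a, b)
        g = grad_cosine_wrt_u(u, v)
        # Chain rule: the token contributes 1/n of the mean-pooled embedding.
        token = [t + lr * gi / n for t, gi in zip(token, g)]
    return token


trained = train_token()
print("similarity before:", average_similarity([0.0] * DIM))
print("similarity after: ", average_similarity(trained))
```

Only the token vector is updated and the vocabulary embeddings stay fixed. In this sketch that mirrors the idea of learning a single reusable token rather than retraining for each concept pair.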
 
      
Related papers
        - Blending Concepts with Text-to-Image Diffusion Models [48.68800153838679]
 Diffusion models have advanced text-to-image generation in recent years, translating abstract concepts into high-fidelity images with remarkable ease.
In this work, we examine whether they can also blend distinct concepts, ranging from concrete objects to intangible ideas, into coherent new visual entities under a zero-shot framework.
We show that modern diffusion models indeed exhibit creative blending capabilities without further training or fine-tuning.
 arXiv  Detail & Related papers  (2025-06-30T08:53:30Z)
- Distribution-Conditional Generation: From Class Distribution to Creative Generation [39.93527514513576]
 DisTok is an encoder-decoder framework that maps class distributions into a latent space and decodes them into tokens of creative concept.
DisTok achieves state-of-the-art performance with superior text-image alignment and human preference scores.
 arXiv  Detail & Related papers  (2025-05-06T16:07:12Z)
- Consistent Subject Generation via Contrastive Instantiated Concepts [59.95616194326261]
 We introduce Contrastive Concept Instantiation (CoCoIns) to effectively synthesize consistent subjects across multiple independent creations.
The framework consists of a generative model and a mapping network, which transforms input latent codes into pseudo-words associated with certain instances of concepts.
 arXiv  Detail & Related papers  (2025-03-31T17:59:51Z)
- Scaling Concept With Text-Guided Diffusion Models [53.80799139331966]
 Instead of replacing a concept, can we enhance or suppress the concept itself?
We introduce ScalingConcept, a simple yet effective method to scale decomposed concepts up or down in real input without introducing new elements.
More importantly, ScalingConcept enables a variety of novel zero-shot applications across image and audio domains.
 arXiv  Detail & Related papers  (2024-10-31T17:09:55Z)
- ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
 Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images.
We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
 arXiv  Detail & Related papers  (2024-05-24T07:19:40Z)
- Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding [9.787025432074978]
 This paper introduces Prompt for Abstract Concepts (POAC) to enhance the performance of text-to-image diffusion models.
We propose a Prompt Language Model (PLM), which is curated from a pre-trained language model, and then fine-tuned with a dataset of abstract concept prompts.
Our framework employs a Reinforcement Learning (RL)-based optimization strategy, focusing on the alignment between the images generated by a Stable Diffusion model and the optimized prompts.
 arXiv  Detail & Related papers  (2024-04-17T17:38:56Z)
- DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation [40.478839423995296]
 We present DiffChat, a novel method to align Large Language Models (LLMs) to "chat" with prompt-as-input Text-to-Image Synthesis (TIS) models for interactive image creation.
Given a raw prompt/image and a user-specified instruction, DiffChat can effectively make appropriate modifications and generate the target prompt.
 arXiv  Detail & Related papers  (2024-03-08T02:24:27Z)
- DreamCreature: Crafting Photorealistic Virtual Creatures from Imagination [140.1641573781066]
 We introduce a novel task, Virtual Creatures Generation: Given a set of unlabeled images of the target concepts, we aim to train a T2I model capable of creating new, hybrid concepts.
We propose a new method called DreamCreature, which identifies and extracts the underlying sub-concepts.
The T2I thus adapts to generate novel concepts with faithful structures and photorealistic appearance.
 arXiv  Detail & Related papers  (2023-11-27T01:24:31Z)
- Spellburst: A Node-based Interface for Exploratory Creative Coding with Natural Language Prompts [7.074738009603178]
 Spellburst is a large language model (LLM) powered creative coding environment.
Spellburst allows artists to create generative art and explore variations through branching and merging operations.
 arXiv  Detail & Related papers  (2023-08-07T21:54:58Z)
- ConceptLab: Creative Concept Generation using VLM-Guided Diffusion Prior Constraints [56.824187892204314]
 We present the task of creative text-to-image generation, where we seek to generate new members of a broad category.
We show that the creative generation problem can be formulated as an optimization process over the output space of the diffusion prior.
We incorporate a question-answering Vision-Language Model (VLM) that adaptively adds new constraints to the optimization problem, encouraging the model to discover increasingly more unique creations.
 arXiv  Detail & Related papers  (2023-08-03T17:04:41Z)
- SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models [56.88192537044364]
 We propose a simple-yet-effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models.
Our approach can make text-to-image diffusion models easier to use with better user experience.
 arXiv  Detail & Related papers  (2023-05-09T05:48:38Z)
- WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models [8.334487584550185]
 We present a latent diffusion-based method for styled text-to-text-content-image generation at the word level.
Our proposed method is able to generate realistic word image samples from different writer styles.
We show that the proposed model produces samples that are aesthetically pleasing, help boost text recognition performance, and achieve writer retrieval scores similar to those of real data.
 arXiv  Detail & Related papers  (2023-03-29T10:19:26Z)
- eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers [87.52504764677226]
 Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis.
We train an ensemble of text-to-image diffusion models specialized for different synthesis stages.
Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
 arXiv  Detail & Related papers  (2022-11-02T17:43:04Z)
- CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval [108.48540976175457]
 We propose Coupled Diversity-Sensitive Momentum Contrastive Learning (CODER) for improving cross-modal representation.
We introduce dynamic dictionaries for both modalities to enlarge the scale of image-text pairs, and diversity-sensitiveness is achieved by adaptive negative pair weighting.
Experiments conducted on two popular benchmarks, i.e. MSCOCO and Flickr30K, validate that CODER remarkably outperforms the state-of-the-art approaches.
 arXiv  Detail & Related papers  (2022-08-21T08:37:50Z)
- Explaining Creative Artifacts [69.86890599471202]
 We develop an inverse problem formulation to deconstruct the products of combinatorial and compositional creativity into associative chains.
In particular, our formulation is structured as solving a traveling salesman problem through a knowledge graph of associative elements.
 arXiv  Detail & Related papers  (2020-10-14T14:32:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     