Redefining <Creative> in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation
- URL: http://arxiv.org/abs/2410.24160v3
- Date: Mon, 17 Mar 2025 06:33:07 GMT
- Title: Redefining <Creative> in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation
- Authors: Fu Feng, Yucheng Xie, Xu Yang, Jing Wang, Xin Geng
- Abstract summary: "Creative" remains an inherently abstract concept for both humans and diffusion models. Current methods rely heavily on reference prompts or images to achieve a creative effect. We introduce CreTok, which brings meta-creativity to diffusion models by redefining "creative" as a new token, <CreTok>. Code will be made available at https://github.com/fu-feng/CreTok.
- Score: 39.93527514513576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: "Creative" remains an inherently abstract concept for both humans and diffusion models. While text-to-image (T2I) diffusion models can easily generate out-of-distribution concepts like "a blue banana", they struggle with generating combinatorial objects such as "a creative mixture that resembles a lettuce and a mantis", due to difficulties in understanding the semantic depth of "creative". Current methods rely heavily on synthesizing reference prompts or images to achieve a creative effect, typically requiring retraining for each unique creative output, a process that is computationally intensive and limits practical applications. To address this, we introduce CreTok, which brings meta-creativity to diffusion models by redefining "creative" as a new token, <CreTok>, thus enhancing models' semantic understanding for combinatorial creativity. CreTok achieves such redefinition by iteratively sampling diverse text pairs from our proposed CangJie dataset to form adaptive prompts and restrictive prompts, and then optimizing the similarity between their respective text embeddings. Extensive experiments demonstrate that <CreTok> enables the universal and direct generation of combinatorial creativity across diverse concepts without additional training, achieving state-of-the-art performance with improved text-image alignment and higher human preference ratings. Code will be made available at https://github.com/fu-feng/CreTok.
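The redefinition step described above, aligning an adaptive prompt that contains the new token with a restrictive prompt that spells the concept out, can be sketched in miniature. Everything below (the toy vocabulary, the mean-pooled "encoder", and the numerical gradient ascent) is a hypothetical stand-in for the real text encoder and optimizer, not CreTok's actual implementation:

```python
# Toy sketch of CreTok-style token redefinition (assumed simplification):
# optimize one token vector so the adaptive prompt containing <CreTok>
# moves toward the embedding of a restrictive prompt that spells out the
# combinatorial intent. Prompts here are mean-pooled toy word vectors.
import random

DIM = 8
random.seed(0)

def rand_vec():
    return [random.uniform(-1, 1) for _ in range(DIM)]

# Fixed toy vocabulary (stand-in for a real text encoder's embedding table).
vocab = {w: rand_vec() for w in ["a", "mixture", "of", "lettuce", "mantis",
                                 "that", "resembles", "and"]}

def embed(tokens, cretok=None):
    """Mean-pool token vectors; '<CreTok>' uses the learnable vector."""
    vecs = [cretok if t == "<CreTok>" else vocab[t] for t in tokens]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / ((sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5))

adaptive = ["a", "<CreTok>", "mixture", "of", "lettuce", "and", "mantis"]
restrictive = ["a", "mixture", "that", "resembles", "a", "lettuce", "and", "a", "mantis"]
target = embed(restrictive)  # restrictive prompt contains no <CreTok>

def similarity(cretok):
    return cosine(embed(adaptive, cretok), target)

cretok = rand_vec()
before = similarity(cretok)
EPS, LR = 1e-4, 0.2
for _ in range(300):  # numerical-gradient ascent on the cosine similarity
    grad = []
    for i in range(DIM):
        bumped = cretok[:]
        bumped[i] += EPS
        grad.append((similarity(bumped) - similarity(cretok)) / EPS)
    cretok = [c + LR * g for c, g in zip(cretok, grad)]
after = similarity(cretok)
print(f"cosine before={before:.3f} after={after:.3f}")
```

In the real method the optimization runs inside a T2I model's text encoder and samples many such prompt pairs from the CangJie dataset, rather than a single hand-written pair.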
Related papers
- Consistent Subject Generation via Contrastive Instantiated Concepts [59.95616194326261]
We introduce Contrastive Concept Instantiation (CoCoIns) to effectively synthesize consistent subjects across multiple independent creations.
The framework consists of a generative model and a mapping network, which transforms input latent codes into pseudo-words associated with certain instances of concepts.
arXiv Detail & Related papers (2025-03-31T17:59:51Z) - Scaling Concept With Text-Guided Diffusion Models [53.80799139331966]
Instead of replacing a concept, can we enhance or suppress the concept itself?
We introduce ScalingConcept, a simple yet effective method to scale decomposed concepts up or down in real input without introducing new elements.
More importantly, ScalingConcept enables a variety of novel zero-shot applications across image and audio domains.
arXiv Detail & Related papers (2024-10-31T17:09:55Z) - ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images.
We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z) - Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding [9.787025432074978]
This paper introduces the Prompt Optimizer for Abstract Concepts (POAC) to enhance the performance of text-to-image diffusion models on abstract concepts.
We propose a Prompt Language Model (PLM), which is curated from a pre-trained language model, and then fine-tuned with a dataset of abstract concept prompts.
Our framework employs a Reinforcement Learning (RL)-based optimization strategy, focusing on the alignment between the generated images by a stable diffusion model and optimized prompts.
arXiv Detail & Related papers (2024-04-17T17:38:56Z) - DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation [40.478839423995296]
We present DiffChat, a novel method to align Large Language Models (LLMs) to "chat" with prompt-as-input Text-to-Image Synthesis (TIS) models for interactive image creation.
Given a raw prompt/image and a user-specified instruction, DiffChat can effectively make appropriate modifications and generate the target prompt.
arXiv Detail & Related papers (2024-03-08T02:24:27Z) - DreamCreature: Crafting Photorealistic Virtual Creatures from Imagination [140.1641573781066]
We introduce a novel task, Virtual Creatures Generation: Given a set of unlabeled images of the target concepts, we aim to train a T2I model capable of creating new, hybrid concepts.
We propose a new method called DreamCreature, which identifies and extracts the underlying sub-concepts.
The T2I thus adapts to generate novel concepts with faithful structures and photorealistic appearance.
arXiv Detail & Related papers (2023-11-27T01:24:31Z) - Spellburst: A Node-based Interface for Exploratory Creative Coding with Natural Language Prompts [7.074738009603178]
Spellburst is a large language model (LLM) powered creative coding environment.
Spellburst allows artists to create generative art and explore variations through branching and merging operations.
arXiv Detail & Related papers (2023-08-07T21:54:58Z) - ConceptLab: Creative Concept Generation using VLM-Guided Diffusion Prior Constraints [56.824187892204314]
We present the task of creative text-to-image generation, where we seek to generate new members of a broad category.
We show that the creative generation problem can be formulated as an optimization process over the output space of the diffusion prior.
We incorporate a question-answering Vision-Language Model (VLM) that adaptively adds new constraints to the optimization problem, encouraging the model to discover increasingly unique creations.
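The adaptive constraint loop can be illustrated schematically. The oracle and generator below are toy stand-ins with made-up names and behavior, intended only to show how each VLM answer becomes a new constraint for the next round:

```python
# Schematic sketch (assumed interfaces, not ConceptLab's real code): a
# VLM-style oracle names the known category member a candidate most
# resembles, and that member becomes a "do not look like X" constraint,
# pushing the next candidate toward something genuinely new.

KNOWN_MEMBERS = ["dog", "cat", "horse", "rabbit"]

def vlm_most_similar(candidate, members):
    # Stand-in oracle: pretend the candidate resembles the first
    # still-unconstrained member.
    return members[0]

def generate(constraints):
    # Stand-in generator: "creates" a candidate outside the constrained set.
    return f"new-member-avoiding-{len(constraints)}-constraints"

constraints = []
for _ in range(3):  # adaptive constraint-expansion loop
    candidate = generate(constraints)
    resembles = vlm_most_similar(
        candidate, [m for m in KNOWN_MEMBERS if m not in constraints])
    constraints.append(resembles)
print(constraints)
```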
arXiv Detail & Related papers (2023-08-03T17:04:41Z) - SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models [56.88192537044364]
We propose a simple-yet-effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models.
Our approach can make text-to-image diffusion models easier to use with better user experience.
arXiv Detail & Related papers (2023-05-09T05:48:38Z) - WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models [8.334487584550185]
We present a latent diffusion-based method for word-level generation of handwritten text images, conditioned on both writer style and textual content.
Our proposed method is able to generate realistic word image samples from different writer styles.
We show that the proposed model produces samples that are aesthetically pleasing, help boost text recognition performance, and achieve writer retrieval scores similar to real data.
arXiv Detail & Related papers (2023-03-29T10:19:26Z) - eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers [87.52504764677226]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis.
We train an ensemble of text-to-image diffusion models, each specialized for a different stage of the synthesis process.
Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
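At inference time the stage-specialization idea reduces to routing each denoising timestep to the expert trained for that interval, which is why the ensemble costs no more than a single model per step. The sketch below uses made-up scalar "denoisers" and an assumed two-interval split; eDiffi's actual experts and schedule differ:

```python
# Hypothetical sketch of expert routing: one reverse-diffusion trajectory,
# where each timestep is handled by the expert covering its interval
# (early, noisy steps lean on the text; late steps refine visual detail).
# The experts are stand-in functions, not real diffusion denoisers.

def early_expert(x, t):
    return x - 2.0  # stand-in: coarse, text-driven denoising

def late_expert(x, t):
    return x - 0.5  # stand-in: fine detail refinement

# (interval_start, interval_end, expert), intervals over t in [0, 1000).
EXPERTS = [(500, 1000, early_expert), (0, 500, late_expert)]

def denoise_step(x, t):
    """Dispatch to the expert whose timestep interval contains t."""
    for lo, hi, expert in EXPERTS:
        if lo <= t < hi:
            return expert(x, t)
    raise ValueError(f"no expert covers timestep {t}")

x = 100.0
for t in range(999, -1, -1):  # full reverse trajectory, one expert per step
    x = denoise_step(x, t)
print(x)
```

Exactly one expert runs per step, so inference cost matches a single model even though several specialists were trained.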
arXiv Detail & Related papers (2022-11-02T17:43:04Z) - CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval [108.48540976175457]
We propose Coupled Diversity-Sensitive Momentum Contrastive Learning (CODER) for improving cross-modal representation.
We introduce dynamic dictionaries for both modalities to enlarge the scale of image-text pairs, and diversity-sensitiveness is achieved by adaptive negative pair weighting.
Experiments conducted on two popular benchmarks, i.e. MSCOCO and Flickr30K, validate that CODER remarkably outperforms the state-of-the-art approaches.
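The dynamic-dictionary ingredient can be sketched as a fixed-size FIFO queue of embeddings carried over from past batches, in the spirit of momentum-contrastive methods: the queue enlarges the pool of negatives well beyond one batch. The class below is a minimal illustration under that assumption, not CODER's implementation:

```python
# Minimal sketch of a dynamic dictionary for contrastive negatives:
# embeddings from past batches are kept in a bounded FIFO queue, so the
# set of negative pairs grows beyond the current batch while memory
# stays constant (oldest entries are evicted automatically).
from collections import deque

class DynamicDictionary:
    def __init__(self, max_size):
        self.queue = deque(maxlen=max_size)  # oldest entries drop off

    def enqueue(self, batch_embeddings):
        """Add the newest batch; eviction is handled by maxlen."""
        self.queue.extend(batch_embeddings)

    def negatives(self):
        """All stored embeddings, usable as negatives for the next batch."""
        return list(self.queue)

d = DynamicDictionary(max_size=4)
d.enqueue([[1.0], [2.0]])
d.enqueue([[3.0], [4.0], [5.0]])  # queue now holds only the 4 newest
print(d.negatives())
```

CODER keeps one such dictionary per modality and additionally weights negatives adaptively; the queue above shows only the scale-enlarging mechanism.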
arXiv Detail & Related papers (2022-08-21T08:37:50Z) - Explaining Creative Artifacts [69.86890599471202]
We develop an inverse problem formulation to deconstruct the products of combinatorial and compositional creativity into associative chains.
In particular, our formulation is structured as solving a traveling salesman problem through a knowledge graph of associative elements.
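The formulation can be made concrete on a toy graph: with made-up association costs between a handful of concepts, the cheapest open chain through all of them is found by brute force. Real instances would run over an actual knowledge graph with a proper TSP solver; the numbers and concepts below are purely illustrative:

```python
# Toy traveling-salesman-style search for the cheapest associative chain
# through a small "knowledge graph" (hypothetical costs: lower = more
# strongly associated concepts).
from itertools import permutations

cost = {  # symmetric association cost between concept pairs (made up)
    ("lettuce", "green"): 1, ("green", "mantis"): 1,
    ("lettuce", "mantis"): 5, ("lettuce", "insect"): 4,
    ("green", "insect"): 3, ("mantis", "insect"): 1,
}

def pair_cost(a, b):
    return cost.get((a, b), cost.get((b, a)))

def best_chain(nodes):
    """Brute-force shortest open path visiting every node once."""
    best = None
    for perm in permutations(nodes):
        total = sum(pair_cost(a, b) for a, b in zip(perm, perm[1:]))
        if best is None or total < best[0]:
            best = (total, perm)
    return best

total, chain = best_chain(["lettuce", "green", "mantis", "insect"])
print(total, chain)
```

The recovered chain explains the artifact as a sequence of cheap associative hops rather than one implausible direct link.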
arXiv Detail & Related papers (2020-10-14T14:32:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.