Text-to-Image Generation for Abstract Concepts
- URL: http://arxiv.org/abs/2309.14623v2
- Date: Wed, 27 Sep 2023 05:34:17 GMT
- Title: Text-to-Image Generation for Abstract Concepts
- Authors: Jiayi Liao, Xu Chen, Qiang Fu, Lun Du, Xiangnan He, Xiang Wang, Shi
Han, Dongmei Zhang
- Abstract summary: We propose a framework of Text-to-Image generation for Abstract Concepts (TIAC)
The abstract concept is clarified into a clear intent with a detailed definition to avoid ambiguity.
The concept-dependent form is retrieved from an LLM-extracted form pattern set.
- Score: 76.32278151607763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have witnessed the substantial progress of large-scale models
across various domains, such as natural language processing and computer
vision, facilitating the expression of concrete concepts. Unlike concrete
concepts that are usually directly associated with physical objects, expressing
abstract concepts through natural language requires considerable effort, which
results from their intricate semantics and connotations. An alternative
approach is to leverage images to convey rich visual information as a
supplement. Nevertheless, existing Text-to-Image (T2I) models are primarily
trained on concrete physical objects and tend to fail to visualize abstract
concepts. Inspired by the three-layer artwork theory, which identifies intent,
object, and form as the critical factors in artistic creation, we propose a
framework of Text-to-Image generation for Abstract Concepts (TIAC). The
abstract concept is clarified into a clear intent with a detailed definition to
avoid ambiguity. LLMs then transform it into semantically related physical
objects, and the concept-dependent form is retrieved from an LLM-extracted form
pattern set. Information from these three aspects is integrated by an LLM to
generate prompts for T2I models. Evaluation results from human assessments and
our newly designed metric, concept score, demonstrate the effectiveness of our
framework in creating images that sufficiently express abstract concepts.
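As a rough, non-authoritative illustration of the pipeline described in the abstract, the Python sketch below chains the three aspects (intent, objects, form) into a single T2I prompt. The `complete` helper, the prompt wordings, and every function name are hypothetical stand-ins rather than the authors' implementation.

```python
# Minimal sketch of the three-aspect prompt construction described above.
# All names and prompts are hypothetical; `complete` stands in for any LLM call.

def complete(instruction: str) -> str:
    """Placeholder for an LLM chat/completion client; replace with a real API."""
    return f"<LLM response to: {instruction}>"

def clarify_intent(concept: str) -> str:
    # Intent: turn the abstract concept into an unambiguous, defined intent.
    return complete(f"Give a short, unambiguous definition of the concept '{concept}'.")

def to_physical_objects(intent: str) -> str:
    # Object: map the clarified intent to semantically related physical objects.
    return complete(f"List concrete physical objects that visually evoke: {intent}")

def retrieve_form(concept: str, form_patterns: list[str]) -> str:
    # Form: pick a concept-dependent form from an LLM-extracted pattern set
    # (a naive LLM choice stands in for the retrieval step here).
    return complete(f"Choose the most fitting visual form for '{concept}' from: "
                    + ", ".join(form_patterns))

def build_t2i_prompt(concept: str, form_patterns: list[str]) -> str:
    intent = clarify_intent(concept)
    objects = to_physical_objects(intent)
    form = retrieve_form(concept, form_patterns)
    # Integrate intent, objects, and form into one prompt for the T2I model.
    return complete("Write one text-to-image prompt combining:\n"
                    f"Intent: {intent}\nObjects: {objects}\nForm: {form}")

if __name__ == "__main__":
    print(build_t2i_prompt("freedom", ["surreal painting", "minimalist poster"]))
```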
Related papers
- MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models [51.1034358143232]
We introduce component-controllable personalization, a novel task that pushes the boundaries of text-to-image (T2I) models.
To address the challenges of this task, we design MagicTailor, an innovative framework that leverages Dynamic Masked Degradation (DM-Deg) to dynamically perturb undesired visual semantics.
arXiv Detail & Related papers (2024-10-17T09:22:53Z)
- From Concrete to Abstract: A Multimodal Generative Approach to Abstract Concept Learning [3.645603633040378]
This paper introduces a multimodal generative approach to high-order abstract concept learning.
Our model initially grounds subordinate-level concrete concepts, combines them to form basic-level concepts, and finally abstracts to superordinate-level concepts.
We evaluate the model's language-learning ability through language-to-visual and visual-to-language tests with high-order abstract concepts.
arXiv Detail & Related papers (2024-10-03T10:24:24Z)
- CusConcept: Customized Visual Concept Decomposition with Diffusion Models [13.95568624067449]
We propose a two-stage framework, CusConcept, to extract customized visual concept embedding vectors.
In the first stage, CusConcept employs a vocabulary-guided concept decomposition mechanism.
In the second stage, joint concept refinement is performed to enhance the fidelity and quality of generated images.
arXiv Detail & Related papers (2024-10-01T04:41:44Z)
- What Makes a Maze Look Like a Maze? [92.80800000328277]
We introduce Deep Schema Grounding (DSG), a framework that leverages explicit structured representations of visual abstractions for grounding and reasoning.
At the core of DSG are schemas: dependency-graph descriptions of abstract concepts that decompose them into more primitive-level symbols.
We show that DSG significantly improves the abstract visual reasoning performance of vision-language models.
arXiv Detail & Related papers (2024-09-12T16:41:47Z)
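As a loose sketch of the schema idea in the DSG entry above, the snippet below encodes an abstract concept as a small dependency graph over more primitive symbols; the data structure and the example entries are illustrative assumptions, not structures taken from the paper.

```python
# Hypothetical schema sketch: an abstract concept decomposed into primitive,
# groundable symbols plus dependency edges between them (illustrative only).
from dataclasses import dataclass

@dataclass
class Schema:
    concept: str                         # the abstract concept, e.g. "maze"
    symbols: list[str]                   # primitive-level symbols to ground in the image
    dependencies: list[tuple[str, str]]  # (from, to) edges between symbols

maze_schema = Schema(
    concept="maze",
    symbols=["walls", "paths", "entrance", "exit"],
    dependencies=[("paths", "walls"), ("entrance", "paths"), ("exit", "paths")],
)

# A vision-language model would ground each symbol in the image, and the
# dependency structure would then support reasoning about the abstract concept.
```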
- Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding [9.787025432074978]
This paper introduces the Prompt Optimizer for Abstract Concepts (POAC) to enhance the performance of text-to-image diffusion models.
We propose a Prompt Language Model (PLM), which is curated from a pre-trained language model, and then fine-tuned with a dataset of abstract concept prompts.
Our framework employs a Reinforcement Learning (RL)-based optimization strategy, focusing on the alignment between the generated images by a stable diffusion model and optimized prompts.
arXiv Detail & Related papers (2024-04-17T17:38:56Z)
- CLiC: Concept Learning in Context [54.81654147248919]
This paper builds upon recent advancements in visual concept learning.
It involves acquiring a visual concept from a source image and subsequently applying it to an object in a target image.
To localize the concept learning, we employ soft masks that cover both the concept of interest and the surrounding image area.
arXiv Detail & Related papers (2023-11-28T01:33:18Z)
- Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models [60.80960965051388]
Adjectives and verbs are entangled with nouns (the subject).
Lego disentangles concepts from their associated subjects using a simple yet effective Subject Separation step.
Lego-generated concepts were preferred over 70% of the time when compared to the baseline.
arXiv Detail & Related papers (2023-11-23T07:33:38Z)
- ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models [79.10890337599166]
We introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts and 33K composite text prompts.
We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions.
Our results point to a trade-off between learning the concepts and preserving compositionality, which existing approaches struggle to overcome.
arXiv Detail & Related papers (2023-06-07T18:00:38Z)
- Automatic Modeling of Social Concepts Evoked by Art Images as Multimodal Frames [1.4502611532302037]
Social concepts referring to non-physical objects are powerful tools to describe, index, and query the content of visual data.
We propose a software approach to represent social concepts as multimodal frames by integrating multisensory data.
Our method focuses on the extraction, analysis, and integration of multimodal features from visual art material tagged with the concepts of interest.
arXiv Detail & Related papers (2021-10-14T14:50:22Z)
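As a loose illustration of the multimodal-frame representation mentioned in the last entry, the sketch below integrates textual and visual evidence for one social concept; the helper functions and field names are hypothetical assumptions, not the authors' pipeline.

```python
# Hypothetical sketch of building a multimodal frame for a social concept from
# art images tagged with that concept; extractors and fields are illustrative.

def extract_visual_features(image_path: str) -> dict:
    # Placeholder: a real system would run colour, composition, and emotion extractors.
    return {"image": image_path, "dominant_color": "unknown", "evoked_emotion": "unknown"}

def build_frame(concept: str, definition: str, tagged_images: list[str]) -> dict:
    """Integrate textual and visual evidence into one multimodal frame."""
    return {
        "concept": concept,
        "definition": definition,
        "images": tagged_images,
        "visual_features": [extract_visual_features(p) for p in tagged_images],
    }

frame = build_frame(
    concept="comfort",
    definition="a state of physical ease and freedom from pain or constraint",
    tagged_images=["artwork_001.jpg", "artwork_002.jpg"],
)
```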
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.