Text-to-Image Generation for Abstract Concepts
- URL: http://arxiv.org/abs/2309.14623v2
- Date: Wed, 27 Sep 2023 05:34:17 GMT
- Title: Text-to-Image Generation for Abstract Concepts
- Authors: Jiayi Liao, Xu Chen, Qiang Fu, Lun Du, Xiangnan He, Xiang Wang, Shi
Han, Dongmei Zhang
- Abstract summary: We propose a framework of Text-to-Image generation for Abstract Concepts (TIAC)
The abstract concept is clarified into a clear intent with a detailed definition to avoid ambiguity.
The concept-dependent form is retrieved from an LLM-extracted form pattern set.
- Score: 76.32278151607763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have witnessed the substantial progress of large-scale models
across various domains, such as natural language processing and computer
vision, facilitating the expression of concrete concepts. Unlike concrete
concepts that are usually directly associated with physical objects, expressing
abstract concepts through natural language requires considerable effort, which
results from their intricate semantics and connotations. An alternative
approach is to leverage images to convey rich visual information as a
supplement. Nevertheless, existing Text-to-Image (T2I) models are primarily
trained on concrete physical objects and tend to fail to visualize abstract
concepts. Inspired by the three-layer artwork theory, which identifies intent,
object, and form as the critical factors of artistic creation, we propose a
framework of Text-to-Image generation for Abstract Concepts (TIAC). The
abstract concept is clarified into a clear intent with a detailed definition to
avoid ambiguity. LLMs then transform it into semantically related physical
objects, and the concept-dependent form is retrieved from an LLM-extracted form
pattern set. Information from these three aspects is integrated via an LLM to
generate prompts for T2I models. Evaluation results from human assessments and
our newly designed metric, the concept score, demonstrate the effectiveness of our
framework in creating images that can sufficiently express abstract concepts.
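Read as a pipeline, the TIAC framework maps naturally onto a few chained LLM calls. Below is a minimal Python sketch of that flow under stated assumptions: `llm` and `t2i` are hypothetical stand-ins for real language-model and text-to-image backends, and `FORM_PATTERNS` is a toy placeholder for the paper's LLM-extracted form pattern set.

```python
# Minimal sketch of the TIAC pipeline: clarify intent, ground the concept in
# physical objects, retrieve a form pattern, and compose a T2I prompt.
# `llm` and `t2i` are hypothetical stand-ins, not the authors' implementation.

def llm(prompt: str) -> str:
    """Placeholder for a large language model call (e.g., an API client)."""
    raise NotImplementedError("plug in a real LLM backend here")

def t2i(prompt: str):
    """Placeholder for a text-to-image model call."""
    raise NotImplementedError("plug in a real T2I backend here")

# Toy stand-in for the paper's LLM-extracted form pattern set.
FORM_PATTERNS = ["minimalist line art", "surreal photomontage", "symbolic oil painting"]

def tiac(concept: str) -> str:
    # 1. Intent: clarify the abstract concept with a detailed definition.
    intent = llm(f"Define the abstract concept '{concept}' precisely, in one sentence.")
    # 2. Object: map the clarified intent to semantically related physical objects.
    objects = llm(f"List physical objects that visually evoke: {intent}")
    # 3. Form: retrieve a concept-dependent form from the pattern set.
    form = llm(f"Pick the best-matching style for '{concept}' from: {FORM_PATTERNS}")
    # Integrate intent, objects, and form into a single T2I prompt via the LLM.
    return llm(f"Write a T2I prompt combining intent '{intent}', "
               f"objects '{objects}', and form '{form}'.")

# Usage, once real backends are plugged in:
# image = t2i(tiac("freedom"))
```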
Related papers
- Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding [9.787025432074978]
This paper introduces the Prompt Optimizer for Abstract Concepts (POAC) to enhance the performance of text-to-image diffusion models on abstract concepts.
We propose a Prompt Language Model (PLM), initialized from a pre-trained language model and then fine-tuned on a dataset of abstract-concept prompts.
Our framework employs a Reinforcement Learning (RL)-based optimization strategy that focuses on the alignment between images generated by a Stable Diffusion model and the optimized prompts.
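A hedged sketch of what such an RL loop could look like, using the standard REINFORCE estimator as a stand-in for the paper's unspecified optimizer; `plm_propose`, `render`, and `alignment` are hypothetical placeholders for the PLM, the frozen diffusion model, and an image-text alignment reward such as a CLIP score.

```python
import torch

def plm_propose(prompt: str) -> tuple[str, torch.Tensor]:
    """Return (refined_prompt, log-prob of the sampled refinement). Placeholder."""
    raise NotImplementedError

def render(prompt: str):
    """Frozen text-to-image model (e.g., Stable Diffusion). Placeholder."""
    raise NotImplementedError

def alignment(image, prompt: str) -> float:
    """Image-text alignment reward, e.g., a CLIP similarity. Placeholder."""
    raise NotImplementedError

def reinforce_step(prompt: str, optimizer: torch.optim.Optimizer) -> None:
    refined, log_prob = plm_propose(prompt)        # sample an optimized prompt
    reward = alignment(render(refined), prompt)    # how well the image matches
    loss = -reward * log_prob                      # REINFORCE policy gradient
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```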
arXiv Detail & Related papers (2024-04-17T17:38:56Z)
- Language-Informed Visual Concept Learning [22.911347501969857]
We train a set of concept encoders to encode the information pertinent to a set of language-informed concept axes.
We then anchor the concept embeddings to a set of text embeddings obtained from a pre-trained Visual Question Answering (VQA) model.
At inference time, the model extracts concept embeddings along various axes from new test images, which can be remixed to generate images with novel compositions of visual concepts.
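One way to picture this setup is a small encoder per concept axis, trained with an anchoring loss toward VQA-derived text embeddings and remixed at inference. The sketch below is illustrative only; the axes, dimensions, and encoder architecture are assumptions, not the authors' design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

AXES = ["color", "material"]          # assumed concept axes
DIM = 64                              # assumed embedding size

# One small encoder per language-informed concept axis.
encoders = nn.ModuleDict({ax: nn.Linear(512, DIM) for ax in AXES})

def anchor_loss(img_feat: torch.Tensor, text_anchors: dict) -> torch.Tensor:
    """Pull each axis embedding toward its VQA-derived text embedding."""
    loss = torch.zeros(())
    for ax in AXES:
        loss = loss + F.mse_loss(encoders[ax](img_feat), text_anchors[ax])
    return loss

def remix(feat_a: torch.Tensor, feat_b: torch.Tensor) -> dict:
    """Inference-time remixing: take 'color' from image A, 'material' from B."""
    return {"color": encoders["color"](feat_a),
            "material": encoders["material"](feat_b)}
```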
arXiv Detail & Related papers (2023-12-06T16:24:47Z)
- CLiC: Concept Learning in Context [54.81654147248919]
This paper builds upon recent advancements in visual concept learning.
It involves acquiring a visual concept from a source image and subsequently applying it to an object in a target image.
To localize the concept learning, we employ soft masks that encompass both the concept of interest and the surrounding image area.
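A minimal sketch of what a soft-mask training signal could look like, assuming a simple per-pixel weighting scheme; the `bg_weight` parameter and the squared-error form are illustrative choices, not CLiC's actual loss.

```python
import torch

def soft_masked_loss(pred: torch.Tensor, target: torch.Tensor,
                     mask: torch.Tensor, bg_weight: float = 0.1) -> torch.Tensor:
    """Per-pixel loss weighted by a soft concept mask.

    mask: values near 1 inside the concept, near 0 outside; bg_weight keeps a
    small gradient from the surrounding context instead of hard-cropping it.
    """
    weights = mask + bg_weight * (1.0 - mask)
    return (weights * (pred - target) ** 2).mean()

# e.g., pred/target: denoiser output vs. noise target; mask: soft concept mask
loss = soft_masked_loss(torch.randn(1, 4, 64, 64), torch.randn(1, 4, 64, 64),
                        torch.rand(1, 1, 64, 64))
```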
arXiv Detail & Related papers (2023-11-28T01:33:18Z)
- Create Your World: Lifelong Text-to-Image Diffusion [75.14353789007902]
We propose the Lifelong text-to-image Diffusion Model (L2DM) to overcome catastrophic forgetting of previously encountered concepts.
To address this forgetting, our L2DM framework devises a task-aware memory enhancement module and an elastic-concept distillation module.
Our model generates more faithful images across a range of continual text prompts, in terms of both qualitative and quantitative metrics.
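As a rough illustration of the distillation side of this recipe, the sketch below regularizes the current denoiser toward a frozen copy of its previous self on replayed past-concept inputs; this is a generic knowledge-distillation term, not L2DM's exact module.

```python
import torch
import torch.nn.functional as F

def concept_distill_loss(student, teacher, replay_latents: torch.Tensor,
                         replay_cond: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Keep the fine-tuned denoiser close to the frozen past model on replayed
    past-concept latents, so new learning does not overwrite old concepts."""
    with torch.no_grad():
        target = teacher(replay_latents, t, replay_cond)   # frozen past model
    return F.mse_loss(student(replay_latents, t, replay_cond), target)

# total_loss = new_task_loss + lam * concept_distill_loss(...)
```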
arXiv Detail & Related papers (2023-09-08T16:45:56Z)
- ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models [79.10890337599166]
We introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts and 33K composite text prompts.
We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions.
Our results point to a trade-off between learning the concepts and preserving compositionality, which existing approaches struggle to overcome.
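A benchmark of this shape suggests an evaluation loop like the following sketch, where the dataset fields (`prompt`, `concept`) and the two scoring callbacks are assumptions for illustration rather than ConceptBed's actual API.

```python
def evaluate(model, benchmark, concept_score, composition_score):
    """Score concept fidelity and compositional correctness separately,
    exposing the trade-off between the two."""
    fidelity, compositionality = [], []
    for example in benchmark:                      # e.g., composite text prompts
        image = model(example["prompt"])
        fidelity.append(concept_score(image, example["concept"]))
        compositionality.append(composition_score(image, example["prompt"]))
    n = len(benchmark)
    return sum(fidelity) / n, sum(compositionality) / n
```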
arXiv Detail & Related papers (2023-06-07T18:00:38Z)
- The Hidden Language of Diffusion Models [70.03691458189604]
We present Conceptor, a novel method to interpret the internal representation of a textual concept by a diffusion model.
We find surprising visual connections between concepts that transcend their textual semantics.
We additionally discover concepts that rely on mixtures of exemplars, biases, renowned artistic styles, or a simultaneous fusion of multiple meanings.
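The interpretation idea can be pictured as approximating a concept's embedding by a sparse, non-negative mix of vocabulary embeddings, so the top-weighted words indicate what the model associates with the concept. The sketch below is a simplified assumption, not the paper's exact optimization.

```python
import torch

def decompose(concept_emb: torch.Tensor, vocab_embs: torch.Tensor,
              steps: int = 200, l1: float = 1e-3) -> torch.Tensor:
    """Fit non-negative, L1-sparse weights so that a mix of vocabulary
    embeddings reconstructs the concept embedding."""
    w = torch.zeros(vocab_embs.shape[0], requires_grad=True)
    opt = torch.optim.Adam([w], lr=0.05)
    for _ in range(steps):
        recon = torch.relu(w) @ vocab_embs          # non-negative combination
        loss = ((recon - concept_emb) ** 2).sum() + l1 * torch.relu(w).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.relu(w.detach())                   # weight per vocabulary word

# Toy usage with random embeddings:
weights = decompose(torch.randn(8), torch.randn(100, 8))
print(weights.topk(5).indices)                      # indices of top "concept words"
```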
arXiv Detail & Related papers (2023-06-01T17:57:08Z)
- Automatic Modeling of Social Concepts Evoked by Art Images as Multimodal Frames [1.4502611532302037]
Social concepts referring to non-physical objects are powerful tools to describe, index, and query the content of visual data.
We propose a software approach to represent social concepts as multimodal frames by integrating multisensory data.
Our method focuses on the extraction, analysis, and integration of multimodal features from visual art material tagged with the concepts of interest.
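As a loose illustration of what such a frame could hold, the sketch below aggregates per-image features into a structure keyed by concept; the field names and the pluggable extractors are assumptions, not the authors' schema.

```python
from dataclasses import dataclass, field

@dataclass
class MultimodalFrame:
    """Assumed container for a social concept's multimodal evidence."""
    concept: str
    visual_features: list = field(default_factory=list)   # e.g., image embeddings
    color_palettes: list = field(default_factory=list)    # dominant colors
    detected_objects: list = field(default_factory=list)  # object tags

def build_frame(concept, tagged_images, extract_visual, extract_colors, detect):
    """Integrate features extracted from images tagged with the concept."""
    frame = MultimodalFrame(concept)
    for img in tagged_images:
        frame.visual_features.append(extract_visual(img))
        frame.color_palettes.append(extract_colors(img))
        frame.detected_objects.extend(detect(img))
    return frame
```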
arXiv Detail & Related papers (2021-10-14T14:50:22Z)
- Toward a Visual Concept Vocabulary for GAN Latent Space [74.12447538049537]
This paper introduces a new method for building open-ended vocabularies of primitive visual concepts represented in a GAN's latent space.
Our approach is built from three components: automatic identification of perceptually salient directions based on their layer selectivity; human annotation of these directions with free-form, compositional natural language descriptions; and decomposition of these annotations into a visual concept vocabulary.
Experiments show that concepts learned with our approach are reliable and composable -- generalizing across classes, contexts, and observers.
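If primitive concepts correspond to latent directions, composition reduces to vector arithmetic. The sketch below illustrates that under the assumption of additive directions; the vocabulary entries and edit strengths are toy placeholders.

```python
import torch

def apply_concepts(z: torch.Tensor, vocabulary: dict, edits: dict) -> torch.Tensor:
    """Shift latent z along named concept directions with given strengths."""
    for name, strength in edits.items():
        z = z + strength * vocabulary[name]
    return z

# Toy vocabulary of concept directions in a 512-d latent space (assumed).
vocab = {"fluffy": torch.randn(512), "bright": torch.randn(512)}
z_edit = apply_concepts(torch.randn(512), vocab, {"fluffy": 1.5, "bright": 0.5})
# generator(z_edit) would render the composed edit
```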
arXiv Detail & Related papers (2021-10-08T17:58:19Z)