Text-to-Image Generation for Abstract Concepts
- URL: http://arxiv.org/abs/2309.14623v2
- Date: Wed, 27 Sep 2023 05:34:17 GMT
- Title: Text-to-Image Generation for Abstract Concepts
- Authors: Jiayi Liao, Xu Chen, Qiang Fu, Lun Du, Xiangnan He, Xiang Wang, Shi
Han, Dongmei Zhang
- Abstract summary: We propose a framework of Text-to-Image generation for Abstract Concepts (TIAC)
The abstract concept is clarified into a clear intent with a detailed definition to avoid ambiguity.
The concept-dependent form is retrieved from an LLM-extracted form pattern set.
- Score: 76.32278151607763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have witnessed the substantial progress of large-scale models
across various domains, such as natural language processing and computer
vision, facilitating the expression of concrete concepts. Unlike concrete
concepts that are usually directly associated with physical objects, expressing
abstract concepts through natural language requires considerable effort, which
results from their intricate semantics and connotations. An alternative
approach is to leverage images to convey rich visual information as a
supplement. Nevertheless, existing Text-to-Image (T2I) models are primarily
trained on concrete physical objects and tend to fail to visualize abstract
concepts. Inspired by the three-layer artwork theory, which identifies intent,
object, and form as the critical factors of artistic creation, we propose a
framework of Text-to-Image generation for Abstract Concepts (TIAC). The
abstract concept is clarified into a clear intent with a detailed definition to
avoid ambiguity. LLMs then transform it into semantically related physical
objects, and the concept-dependent form is retrieved from an LLM-extracted form
pattern set. Information from these three aspects is integrated via an LLM to
generate prompts for T2I models. Evaluation results from human assessments and
our newly designed metric, the concept score, demonstrate the effectiveness of our
framework in creating images that can sufficiently express abstract concepts.
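Read as a pipeline, the TIAC framework maps naturally onto a few chained LLM calls. Below is a minimal Python sketch of that flow under stated assumptions: `llm` and `t2i` are hypothetical stand-ins for real language-model and text-to-image backends, and `FORM_PATTERNS` is a toy placeholder for the paper's LLM-extracted form pattern set.

```python
# Minimal sketch of the TIAC pipeline: clarify intent, ground the concept in
# physical objects, retrieve a form pattern, and compose a T2I prompt.
# `llm` and `t2i` are hypothetical stand-ins, not the authors' implementation.

def llm(prompt: str) -> str:
    """Placeholder for a large language model call (e.g., an API client)."""
    raise NotImplementedError("plug in a real LLM backend here")

def t2i(prompt: str):
    """Placeholder for a text-to-image model call."""
    raise NotImplementedError("plug in a real T2I backend here")

# Toy stand-in for the paper's LLM-extracted form pattern set.
FORM_PATTERNS = ["minimalist line art", "surreal photomontage", "symbolic oil painting"]

def tiac(concept: str) -> str:
    # 1. Intent: clarify the abstract concept with a detailed definition.
    intent = llm(f"Define the abstract concept '{concept}' precisely, in one sentence.")
    # 2. Object: map the clarified intent to semantically related physical objects.
    objects = llm(f"List physical objects that visually evoke: {intent}")
    # 3. Form: retrieve a concept-dependent form from the pattern set.
    form = llm(f"Pick the best-matching style for '{concept}' from: {FORM_PATTERNS}")
    # Integrate intent, objects, and form into a single T2I prompt via the LLM.
    return llm(f"Write a T2I prompt combining intent '{intent}', "
               f"objects '{objects}', and form '{form}'.")

# Usage, once real backends are plugged in:
# image = t2i(tiac("freedom"))
```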
Related papers
- Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding [9.787025432074978]
This paper introduces the Prompt Optimizer for Abstract Concepts (POAC) to enhance the performance of text-to-image diffusion models on abstract concepts.
We propose a Prompt Language Model (PLM), initialized from a pre-trained language model and then fine-tuned on a dataset of abstract-concept prompts.
Our framework employs a Reinforcement Learning (RL)-based optimization strategy that focuses on the alignment between images generated by a Stable Diffusion model and the optimized prompts.
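A hedged sketch of what such an RL loop could look like, using the standard REINFORCE estimator as a stand-in for the paper's unspecified optimizer; `plm_propose`, `render`, and `alignment` are hypothetical placeholders for the PLM, the frozen diffusion model, and an image-text alignment reward such as a CLIP score.

```python
import torch

def plm_propose(prompt: str) -> tuple[str, torch.Tensor]:
    """Return (refined_prompt, log-prob of the sampled refinement). Placeholder."""
    raise NotImplementedError

def render(prompt: str):
    """Frozen text-to-image model (e.g., Stable Diffusion). Placeholder."""
    raise NotImplementedError

def alignment(image, prompt: str) -> float:
    """Image-text alignment reward, e.g., a CLIP similarity. Placeholder."""
    raise NotImplementedError

def reinforce_step(prompt: str, optimizer: torch.optim.Optimizer) -> None:
    refined, log_prob = plm_propose(prompt)        # sample an optimized prompt
    reward = alignment(render(refined), prompt)    # how well the image matches
    loss = -reward * log_prob                      # REINFORCE policy gradient
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```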
arXiv Detail & Related papers (2024-04-17T17:38:56Z)
- Language-Informed Visual Concept Learning [22.911347501969857]
We train a set of concept encoders to encode the information pertinent to a set of language-informed concept axes.
We then anchor the concept embeddings to a set of text embeddings obtained from a pre-trained Visual Question Answering (VQA) model.
At inference time, the model extracts concept embeddings along various axes from new test images, which can be remixed to generate images with novel compositions of visual concepts.
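One way to picture this setup is a small encoder per concept axis, trained with an anchoring loss toward VQA-derived text embeddings and remixed at inference. The sketch below is illustrative only; the axes, dimensions, and encoder architecture are assumptions, not the authors' design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

AXES = ["color", "material"]          # assumed concept axes
DIM = 64                              # assumed embedding size

# One small encoder per language-informed concept axis.
encoders = nn.ModuleDict({ax: nn.Linear(512, DIM) for ax in AXES})

def anchor_loss(img_feat: torch.Tensor, text_anchors: dict) -> torch.Tensor:
    """Pull each axis embedding toward its VQA-derived text embedding."""
    loss = torch.zeros(())
    for ax in AXES:
        loss = loss + F.mse_loss(encoders[ax](img_feat), text_anchors[ax])
    return loss

def remix(feat_a: torch.Tensor, feat_b: torch.Tensor) -> dict:
    """Inference-time remixing: take 'color' from image A, 'material' from B."""
    return {"color": encoders["color"](feat_a),
            "material": encoders["material"](feat_b)}
```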
arXiv Detail & Related papers (2023-12-06T16:24:47Z)
- CLiC: Concept Learning in Context [54.81654147248919]
This paper builds upon recent advancements in visual concept learning.
It involves acquiring a visual concept from a source image and subsequently applying it to an object in a target image.
To localize the concept learning, we employ soft masks that encompass both the concept of interest and the surrounding image area.
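A minimal sketch of what a soft-mask training signal could look like, assuming a simple per-pixel weighting scheme; the `bg_weight` parameter and the squared-error form are illustrative choices, not CLiC's actual loss.

```python
import torch

def soft_masked_loss(pred: torch.Tensor, target: torch.Tensor,
                     mask: torch.Tensor, bg_weight: float = 0.1) -> torch.Tensor:
    """Per-pixel loss weighted by a soft concept mask.

    mask: values near 1 inside the concept, near 0 outside; bg_weight keeps a
    small gradient from the surrounding context instead of hard-cropping it.
    """
    weights = mask + bg_weight * (1.0 - mask)
    return (weights * (pred - target) ** 2).mean()

# e.g., pred/target: denoiser output vs. noise target; mask: soft concept mask
loss = soft_masked_loss(torch.randn(1, 4, 64, 64), torch.randn(1, 4, 64, 64),
                        torch.rand(1, 1, 64, 64))
```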
arXiv Detail & Related papers (2023-11-28T01:33:18Z)
- Create Your World: Lifelong Text-to-Image Diffusion [75.14353789007902]
We propose the Lifelong text-to-image Diffusion Model (L2DM) to overcome catastrophic forgetting of previously encountered concepts.
To address this forgetting, our L2DM framework devises a task-aware memory enhancement module and an elastic-concept distillation module.
Our model generates more faithful images across a range of continual text prompts, in terms of both qualitative and quantitative metrics.
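As a rough illustration of the distillation side of this recipe, the sketch below regularizes the current denoiser toward a frozen copy of its previous self on replayed past-concept inputs; this is a generic knowledge-distillation term, not L2DM's exact module.

```python
import torch
import torch.nn.functional as F

def concept_distill_loss(student, teacher, replay_latents: torch.Tensor,
                         replay_cond: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Keep the fine-tuned denoiser close to the frozen past model on replayed
    past-concept latents, so new learning does not overwrite old concepts."""
    with torch.no_grad():
        target = teacher(replay_latents, t, replay_cond)   # frozen past model
    return F.mse_loss(student(replay_latents, t, replay_cond), target)

# total_loss = new_task_loss + lam * concept_distill_loss(...)
```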
arXiv Detail & Related papers (2023-09-08T16:45:56Z)
- ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models [79.10890337599166]
We introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts and 33K composite text prompts.
We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions.
Our results point to a trade-off between learning the concepts and preserving compositionality, which existing approaches struggle to overcome.
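A benchmark of this shape suggests an evaluation loop like the following sketch, where the dataset fields (`prompt`, `concept`) and the two scoring callbacks are assumptions for illustration rather than ConceptBed's actual API.

```python
def evaluate(model, benchmark, concept_score, composition_score):
    """Score concept fidelity and compositional correctness separately,
    exposing the trade-off between the two."""
    fidelity, compositionality = [], []
    for example in benchmark:                      # e.g., composite text prompts
        image = model(example["prompt"])
        fidelity.append(concept_score(image, example["concept"]))
        compositionality.append(composition_score(image, example["prompt"]))
    n = len(benchmark)
    return sum(fidelity) / n, sum(compositionality) / n
```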
arXiv Detail & Related papers (2023-06-07T18:00:38Z)
- The Hidden Language of Diffusion Models [70.03691458189604]
We present Conceptor, a novel method to interpret the internal representation of a textual concept by a diffusion model.
We find surprising visual connections between concepts that transcend their textual semantics.
We additionally discover concepts that rely on mixtures of exemplars, biases, renowned artistic styles, or a simultaneous fusion of multiple meanings.
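The interpretation idea can be pictured as approximating a concept's embedding by a sparse, non-negative mix of vocabulary embeddings, so the top-weighted words indicate what the model associates with the concept. The sketch below is a simplified assumption, not the paper's exact optimization.

```python
import torch

def decompose(concept_emb: torch.Tensor, vocab_embs: torch.Tensor,
              steps: int = 200, l1: float = 1e-3) -> torch.Tensor:
    """Fit non-negative, L1-sparse weights so that a mix of vocabulary
    embeddings reconstructs the concept embedding."""
    w = torch.zeros(vocab_embs.shape[0], requires_grad=True)
    opt = torch.optim.Adam([w], lr=0.05)
    for _ in range(steps):
        recon = torch.relu(w) @ vocab_embs          # non-negative combination
        loss = ((recon - concept_emb) ** 2).sum() + l1 * torch.relu(w).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.relu(w.detach())                   # weight per vocabulary word

# Toy usage with random embeddings:
weights = decompose(torch.randn(8), torch.randn(100, 8))
print(weights.topk(5).indices)                      # indices of top "concept words"
```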
arXiv Detail & Related papers (2023-06-01T17:57:08Z)
- Automatic Modeling of Social Concepts Evoked by Art Images as Multimodal Frames [1.4502611532302037]
Social concepts referring to non-physical objects are powerful tools to describe, index, and query the content of visual data.
We propose a software approach to represent social concepts as multimodal frames by integrating multisensory data.
Our method focuses on the extraction, analysis, and integration of multimodal features from visual art material tagged with the concepts of interest.
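As a loose illustration of what such a frame could hold, the sketch below aggregates per-image features into a structure keyed by concept; the field names and the pluggable extractors are assumptions, not the authors' schema.

```python
from dataclasses import dataclass, field

@dataclass
class MultimodalFrame:
    """Assumed container for a social concept's multimodal evidence."""
    concept: str
    visual_features: list = field(default_factory=list)   # e.g., image embeddings
    color_palettes: list = field(default_factory=list)    # dominant colors
    detected_objects: list = field(default_factory=list)  # object tags

def build_frame(concept, tagged_images, extract_visual, extract_colors, detect):
    """Integrate features extracted from images tagged with the concept."""
    frame = MultimodalFrame(concept)
    for img in tagged_images:
        frame.visual_features.append(extract_visual(img))
        frame.color_palettes.append(extract_colors(img))
        frame.detected_objects.extend(detect(img))
    return frame
```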
arXiv Detail & Related papers (2021-10-14T14:50:22Z)
- Toward a Visual Concept Vocabulary for GAN Latent Space [74.12447538049537]
This paper introduces a new method for building open-ended vocabularies of primitive visual concepts represented in a GAN's latent space.
Our approach is built from three components: automatic identification of perceptually salient directions based on their layer selectivity; human annotation of these directions with free-form, compositional natural language descriptions; and decomposition of these annotations into a visual concept vocabulary.
Experiments show that concepts learned with our approach are reliable and composable -- generalizing across classes, contexts, and observers.
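If primitive concepts correspond to latent directions, composition reduces to vector arithmetic. The sketch below illustrates that under the assumption of additive directions; the vocabulary entries and edit strengths are toy placeholders.

```python
import torch

def apply_concepts(z: torch.Tensor, vocabulary: dict, edits: dict) -> torch.Tensor:
    """Shift latent z along named concept directions with given strengths."""
    for name, strength in edits.items():
        z = z + strength * vocabulary[name]
    return z

# Toy vocabulary of concept directions in a 512-d latent space (assumed).
vocab = {"fluffy": torch.randn(512), "bright": torch.randn(512)}
z_edit = apply_concepts(torch.randn(512), vocab, {"fluffy": 1.5, "bright": 0.5})
# generator(z_edit) would render the composed edit
```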
arXiv Detail & Related papers (2021-10-08T17:58:19Z)