Semantic Draw Engineering for Text-to-Image Creation
- URL: http://arxiv.org/abs/2401.04116v1
- Date: Sat, 23 Dec 2023 05:35:15 GMT
- Title: Semantic Draw Engineering for Text-to-Image Creation
- Authors: Yang Li and Huaqiang Jiang and Yangkai Wu
- Abstract summary: We propose a method that utilizes artificial intelligence models for thematic creativity.
The method involves converting all visual elements into quantifiable data structures before creating images.
We evaluate the effectiveness of this approach in terms of semantic accuracy, image reproducibility, and computational efficiency.
- Score: 2.615648035076649
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Text-to-image generation is typically carried out with Generative Adversarial
Networks (GANs) or transformer models. The current challenge, however, lies in
accurately generating images from textual descriptions, especially in scenarios where
the content and theme of the target image are ambiguous. In this paper, we
propose a method that utilizes artificial intelligence models for thematic
creativity, followed by classification modeling of the actual painting
process. The method converts all visual elements into quantifiable
data structures before creating images. We evaluate the effectiveness of this
approach in terms of semantic accuracy, image reproducibility, and
computational efficiency, in comparison with existing image generation
algorithms.
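The abstract's central step, converting every planned visual element into a quantifiable data structure before any image is rendered, can be pictured with a minimal sketch. The `VisualElement` dataclass and `build_scene` helper below are illustrative assumptions, not the authors' implementation; in the described method a thematic-creativity model would produce the values that are hard-coded here.

```python
from dataclasses import dataclass, field

@dataclass
class VisualElement:
    """One quantifiable visual element of the planned image (assumed schema)."""
    label: str                     # semantic category, e.g. "mountain"
    position: tuple[float, float]  # normalized (x, y) center in [0, 1]
    size: float                    # relative share of the canvas area
    color: tuple[int, int, int]    # dominant RGB color
    theme_tags: list[str] = field(default_factory=list)

def build_scene(theme: str) -> list[VisualElement]:
    """Map an ambiguous theme to a machine-readable scene plan.

    In a real pipeline this step would be produced by a thematic-creativity
    model; the hard-coded example below stands in for that output.
    """
    if theme == "quiet winter morning":
        return [
            VisualElement("sky", (0.5, 0.20), 0.40, (200, 215, 230), ["calm"]),
            VisualElement("mountain", (0.5, 0.55), 0.35, (120, 130, 150), ["winter"]),
            VisualElement("cabin", (0.3, 0.75), 0.05, (90, 60, 40), ["cozy"]),
        ]
    return []

# The structured scene plan can then be handed to a downstream generator
# (GAN, transformer, or diffusion model) instead of a free-form prompt.
scene = build_scene("quiet winter morning")
print([element.label for element in scene])
```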
Related papers
- Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models [20.19571676239579]
We introduce a novel diffusion-based framework to enhance the alignment of generated images with their corresponding descriptions.
Our framework is built upon a comprehensive analysis of inconsistency phenomena, categorizing them based on their manifestation in the image.
We then integrate a state-of-the-art controllable image generation model with a visual text generation module to generate an image that is consistent with the original prompt.
arXiv Detail & Related papers (2024-06-24T06:12:16Z)
- Text Guided Image Editing with Automatic Concept Locating and Forgetting [27.70615803908037]
We propose a novel method called Locate and Forget (LaF) to locate potential target concepts in the image for modification.
Compared to the baselines, our method demonstrates its superiority in text-guided image editing tasks both qualitatively and quantitatively.
arXiv Detail & Related papers (2024-05-30T05:36:32Z)
- ITI-GEN: Inclusive Text-to-Image Generation [56.72212367905351]
This study investigates inclusive text-to-image generative models that generate images based on human-written prompts.
We show that, for some attributes, images can represent concepts more expressively than text.
We propose a novel approach, ITI-GEN, that leverages readily available reference images for Inclusive Text-to-Image GENeration.
arXiv Detail & Related papers (2023-09-11T15:54:30Z)
- Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models [80.75258849913574]
In this paper, we consider the inverse problem -- given a collection of different images, can we discover the generative concepts that represent each image?
We present an unsupervised approach to discover generative concepts from a collection of images, disentangling different art styles in paintings, objects, and lighting from kitchen scenes, and discovering image classes given ImageNet images.
arXiv Detail & Related papers (2023-06-08T17:02:15Z)
- Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z)
- Object-Centric Relational Representations for Image Generation [18.069747511100132]
This paper explores a novel method to condition image generation, based on object-centric relational representations.
We show that such architectural biases entail properties that facilitate the manipulation and conditioning of the generative process.
We also propose a novel benchmark for image generation consisting of a synthetic dataset of images paired with their relational representation (a generic sketch of such a representation appears after this list).
arXiv Detail & Related papers (2023-03-26T11:17:17Z)
- Localizing Object-level Shape Variations with Text-to-Image Diffusion Models [60.422435066544814]
We present a technique to generate a collection of images that depicts variations in the shape of a specific object.
A particular challenge when generating object variations is accurately localizing the manipulation applied over the object's shape.
To localize the image-space operation, we present two techniques that use the self-attention layers in conjunction with the cross-attention layers.
arXiv Detail & Related papers (2023-03-20T17:45:08Z)
- Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation [10.39028769374367]
We present a new framework that takes text-to-image synthesis to the realm of image-to-image translation.
Our method harnesses the power of a pre-trained text-to-image diffusion model to generate a new image that complies with the target text.
arXiv Detail & Related papers (2022-11-22T20:39:18Z)
- Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts.
These models have been trained using text data collected from content-based labelling protocols.
We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
arXiv Detail & Related papers (2022-10-19T14:20:05Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)
- Words as Art Materials: Generating Paintings with Sequential GANs [8.249180979158815]
We investigate the generation of artistic images on a large variance dataset.
This dataset includes images with variations, for example, in shape, color, and content.
We propose a sequential Generative Adversarial Network model.
arXiv Detail & Related papers (2020-07-08T19:17:14Z)
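As a companion to the Object-Centric Relational Representations entry above, the sketch below shows one generic way such a relational scene description could be encoded: objects as attributed nodes and pairwise relations as directed edges. The `SceneObject` and `Relation` names are illustrative assumptions, not that paper's API.

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    """A node of the relational scene description (assumed encoding)."""
    obj_id: int
    category: str               # e.g. "cube", "sphere"
    attributes: dict[str, str]  # e.g. {"color": "red", "material": "metal"}

@dataclass
class Relation:
    """A directed edge relating two objects by id."""
    subject_id: int   # obj_id of the subject
    predicate: str    # e.g. "left_of", "on_top_of"
    object_id: int    # obj_id of the object

# A tiny relational representation that could condition an image generator.
objects = [
    SceneObject(0, "cube", {"color": "red"}),
    SceneObject(1, "sphere", {"color": "blue"}),
]
relations = [Relation(0, "left_of", 1)]
```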
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.