Transferring Visual Attributes from Natural Language to Verified Image Generation
- URL: http://arxiv.org/abs/2305.15026v2
- Date: Mon, 29 May 2023 09:34:31 GMT
- Title: Transferring Visual Attributes from Natural Language to Verified Image Generation
- Authors: Rodrigo Valerio, Joao Bordalo, Michal Yarom, Yonatan Bitton, Idan Szpektor, Joao Magalhaes
- Abstract summary: We propose a Natural Language to Verified Image generation approach (NL2VI) that converts a natural prompt into a visual prompt.
A T2I model then generates an image for the visual prompt, and the image is verified with VQA algorithms.
Experiments show that aligning natural prompts with image generation can improve the consistency of the generated images by up to 11% over the state of the art.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image (T2I) generation methods are widely popular for generating art and other creative artifacts. While visual hallucinations can be a positive factor in scenarios where creativity is appreciated, such artifacts are poorly suited for cases where the generated image needs to be grounded in complex natural language without explicit visual elements. In this paper, we propose to strengthen the consistency of T2I methods in the presence of complex natural language, which often exceeds the limits of T2I methods by including non-visual information and textual elements that require knowledge for accurate generation. To address these phenomena, we propose a Natural Language to Verified Image generation approach (NL2VI) that converts a natural prompt into a visual prompt, which is more suitable for image generation. A T2I model then generates an image for the visual prompt, and the image is verified with VQA algorithms. Experimentally, aligning natural prompts with image generation can improve the consistency of the generated images by up to 11% over the state of the art. Moreover, the improvements generalize to challenging domains like cooking and DIY tasks, where the correctness of the generated image is crucial for illustrating actions.
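The NL2VI control flow (rewrite, generate, verify, resample) is simple to state in code. The following Python sketch illustrates it; every helper here is a hypothetical placeholder rather than the paper's published API, since the abstract does not specify implementation details.

```python
# A minimal sketch of the NL2VI flow described in the abstract. All helper
# names are hypothetical placeholders: in a real system they would wrap an
# LLM prompt rewriter, a T2I model, and a VQA model.

def rewrite_to_visual_prompt(natural_prompt: str) -> str:
    # Placeholder: an instruction-tuned LLM would strip non-visual
    # information and add concrete visual attributes.
    return f"a photo depicting: {natural_prompt}"

def generate_verification_questions(natural_prompt: str) -> list[str]:
    # Placeholder: derive yes/no questions (here, trivially) from the
    # natural prompt.
    return [f"Does the image show {natural_prompt}?"]

def t2i_generate(visual_prompt: str) -> object:
    # Placeholder for a text-to-image model call (e.g. a diffusion model).
    return {"prompt": visual_prompt}

def vqa_answer(image: object, question: str) -> str:
    # Placeholder for a VQA model, assumed to answer "yes"/"no".
    return "yes"

def nl2vi(natural_prompt: str, max_attempts: int = 3) -> object | None:
    """Generate an image and keep only a candidate that passes VQA checks."""
    visual_prompt = rewrite_to_visual_prompt(natural_prompt)
    questions = generate_verification_questions(natural_prompt)
    for _ in range(max_attempts):
        image = t2i_generate(visual_prompt)
        if all(vqa_answer(image, q) == "yes" for q in questions):
            return image  # verified image
    return None  # no candidate passed verification

print(nl2vi("how to dice an onion safely"))
```

In this sketch the verification questions are derived from the original natural prompt, so a passing image is consistent with the user's request rather than merely with the rewritten visual prompt.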
Related papers
- Bringing Characters to New Stories: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting
We present T-Prompter, a training-free method for theme-specific image generation.
T-Prompter integrates reference images into generative models, allowing users to seamlessly specify the target theme.
Our approach enables consistent story generation, character design, realistic character generation, and style-guided image generation.
arXiv Detail & Related papers (2025-01-26T19:01:19Z) - SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild [55.619708995575785]
The text in natural scene images needs to meet four key criteria.
The generated text can facilitate the training of natural scene OCR (Optical Character Recognition) tasks.
The generated images have superior utility in OCR tasks like text detection and text recognition.
arXiv Detail & Related papers (2025-01-06T12:09:08Z) - Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models [54.052963634384945]
We introduce the Image Regeneration task to assess text-to-image models.
We use GPT4V to bridge the gap between the reference image and the text input for the T2I model.
We also present the ImageRepainter framework to enhance the quality of generated images.
arXiv Detail & Related papers (2024-11-14T13:52:43Z) - Text Image Generation for Low-Resource Languages with Dual Translation Learning [0.0]
- Text Image Generation for Low-Resource Languages with Dual Translation Learning
This study proposes a novel approach that generates text images in low-resource languages by emulating the style of real text images from high-resource languages.
The training of this model involves dual translation tasks, where it transforms plain text images into either synthetic or real text images.
To enhance the accuracy and variety of generated text images, we introduce two guidance techniques.
arXiv Detail & Related papers (2024-09-26T11:23:59Z) - Visual Text Generation in the Wild [67.37458807253064]
We propose a visual text generator (termed SceneVTG) which can produce high-quality text images in the wild.
The proposed SceneVTG significantly outperforms traditional rendering-based methods and recent diffusion-based methods in terms of fidelity and reasonability.
The generated images provide superior utility for tasks involving text detection and text recognition.
arXiv Detail & Related papers (2024-07-19T09:08:20Z) - Mini-DALLE3: Interactive Text to Image by Prompting Large Language
Models [71.49054220807983]
A prevalent limitation persists in the effective communication with T2I models, such as Stable Diffusion, using natural language descriptions.
Inspired by the recently released DALLE3, we revisit existing T2I systems' efforts to align with human intent and introduce a new task: interactive text to image (iT2I).
We present a simple approach that augments LLMs for iT2I with prompting techniques and off-the-shelf T2I models.
arXiv Detail & Related papers (2023-10-11T16:53:40Z) - GlyphDiffusion: Text Generation as Image Generation [100.98428068214736]
- GlyphDiffusion: Text Generation as Image Generation
We propose GlyphDiffusion, a novel diffusion approach for text generation via text-guided image generation.
Our key idea is to render the target text as a glyph image containing visual language content.
Our model also makes significant improvements compared to recent diffusion models.
arXiv Detail & Related papers (2023-04-25T02:14:44Z) - Plug-and-Play Diffusion Features for Text-Driven Image-to-Image
Translation [10.39028769374367]
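The key move in GlyphDiffusion above is to rasterize the target text into a glyph image containing "visual language". That rendering step alone can be sketched with Pillow; the canvas size, text position, and default font here are arbitrary choices rather than GlyphDiffusion settings, and the diffusion stage that refines the glyph image is omitted.

```python
# Sketch of the glyph-rendering step from the GlyphDiffusion entry above.

from PIL import Image, ImageDraw

def render_glyph_image(text: str, size: tuple = (256, 64)) -> Image.Image:
    """Rasterize the target text onto a blank canvas as 'visual language'."""
    canvas = Image.new("RGB", size, "white")
    ImageDraw.Draw(canvas).text((8, 24), text, fill="black")
    return canvas

# The rendered glyph image would condition the diffusion model's generation.
glyph = render_glyph_image("text rendered as an image")
```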
- Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
We present a new framework that takes text-to-image synthesis to the realm of image-to-image translation.
Our method harnesses the power of a pre-trained text-to-image diffusion model to generate a new image that complies with the target text.
arXiv Detail & Related papers (2022-11-22T20:39:18Z) - Re-Imagen: Retrieval-Augmented Text-to-Image Generator [58.60472701831404]
Retrieval-Augmented Text-to-Image Generator (Re-Imagen)
arXiv Detail & Related papers (2022-09-29T00:57:28Z) - Text-to-Face Generation with StyleGAN2 [0.0]
We propose a novel framework to generate facial images that are well-aligned with the input descriptions.
Our framework utilizes the high-resolution face generator, StyleGAN2, and explores the possibility of using it in T2F.
The generated images exhibit a 57% similarity to the ground-truth images, with a face semantic distance of 0.92, outperforming the state of the art.
arXiv Detail & Related papers (2022-05-25T06:02:01Z) - Text to Image Generation with Semantic-Spatial Aware GAN [41.73685713621705]
A text-to-image generation (T2I) model aims to generate photo-realistic images that are semantically consistent with the text descriptions.
We propose a novel framework Semantic-Spatial Aware GAN, which is trained in an end-to-end fashion so that the text encoder can exploit better text information.
arXiv Detail & Related papers (2021-04-01T15:48:01Z) - Text as Neural Operator: Image Manipulation by Text Instruction [68.53181621741632]
In this paper, we study a setting that allows users to edit an image with multiple objects using complex text instructions to add, remove, or change the objects.
The inputs of the task are multimodal, including (1) a reference image and (2) an instruction in natural language that describes desired modifications to the image.
We show that the proposed model performs favorably against recent strong baselines on three public datasets.
arXiv Detail & Related papers (2020-08-11T07:07:10Z)