CIGLI: Conditional Image Generation from Language & Image
- URL: http://arxiv.org/abs/2108.08955v1
- Date: Fri, 20 Aug 2021 00:58:42 GMT
- Title: CIGLI: Conditional Image Generation from Language & Image
- Authors: Xiaopeng Lu, Lynnette Ng, Jared Fernandez, Hao Zhu
- Abstract summary: We propose a new task called CIGLI: Conditional Image Generation from Language and Image.
Instead of generating an image based on text as in text-image generation, this task requires the generation of an image from a textual description and an image prompt.
- Score: 5.159265382427163
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-modal generation has been widely explored in recent years. Current
research directions involve generating text based on an image or vice versa. In
this paper, we propose a new task called CIGLI: Conditional Image Generation
from Language and Image. Instead of generating an image based on text as in
text-image generation, this task requires the generation of an image from a
textual description and an image prompt. We designed a new dataset to ensure
that the text description describes information from both images, and that
solely analyzing the description is insufficient to generate an image. We then
propose a novel language-image fusion model which improves the performance over
two established baseline methods, as evaluated by quantitative (automatic) and
qualitative (human) evaluations. The code and dataset are available at
https://github.com/vincentlux/CIGLI.
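To make the task setup concrete, the minimal sketch below shows one way an image-prompt embedding and a text embedding could be fused into a single conditioning vector for an image generator. The class name, feature dimensions, and the concatenation-plus-MLP fusion are illustrative assumptions, not the authors' released model (see the repository linked above for that).

```python
# Illustrative sketch only: hypothetical module and dimension names,
# not the CIGLI authors' fusion architecture.
import torch
import torch.nn as nn

class LanguageImageFusion(nn.Module):
    """Fuses an image-prompt embedding and a text embedding into a single
    conditioning vector for a downstream image generator (not shown)."""

    def __init__(self, img_dim: int = 512, txt_dim: int = 512, cond_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(img_dim + txt_dim, cond_dim),
            nn.ReLU(),
            nn.Linear(cond_dim, cond_dim),
        )

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        # Concatenate the two modalities and project; the resulting vector
        # would condition image synthesis on both the prompt image and the text.
        return self.mlp(torch.cat([img_feat, txt_feat], dim=-1))

# Toy usage with random stand-ins for an encoded image prompt and description.
fusion = LanguageImageFusion()
cond = fusion(torch.randn(1, 512), torch.randn(1, 512))
print(cond.shape)  # torch.Size([1, 256])
```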
Related papers
- Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models [16.00576040281808]
We propose a novel framework called Image2Text2Image to evaluate image captioning models.
A high similarity score between the original image and the image regenerated from the generated caption suggests the model has produced a faithful textual description, while a low score highlights discrepancies.
Our framework does not rely on human-annotated reference captions, making it a valuable tool for assessing image captioning models (a minimal sketch of this comparison step appears after this list).
arXiv Detail & Related papers (2024-11-08T17:07:01Z)
- OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation [151.57313182844936]
We propose a new interleaved generation framework based on prompting large-language models (LLMs) and pre-trained text-to-image (T2I) models, namely OpenLEAF.
For model assessment, we first propose to use large multi-modal models (LMMs) to evaluate the entity and style consistencies of open-domain interleaved image-text sequences.
arXiv Detail & Related papers (2023-10-11T17:58:33Z)
- Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis [37.32270579534541]
We propose a novel approach for enhancing text-image correspondence by leveraging available semantic layouts.
Our approach achieves higher text-image correspondence than existing text-to-image generation approaches on the Multi-Modal CelebA-HQ and Cityscapes datasets.
arXiv Detail & Related papers (2023-08-16T05:59:33Z)
- GlyphDiffusion: Text Generation as Image Generation [100.98428068214736]
We propose GlyphDiffusion, a novel diffusion approach for text generation via text-guided image generation.
Our key idea is to render the target text as a glyph image containing visual language content.
Our model also makes significant improvements over recent diffusion-based models.
arXiv Detail & Related papers (2023-04-25T02:14:44Z)
- Re-Imagen: Retrieval-Augmented Text-to-Image Generator [58.60472701831404]
The Retrieval-Augmented Text-to-Image Generator (Re-Imagen) conditions generation on (image, text) pairs retrieved from an external multi-modal knowledge base, improving fidelity for rare and unseen entities.
arXiv Detail & Related papers (2022-09-29T00:57:28Z)
- On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization [89.94078728495423]
We show that recent advances in each modality (CLIP image representations and the scaling of language models) do not consistently improve self-rationalization on tasks with multimodal inputs.
Our findings call for a backbone modelling approach that can be built on to advance text generation from images and text beyond image captioning.
arXiv Detail & Related papers (2022-05-24T00:52:40Z)
- Text as Neural Operator: Image Manipulation by Text Instruction [68.53181621741632]
In this paper, we study a setting that allows users to edit an image with multiple objects using complex text instructions to add, remove, or change the objects.
The task's inputs are multimodal: (1) a reference image and (2) a natural-language instruction describing the desired modifications to the image.
We show that the proposed model performs favorably against recent strong baselines on three public datasets.
arXiv Detail & Related papers (2020-08-11T07:07:10Z)
- Describe What to Change: A Text-guided Unsupervised Image-to-Image Translation Approach [84.22327278486846]
We propose a novel unsupervised approach, based on image-to-image translation, that alters the attributes of a given image through a command-like sentence.
Our model disentangles the image content from the visual attributes, and it learns to modify the latter using the textual description.
Experiments show that the proposed model achieves promising performance on two large-scale public datasets.
arXiv Detail & Related papers (2020-08-10T15:40:05Z)
- Text-Guided Neural Image Inpainting [20.551488941041256]
The inpainting task requires filling corrupted regions of an image with content that is coherent with the surrounding context.
The goal of this paper is to fill in the semantic content of corrupted images according to the provided descriptive text.
We propose a novel inpainting model named Text-Guided Dual Attention Inpainting Network (TDANet).
arXiv Detail & Related papers (2020-04-07T09:04:43Z)
- Multimodal Story Generation on Plural Images [8.293936347234126]
We propose StoryGen, a text generation model that takes multiple images as input.
The model generates meaningful paragraphs of text that incorporate features extracted from the input images.
arXiv Detail & Related papers (2020-01-16T03:39:00Z)
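Several of the papers above score image-text consistency by comparing images in an embedding space. As referenced in the Image2Text2Image entry, the sketch below illustrates the comparison step of such a label-free evaluation: an image is regenerated from the model-produced caption and scored against the original. The CLIP image encoder from the `transformers` library is an assumed stand-in, not necessarily the paper's backbone, and the captioning and text-to-image diffusion models that would produce `regenerated` are omitted.

```python
# Sketch of a label-free caption-faithfulness score: compare the original image
# with an image regenerated from the model-produced caption. CLIP is an assumed
# similarity backbone for illustration only.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def image_embedding(img: Image.Image) -> torch.Tensor:
    # Encode an image into an L2-normalized CLIP feature vector.
    inputs = processor(images=img, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1)

def caption_faithfulness(original: Image.Image, regenerated: Image.Image) -> float:
    # Cosine similarity of the two embeddings: higher means the caption kept
    # enough content for the T2I model to reconstruct a similar image.
    return float((image_embedding(original) * image_embedding(regenerated)).sum())
```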