DT2I: Dense Text-to-Image Generation from Region Descriptions
- URL: http://arxiv.org/abs/2204.02035v1
- Date: Tue, 5 Apr 2022 07:57:11 GMT
- Title: DT2I: Dense Text-to-Image Generation from Region Descriptions
- Authors: Stanislav Frolov, Prateek Bansal, Jörn Hees, Andreas Dengel
- Abstract summary: We introduce dense text-to-image (DT2I) synthesis as a new task to pave the way toward more intuitive image generation.
We also propose DTC-GAN, a novel method to generate images from semantically rich region descriptions.
- Score: 3.883984493622102
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite astonishing progress, generating realistic images of complex scenes
remains a challenging problem. Recently, layout-to-image synthesis approaches
have attracted much interest by conditioning the generator on a list of
bounding boxes and corresponding class labels. However, previous approaches are
very restrictive because the set of labels is fixed a priori. Meanwhile,
text-to-image synthesis methods have substantially improved and provide a
flexible way for conditional image generation. In this work, we introduce dense
text-to-image (DT2I) synthesis as a new task to pave the way toward more
intuitive image generation. Furthermore, we propose DTC-GAN, a novel method to
generate images from semantically rich region descriptions, and a multi-modal
region feature matching loss to encourage semantic image-text matching. Our
results demonstrate the capability of our approach to generate plausible images
of complex scenes using region captions.
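The multi-modal region feature matching loss is only named in the abstract. The following PyTorch sketch shows one plausible form of such a loss, assuming a hypothetical image encoder and projection layer; it is illustrative only, not the DTC-GAN implementation.
```python
# Plausible form of a multi-modal region feature matching loss (illustrative
# assumptions only, not the DTC-GAN implementation): pool a feature per
# region of the generated image and pull it toward its caption embedding.
import torch
import torch.nn.functional as F
from torchvision.ops import roi_align

def region_feature_matching_loss(fake_images, boxes, text_embs, img_encoder, proj):
    """fake_images: (B, 3, H, W) generated images
    boxes: list of B tensors (N_i, 4), region boxes in image coordinates
    text_embs: (sum N_i, D) embeddings of the region descriptions
    img_encoder: CNN giving a (B, C, H', W') feature map (assumed)
    proj: nn.Linear(C, D) projecting pooled region features (assumed)"""
    feat_map = img_encoder(fake_images)                 # (B, C, H', W')
    scale = feat_map.shape[-1] / fake_images.shape[-1]  # image -> feature coords
    pooled = roi_align(feat_map, boxes, output_size=1, spatial_scale=scale)
    region_feats = proj(pooled.flatten(1))              # (sum N_i, D)
    sim = F.cosine_similarity(region_feats, text_embs, dim=-1)
    return (1.0 - sim).mean()  # high similarity = matched region and caption
```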
Related papers
- Style Generation: Image Synthesis based on Coarsely Matched Texts [10.939482612568433]
We introduce a novel task called text-based style generation and propose a two-stage generative adversarial network.
The first stage generates the overall image style with a sentence feature, and the second stage refines the generated style with a synthetic feature.
The practical potential of our work is demonstrated by various applications such as text-image alignment and story visualization.
arXiv Detail & Related papers (2023-09-08T21:51:11Z)
- LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation [121.45667242282721]
We propose a coarse-to-fine paradigm to achieve layout planning and image generation.
Our proposed method outperforms the state-of-the-art models in terms of photorealistic layout and image generation.
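A hedged sketch of how such coarse-to-fine layout elicitation might look in practice; the prompt format, `call_llm`, and `layout_to_image_model` below are hypothetical stand-ins, not the paper's interface.
```python
# Hedged sketch of the coarse-to-fine idea: ask an LLM for a rough layout,
# then hand it to a layout-to-image generator. The prompt format, call_llm,
# and layout_to_image_model are hypothetical stand-ins, not the paper's code.
import json

def layout_prompt(caption: str) -> str:
    return ("Propose a layout for this scene as JSON: a list of objects, "
            "each with a 'label' and a 'box' [x, y, w, h] in 0-1 coordinates.\n"
            f"Scene: {caption}")

def parse_layout(llm_reply: str) -> list:
    # expected reply, e.g.: [{"label": "dog", "box": [0.1, 0.5, 0.3, 0.4]}]
    return json.loads(llm_reply)

# layout = parse_layout(call_llm(layout_prompt("a dog chasing a ball")))
# image  = layout_to_image_model("a dog chasing a ball", layout)
```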
arXiv Detail & Related papers (2023-08-09T17:45:04Z)
- Fine-grained Cross-modal Fusion based Refinement for Text-to-Image Synthesis [12.954663420736782]
We propose a novel Fine-grained text-image Fusion based Generative Adversarial Network, dubbed FF-GAN.
FF-GAN consists of two modules: a Fine-grained text-image Fusion Block (FF-Block) and a Global Semantic Refinement (GSR) module.
arXiv Detail & Related papers (2023-02-17T05:44:05Z)
- SceneComposer: Any-Level Semantic Image Synthesis [80.55876413285587]
We propose a new framework for conditional image synthesis from semantic layouts of any precision levels.
The framework naturally reduces to text-to-image (T2I) at the lowest level with no shape information, and it becomes segmentation-to-image (S2I) at the highest level.
We introduce several novel techniques to address the challenges that arise with this new setup.
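One way to picture an "any-level" layout is a region record whose shape information ranges from absent (pure T2I) to a full mask (S2I); the dataclass below is an illustrative assumption, not SceneComposer's actual representation.
```python
# Illustrative "any-level" region record (an assumption, not SceneComposer's
# actual representation): shape information ranges from absent to a full mask.
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Region:
    text: str                          # open-ended description of the region
    mask: Optional[np.ndarray] = None  # None -> pure T2I; full mask -> S2I
    precision: float = 0.0             # 0 = text only ... 1 = exact shape

scene = [Region("a red barn", precision=0.0),                    # text-only
         Region("snowy hill", mask=np.ones((64, 64)), precision=1.0)]
```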
arXiv Detail & Related papers (2022-11-21T18:59:05Z)
- FlexIT: Towards Flexible Semantic Image Translation [59.09398209706869]
We propose FlexIT, a novel method which can take any input image and a user-defined text instruction for editing.
First, FlexIT combines the input image and text into a single target point in the CLIP multimodal embedding space.
Then, we iteratively transform the input image toward the target point, ensuring coherence and quality with a variety of novel regularization terms.
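As a rough illustration of this idea: one can optimize an image so its CLIP embedding approaches a combined image-plus-text target. This is a toy sketch only; FlexIT itself edits in an autoencoder latent space with several regularizers, and the combination rule, step count, and weights below are assumptions.
```python
# Toy sketch: optimize an image so its CLIP embedding moves toward a
# combined image+text target. FlexIT itself edits in an autoencoder latent
# space with several regularizers; everything below is an assumption.
import torch
import torch.nn.functional as F
import clip  # pip install git+https://github.com/openai/CLIP.git

model, preprocess = clip.load("ViT-B/32", device="cpu")  # fp32 on CPU

def edit(image: torch.Tensor, text: str, steps: int = 100, lam: float = 10.0):
    """image: (1, 3, 224, 224) CLIP-preprocessed input; text: edit instruction."""
    with torch.no_grad():
        z_img = F.normalize(model.encode_image(image), dim=-1)
        z_txt = F.normalize(model.encode_text(clip.tokenize([text])), dim=-1)
        target = F.normalize(z_img + z_txt, dim=-1)  # single multimodal target
    x = image.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=0.05)
    for _ in range(steps):
        z = F.normalize(model.encode_image(x), dim=-1)
        loss = 1.0 - (z * target).sum()           # pull embedding toward target
        loss = loss + lam * F.mse_loss(x, image)  # regularize: stay near input
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x.detach()
```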
arXiv Detail & Related papers (2022-03-09T13:34:38Z)
- OptGAN: Optimizing and Interpreting the Latent Space of the Conditional Text-to-Image GANs [8.26410341981427]
We study how to ensure that generated samples are believable, realistic, or natural.
We present a novel algorithm which identifies semantically-understandable directions in the latent space of a conditional text-to-image GAN architecture.
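For intuition, editing along a semantically-understandable latent direction usually amounts to adding a scaled unit vector to the latent code; the toy sketch below uses hypothetical names and is not OptGAN's algorithm.
```python
# Toy latent-space edit (illustrative, not OptGAN's algorithm): moving a
# latent code along a unit-norm semantic direction changes one factor, e.g.
# nudging samples toward more "natural"-looking outputs.
import torch

def edit_latent(z: torch.Tensor, direction: torch.Tensor, alpha: float):
    d = direction / direction.norm()  # unit-norm semantic direction
    return z + alpha * d              # slide the code along the direction

# z = torch.randn(1, 128)                       # latent of a generated sample
# x = generator(edit_latent(z, naturalness_dir, 2.0))  # hypothetical names
```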
arXiv Detail & Related papers (2022-02-25T20:00:33Z)
- DAE-GAN: Dynamic Aspect-aware GAN for Text-to-Image Synthesis [55.788772366325105]
We propose a Dynamic Aspect-awarE GAN (DAE-GAN) that represents text information comprehensively from multiple granularities, including sentence-level, word-level, and aspect-level.
Inspired by human learning behaviors, we develop a novel Aspect-aware Dynamic Re-drawer (ADR) for image refinement, in which an Attended Global Refinement (AGR) module and an Aspect-aware Local Refinement (ALR) module are alternately employed.
arXiv Detail & Related papers (2021-08-27T07:20:34Z)
- Towards Open-World Text-Guided Face Image Generation and Manipulation [52.83401421019309]
We propose a unified framework for both face image generation and manipulation.
Our method supports open-world scenarios for both images and text, without any re-training, fine-tuning, or post-processing.
arXiv Detail & Related papers (2021-04-18T16:56:07Z)
- Text to Image Generation with Semantic-Spatial Aware GAN [41.73685713621705]
A text-to-image generation (T2I) model aims to generate photo-realistic images that are semantically consistent with the text descriptions.
We propose a novel framework Semantic-Spatial Aware GAN, which is trained in an end-to-end fashion so that the text encoder can exploit better text information.
arXiv Detail & Related papers (2021-04-01T15:48:01Z)
- TediGAN: Text-Guided Diverse Face Image Generation and Manipulation [52.83401421019309]
TediGAN is a framework for multi-modal image generation and manipulation with textual descriptions.
It consists of three components: a StyleGAN inversion module that maps real images to the latent space of a well-trained StyleGAN; visual-linguistic similarity learning that maps images and text into a common embedding space for text-image matching; and instance-level optimization for identity preservation during manipulation.
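A minimal sketch of the visual-linguistic similarity idea: project both modalities into a common embedding space and train matched pairs to score highest. The encoders, dimensions, and loss form are assumptions, not TediGAN's modules.
```python
# Minimal sketch of visual-linguistic similarity: project image and text
# features into one embedding space and train matched pairs to score highest.
# Dimensions, encoders, and loss are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommonSpace(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, dim)  # image features -> shared space
        self.txt_proj = nn.Linear(txt_dim, dim)  # text features  -> shared space

    def forward(self, img_feats, txt_feats):
        zi = F.normalize(self.img_proj(img_feats), dim=-1)
        zt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return zi @ zt.t()                       # (B, B) similarity matrix

def matching_loss(sim):
    # symmetric contrastive loss: the diagonal holds the matched pairs
    labels = torch.arange(sim.size(0), device=sim.device)
    return 0.5 * (F.cross_entropy(sim, labels) + F.cross_entropy(sim.t(), labels))
```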
arXiv Detail & Related papers (2020-12-06T16:20:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.