Recurrent Affine Transformation for Text-to-image Synthesis
- URL: http://arxiv.org/abs/2204.10482v1
- Date: Fri, 22 Apr 2022 03:49:47 GMT
- Title: Recurrent Affine Transformation for Text-to-image Synthesis
- Authors: Senmao Ye, Fei Liu, Minkui Tan
- Abstract summary: Existing methods usually adaptively fuse suitable text information into the synthesis process with isolated fusion blocks.
We propose a Recurrent Affine Transformation (RAT) for Generative Adversarial Networks that connects all the fusion blocks with a recurrent neural network to model their long-term dependency.
Being aware of matching image regions, text descriptions supervise the generator to synthesize more relevant image contents.
- Score: 5.256132101498471
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image synthesis aims to generate natural images conditioned on text
descriptions. The main difficulty of this task lies in effectively fusing text
information into the image synthesis process. Existing methods usually
adaptively fuse suitable text information into the synthesis process with
multiple isolated fusion blocks (e.g., Conditional
Batch Normalization and Instance Normalization). However, isolated fusion
blocks not only conflict with each other but also increase the difficulty of
training (see first page of the supplementary). To address these issues, we
propose a Recurrent Affine Transformation (RAT) for Generative Adversarial
Networks that connects all the fusion blocks with a recurrent neural network to
model their long-term dependency. Besides, to improve semantic consistency
between texts and synthesized images, we incorporate a spatial attention model
in the discriminator. Being aware of matching image regions, text descriptions
supervise the generator to synthesize more relevant image contents. Extensive
experiments on the CUB, Oxford-102 and COCO datasets demonstrate the
superiority of the proposed model in comparison to state-of-the-art models
(code: https://github.com/senmaoy/Recurrent-Affine-Transformation-for-Text-to-image-Synthesis.git).
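The abstract names two components: RAT fusion blocks in the generator, whose affine parameters are tied together by a recurrent network, and a spatial attention model in the discriminator. The sketch below illustrates the first idea in PyTorch; it is a minimal reconstruction from the abstract, not the authors' released code, and the GRU cell, layer names, and dimensions are assumptions.

```python
# Minimal sketch of a Recurrent Affine Transformation (RAT) fusion block
# (assumed design, reconstructed from the abstract rather than the official repository).
import torch
import torch.nn as nn

class RATBlock(nn.Module):
    def __init__(self, text_dim: int, hidden_dim: int, num_channels: int):
        super().__init__()
        # A recurrent cell carries the text condition from one fusion block to the next,
        # so consecutive blocks share a long-term dependency instead of acting in isolation.
        self.rnn = nn.GRUCell(text_dim, hidden_dim)
        # Channel-wise affine parameters (scale and shift) are predicted from the hidden state.
        self.to_gamma = nn.Linear(hidden_dim, num_channels)
        self.to_beta = nn.Linear(hidden_dim, num_channels)

    def forward(self, feat, text_emb, hidden):
        # feat: (B, C, H, W) image features; text_emb: (B, text_dim); hidden: (B, hidden_dim)
        hidden = self.rnn(text_emb, hidden)
        gamma = self.to_gamma(hidden).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        beta = self.to_beta(hidden).unsqueeze(-1).unsqueeze(-1)     # (B, C, 1, 1)
        return feat * (1 + gamma) + beta, hidden                    # affine modulation of features


# Usage: chaining blocks through the shared hidden state links their conditioning.
blocks = nn.ModuleList([RATBlock(text_dim=256, hidden_dim=256, num_channels=64) for _ in range(3)])
feat = torch.randn(2, 64, 16, 16)
text_emb = torch.randn(2, 256)
hidden = torch.zeros(2, 256)
for block in blocks:
    feat, hidden = block(feat, text_emb, hidden)
```

For the discriminator side, the abstract only states that a spatial attention model makes text descriptions aware of matching image regions. A generic word-region attention of the kind commonly used in text-to-image GANs could look like the following; again, this is an illustrative sketch, not the paper's exact module.

```python
class WordRegionAttention(nn.Module):
    """Words attend over image regions so the discriminator can check whether the
    described content appears at matching locations (illustrative, assumed design)."""
    def __init__(self, word_dim: int, feat_channels: int):
        super().__init__()
        self.proj = nn.Linear(word_dim, feat_channels)

    def forward(self, words, feat):
        # words: (B, T, word_dim) word embeddings; feat: (B, C, H, W) image features
        B, C, H, W = feat.shape
        regions = feat.view(B, C, H * W)                  # (B, C, HW)
        queries = self.proj(words)                        # (B, T, C)
        attn = torch.softmax(queries @ regions, dim=-1)   # attention over the HW regions
        attended = attn @ regions.transpose(1, 2)         # (B, T, C) region features per word
        return attended
```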
Related papers
- Style Generation: Image Synthesis based on Coarsely Matched Texts [10.939482612568433]
We introduce a novel task called text-based style generation and propose a two-stage generative adversarial network.
The first stage generates the overall image style with a sentence feature, and the second stage refines the generated style with a synthetic feature.
The practical potential of our work is demonstrated by various applications such as text-image alignment and story visualization.
arXiv Detail & Related papers (2023-09-08T21:51:11Z)
- Fine-grained Cross-modal Fusion based Refinement for Text-to-Image Synthesis [12.954663420736782]
We propose a novel Fine-grained text-image Fusion based Generative Adversarial Networks, dubbed FF-GAN.
The FF-GAN consists of two modules: a Fine-grained text-image Fusion Block (FF-Block) and Global Semantic Refinement (GSR).
arXiv Detail & Related papers (2023-02-17T05:44:05Z)
- StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis [54.39789900854696]
StyleGAN-T addresses the specific requirements of large-scale text-to-image synthesis.
It significantly improves over previous GANs and outperforms distilled diffusion models in terms of sample quality and speed.
arXiv Detail & Related papers (2023-01-23T16:05:45Z)
- High-Fidelity Guided Image Synthesis with Latent Diffusion Models [50.39294302741698]
Human user study results show that the proposed approach outperforms the previous state-of-the-art by over 85.32% on overall user satisfaction scores.
arXiv Detail & Related papers (2022-11-30T15:43:20Z)
- eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers [87.52504764677226]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis.
We train an ensemble of text-to-image diffusion models, each specialized for a different stage of synthesis.
Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
arXiv Detail & Related papers (2022-11-02T17:43:04Z)
- DAE-GAN: Dynamic Aspect-aware GAN for Text-to-Image Synthesis [55.788772366325105]
We propose a Dynamic Aspect-awarE GAN (DAE-GAN) that represents text information comprehensively from multiple granularities, including sentence-level, word-level, and aspect-level.
Inspired by human learning behaviors, we develop a novel Aspect-aware Dynamic Re-drawer (ADR) for image refinement, in which an Attended Global Refinement (AGR) module and an Aspect-aware Local Refinement (ALR) module are alternately employed.
arXiv Detail & Related papers (2021-08-27T07:20:34Z)
- ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis [15.006676130258372]
Autoregressive models incorporate context in a linear 1D order by attending only to previously synthesized image patches above or to the left.
We propose a coarse-to-fine hierarchy of context by combining the autoregressive formulation with a multinomial diffusion process.
Our approach can take unrestricted, user-provided masks into account to perform local image editing.
arXiv Detail & Related papers (2021-08-19T17:50:07Z)
- Cycle-Consistent Inverse GAN for Text-to-Image Synthesis [101.97397967958722]
We propose a novel unified framework of Cycle-consistent Inverse GAN for both text-to-image generation and text-guided image manipulation tasks.
We learn a GAN inversion model to convert the images back to the GAN latent space and obtain the inverted latent codes for each image.
In the text-guided optimization module, we generate images with the desired semantic attributes by optimizing the inverted latent codes.
arXiv Detail & Related papers (2021-08-03T08:38:16Z)
- DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis [80.54273334640285]
We propose a novel one-stage text-to-image backbone that directly synthesizes high-resolution images without entanglements between different generators.
We also propose a novel Target-Aware Discriminator composed of Matching-Aware Gradient Penalty and One-Way Output.
Compared with current state-of-the-art methods, the proposed DF-GAN is simpler yet more efficient at synthesizing realistic and text-matching images.
arXiv Detail & Related papers (2020-08-13T12:51:17Z)