Text-to-Image Generation with Attention Based Recurrent Neural Networks
- URL: http://arxiv.org/abs/2001.06658v1
- Date: Sat, 18 Jan 2020 12:19:19 GMT
- Title: Text-to-Image Generation with Attention Based Recurrent Neural Networks
- Authors: Tehseen Zia, Shahan Arif, Shakeeb Murtaza, and Mirza Ahsan Ullah
- Abstract summary: We develop a tractable and stable caption-based image generation model.
Experiments are performed on the Microsoft COCO and MNIST-with-captions datasets.
Results show that the proposed model performs better than contemporary approaches.
- Score: 1.2599533416395765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conditional image modeling based on textual descriptions is a relatively new
domain in unsupervised learning. Previous approaches use either a latent variable
model or generative adversarial networks. While the former are approximated
using variational auto-encoders and rely on intractable inference that can
hamper their performance, the latter are unstable to train due to their Nash
equilibrium based objective function. We develop a tractable and stable
caption-based image generation model. The model uses an attention-based encoder
to learn word-to-pixel dependencies. A conditional autoregressive decoder is
used to learn pixel-to-pixel dependencies and generate images. Experiments are
performed on the Microsoft COCO and MNIST-with-captions datasets, and
performance is evaluated using the Structural Similarity Index. Results show
that the proposed model performs better than contemporary approaches and
generates better quality images. Keywords: Generative image
modeling, autoregressive image modeling, caption-based image generation, neural
attention, recurrent neural networks.
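The word-to-pixel dependency idea above can be sketched as scaled dot-product attention from pixel positions (queries) over caption word embeddings (keys and values). The following is a minimal, hypothetical NumPy sketch; the shapes, function names, and dimensions are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def word_to_pixel_attention(pixel_queries, word_keys, word_values):
    """Attend from each pixel position over the caption's word embeddings.

    pixel_queries: (num_pixels, d) -- one query per pixel to be generated
    word_keys:     (num_words, d)  -- caption word embeddings used as keys
    word_values:   (num_words, d)  -- caption word embeddings used as values
    Returns per-pixel context vectors (num_pixels, d) and the attention map.
    """
    d = pixel_queries.shape[-1]
    scores = pixel_queries @ word_keys.T / np.sqrt(d)  # (num_pixels, num_words)
    weights = softmax(scores, axis=-1)                 # word-to-pixel alignment
    return weights @ word_values, weights

rng = np.random.default_rng(0)
ctx, attn = word_to_pixel_attention(rng.normal(size=(16, 8)),
                                    rng.normal(size=(5, 8)),
                                    rng.normal(size=(5, 8)))
```

In the paper's setting, such context vectors would condition an autoregressive decoder that predicts each pixel from previously generated pixels; here the attention step alone is shown.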
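The evaluation metric, the Structural Similarity Index (SSIM), compares luminance, contrast, and structure between two images. A simplified single-window version of the standard SSIM formula, using global statistics rather than the sliding Gaussian window typically reported in papers, can be written as:

```python
import numpy as np

def ssim_global(x, y, data_range=255.0):
    """Simplified SSIM computed over whole images (standard formula with
    global statistics; reported SSIM usually averages local windows)."""
    c1 = (0.01 * data_range) ** 2  # stabilizer for the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizer for the contrast term
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

img = np.arange(64, dtype=np.float64).reshape(8, 8)
```

Identical images score exactly 1.0, and an inverted image scores strictly lower, which is the behavior the metric is designed to capture.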
Related papers
- Transformer-based Clipped Contrastive Quantization Learning for
Unsupervised Image Retrieval [15.982022297570108]
Unsupervised image retrieval aims to learn important visual characteristics without any given label in order to retrieve images similar to a given query image.
In this paper, we propose a TransClippedCLR model that encodes the global context of an image using a Transformer while capturing local context through patch-based processing.
Results using the proposed clipped contrastive learning are greatly improved on all datasets compared to the same backbone network with vanilla contrastive learning.
arXiv Detail & Related papers (2024-01-27T09:39:11Z) - Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that enables the generation of highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z) - Exploring Stochastic Autoregressive Image Modeling for Visual
Representation [24.582376834198403]
We propose a novel autoregressive image modeling method (named SAIM) based on two simple designs.
By introducing stochastic prediction and the parallel encoder-decoder, SAIM significantly improves the performance of autoregressive image modeling.
Our method achieves the best accuracy (83.9%) on the vanilla ViT-Base model among methods using only ImageNet-1K data.
arXiv Detail & Related papers (2022-12-03T13:04:29Z) - Traditional Classification Neural Networks are Good Generators: They are
Competitive with DDPMs and GANs [104.72108627191041]
We show that conventional neural network classifiers can generate high-quality images comparable to state-of-the-art generative models.
We propose a mask-based reconstruction module that makes gradients semantics-aware in order to synthesize plausible images.
We show that our method is also applicable to text-to-image generation by building on image-text foundation models.
arXiv Detail & Related papers (2022-11-27T11:25:35Z) - Image Quality Assessment using Contrastive Learning [50.265638572116984]
We train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve the auxiliary problem.
We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models.
Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets.
arXiv Detail & Related papers (2021-10-25T21:01:00Z) - Meta Internal Learning [88.68276505511922]
Internal learning for single-image generation is a framework in which a generator is trained to produce novel images based on a single image.
We propose a meta-learning approach that enables training over a collection of images, in order to model the internal statistics of the sample image more effectively.
Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
arXiv Detail & Related papers (2021-10-06T16:27:38Z) - Neural Knitworks: Patched Neural Implicit Representation Networks [1.0470286407954037]
We propose Knitwork, an architecture for neural implicit representation learning of natural images that supports image synthesis.
To the best of our knowledge, this is the first implementation of a coordinate-based patch representation tailored for synthesis tasks such as image inpainting, super-resolution, and denoising.
The results show that modeling natural images using patches, rather than pixels, produces results of higher fidelity.
arXiv Detail & Related papers (2021-09-29T13:10:46Z) - Comparative evaluation of CNN architectures for Image Caption Generation [1.2183405753834562]
We have evaluated 17 different Convolutional Neural Networks on two popular Image Caption Generation frameworks.
We observe that the complexity of a Convolutional Neural Network, as measured by its number of parameters, and its accuracy on the object recognition task do not necessarily correlate with its efficacy at feature extraction for the Image Caption Generation task.
arXiv Detail & Related papers (2021-02-23T05:43:54Z) - Counterfactual Generative Networks [59.080843365828756]
We propose to decompose the image generation process into independent causal mechanisms that we train without direct supervision.
By exploiting appropriate inductive biases, these mechanisms disentangle object shape, object texture, and background.
We show that the counterfactual images can improve out-of-distribution robustness with a marginal drop in performance on the original classification task.
arXiv Detail & Related papers (2021-01-15T10:23:12Z) - Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.