SegAttnGAN: Text to Image Generation with Segmentation Attention
- URL: http://arxiv.org/abs/2005.12444v1
- Date: Mon, 25 May 2020 23:56:41 GMT
- Title: SegAttnGAN: Text to Image Generation with Segmentation Attention
- Authors: Yuchuan Gou, Qiancheng Wu, Minghao Li, Bo Gong, Mei Han
- Abstract summary: We propose a novel generative network (SegAttnGAN) that utilizes additional segmentation information for the text-to-image synthesis task.
Because the segmentation data introduced to the model provides useful guidance for generator training, the proposed model generates more realistic images.
- Score: 6.561007033994183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel generative network (SegAttnGAN) that utilizes additional segmentation information for the text-to-image synthesis task. Because the segmentation data introduced to the model provides useful guidance for generator training, the proposed model generates images with better realism and higher quantitative scores than previous state-of-the-art methods. We achieved an Inception Score of 4.84 on the CUB dataset and 3.52 on the Oxford-102 dataset. In addition, we tested the self-attention SegAttnGAN, which uses generated segmentation data instead of masks from the datasets for attention, and obtained similarly high-quality results, suggesting that our model can be applied to text-to-image synthesis even when no segmentation annotations are available.
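The abstract does not detail how the segmentation masks steer the generator, so below is a minimal sketch, assuming a gating-style attention block that modulates generator features with an embedded segmentation map; the module name `SegmentationAttention`, its parameters, and the blending rule are illustrative assumptions rather than the paper's implementation.

```python
# Hedged sketch of a segmentation-attention block (PyTorch); illustrative only,
# not SegAttnGAN's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentationAttention(nn.Module):
    def __init__(self, num_classes: int, feat_channels: int):
        super().__init__()
        # Embed the one-hot segmentation map into the generator's feature space.
        self.embed = nn.Conv2d(num_classes, feat_channels, kernel_size=3, padding=1)
        # Predict a single-channel spatial attention map from the embedding.
        self.attn = nn.Conv2d(feat_channels, 1, kernel_size=1)

    def forward(self, feat: torch.Tensor, seg: torch.Tensor) -> torch.Tensor:
        # feat: generator features (B, C, H, W); seg: one-hot masks (B, K, Hs, Ws), float.
        seg = F.interpolate(seg, size=feat.shape[-2:], mode="nearest")
        seg_feat = self.embed(seg)
        gate = torch.sigmoid(self.attn(seg_feat))        # spatial attention in [0, 1]
        return gate * feat + (1.0 - gate) * seg_feat     # blend features with segmentation guidance

# Example usage with random inputs:
# attn = SegmentationAttention(num_classes=21, feat_channels=128)
# out = attn(torch.randn(2, 128, 32, 32), torch.randint(0, 2, (2, 21, 64, 64)).float())
```

In the self-attention variant described above, the `seg` input would come from an internally generated segmentation map rather than a dataset mask.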
Related papers
- Enhanced Generative Data Augmentation for Semantic Segmentation via Stronger Guidance [1.2923961938782627]
We introduce an effective data augmentation method for semantic segmentation using the Controllable Diffusion Model.
Our proposed method includes efficient prompt generation using Class-Prompt Appending and Visual Prior Combination.
We evaluate our method on the PASCAL VOC datasets and find it highly effective for synthesizing images for semantic segmentation.
arXiv Detail & Related papers (2024-09-09T19:01:14Z)
- Learning from Models and Data for Visual Grounding [55.21937116752679]
We introduce SynGround, a framework that combines data-driven learning and knowledge transfer from various large-scale pretrained models.
We finetune a pretrained vision-and-language model on this dataset by optimizing a mask-attention objective.
The resulting model improves the grounding capabilities of an off-the-shelf vision-and-language model.
arXiv Detail & Related papers (2024-03-20T17:59:43Z)
- Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation [6.82236459614491]
We propose a novel method for generating pixel-level semantic segmentation labels using the text-to-image generative model Stable Diffusion.
By utilizing the text prompts, cross-attention, and self-attention of SD, we introduce three new techniques: class-prompt appending, class-prompt cross-attention, and self-attention exponentiation.
These techniques enable us to generate segmentation maps corresponding to synthetic images (a generic sketch of this attention-based mask refinement appears after this list).
arXiv Detail & Related papers (2023-09-25T17:19:26Z)
- DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models [61.906934570771256]
We present a generic dataset generation model that can produce diverse synthetic images and perception annotations.
Our method builds upon the pre-trained diffusion model and extends text-guided image synthesis to perception data generation.
We show that the rich latent code of the diffusion model can be effectively decoded as accurate perception annotations using a decoder module.
arXiv Detail & Related papers (2023-08-11T14:38:11Z)
- Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR).
It aims to train a model that fuses multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's ability to express their intent.
arXiv Detail & Related papers (2023-06-12T17:56:01Z)
- Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training a text-to-image generation model on image-only datasets.
It uses a retrieval-then-optimization procedure to synthesize pseudo text features.
The approach benefits a wide range of settings, including few-shot, semi-supervised, and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto standard of Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- Descriptive Modeling of Textiles using FE Simulations and Deep Learning [0.0]
We propose a novel and fully automated method for extracting the yarn geometrical features in woven composites.
The proposed approach employs two deep neural network architectures (U-Net and Mask R-CNN).
Experimental results show that our method is accurate and robust for performing yarn instance segmentation on CT images.
arXiv Detail & Related papers (2021-06-26T09:32:24Z)
- CAGAN: Text-To-Image Generation with Combined Attention GANs [70.3497683558609]
We propose the Combined Attention Generative Adversarial Network (CAGAN) to generate photo-realistic images according to textual descriptions.
The proposed CAGAN uses two attention models: word attention, which draws different sub-regions conditioned on related words, and squeeze-and-excitation attention, which captures non-linear interactions among channels (a generic squeeze-and-excitation block is sketched after this list).
With spectral normalisation to stabilise training, the proposed CAGAN improves the state of the art in IS and FID on the CUB dataset and in FID on the more challenging COCO dataset.
arXiv Detail & Related papers (2021-04-26T15:46:40Z)
- Improving Augmentation and Evaluation Schemes for Semantic Image Synthesis [16.097324852253912]
We introduce a novel augmentation scheme designed specifically for generative adversarial networks (GANs).
We propose to randomly warp object shapes in the semantic label maps used as input to the generator (a minimal warping sketch appears after this list).
The local shape discrepancies between the warped and non-warped label maps and images enable the GAN to better learn the structural and geometric details of the scene.
arXiv Detail & Related papers (2020-11-25T10:55:26Z)
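Referring back to the Dataset Diffusion entry above: its summary names "self-attention exponentiation" without detail. The sketch below is a hedged reading, assuming class-token cross-attention maps are refined by repeatedly multiplying them with the self-attention matrix; the function `refine_masks` and its arguments are hypothetical names, not the paper's code.

```python
# Hedged sketch: turn diffusion-model attention maps into a segmentation map.
import torch

def refine_masks(cross_attn: torch.Tensor, self_attn: torch.Tensor, power: int = 4) -> torch.Tensor:
    """cross_attn: (HW, K) per-class scores; self_attn: (HW, HW) row-stochastic."""
    refined = cross_attn
    for _ in range(power):
        refined = self_attn @ refined   # spread class evidence to visually similar pixels
    return refined.argmax(dim=-1)       # (HW,) per-pixel class index

# Usage with random stand-in attention maps:
hw, k = 64 * 64, 3
self_attn = torch.softmax(torch.randn(hw, hw), dim=-1)
cross_attn = torch.softmax(torch.randn(hw, k), dim=-1)
labels = refine_masks(cross_attn, self_attn).reshape(64, 64)
```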
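For the CAGAN entry, the squeeze-and-excitation attention it mentions is a standard channel-attention block; the sketch below shows a generic version of such a block, not CAGAN's exact module.

```python
# Generic squeeze-and-excitation (SE) channel attention; illustrative, not CAGAN's code.
import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))           # squeeze: global average pool -> (B, C)
        w = self.fc(w).view(b, c, 1, 1)  # excite: per-channel weights in [0, 1]
        return x * w                     # reweight channels (non-linear channel interaction)

# Usage: out = SqueezeExcitation(128)(torch.randn(2, 128, 32, 32))
```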
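For the augmentation-scheme entry, the exact warping transform is not described in the summary, so the following is a minimal stand-in that warps a semantic label map with a smooth random displacement field; `random_warp` and its `magnitude` parameter are illustrative assumptions.

```python
# Hedged sketch of random shape warping for semantic label maps (PyTorch).
import torch
import torch.nn.functional as F

def random_warp(label_map: torch.Tensor, magnitude: float = 0.05) -> torch.Tensor:
    # label_map: (B, 1, H, W) float tensor of class ids.
    b, _, h, w = label_map.shape
    # Smooth random displacement field: coarse noise upsampled to full resolution.
    offsets = magnitude * torch.randn(b, 2, 4, 4)
    offsets = F.interpolate(offsets, size=(h, w), mode="bilinear", align_corners=False)
    # Identity sampling grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    grid = grid + offsets.permute(0, 2, 3, 1)   # displace sampling locations
    # Nearest-neighbour sampling keeps class ids discrete after warping.
    return F.grid_sample(label_map, grid, mode="nearest", align_corners=False)

# Usage: warped = random_warp(torch.randint(0, 21, (2, 1, 64, 64)).float())
```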