Learning Layout and Style Reconfigurable GANs for Controllable Image Synthesis
- URL: http://arxiv.org/abs/2003.11571v2
- Date: Fri, 26 Mar 2021 19:57:02 GMT
- Title: Learning Layout and Style Reconfigurable GANs for Controllable Image Synthesis
- Authors: Wei Sun and Tianfu Wu
- Abstract summary: This paper focuses on the recently emerged layout-to-image task: learning generative models capable of synthesizing photo-realistic images from spatial layout.
Style control at the image level is the same as in vanilla GANs, while style control at the object mask level is realized by a novel feature normalization scheme.
In experiments, the proposed method is tested on the COCO-Stuff and Visual Genome datasets, achieving state-of-the-art performance.
- Score: 12.449076001538552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the remarkable recent progress on learning deep generative models, it
becomes increasingly interesting to develop models for controllable image
synthesis from reconfigurable inputs. This paper focuses on a recently
emerged task, layout-to-image, to learn generative models that are capable of
synthesizing photo-realistic images from spatial layout (i.e., object bounding
boxes configured in an image lattice) and style (i.e., structural and
appearance variations encoded by latent vectors). This paper first proposes an
intuitive paradigm for the task, layout-to-mask-to-image, to learn to unfold
object masks of given bounding boxes in an input layout to bridge the gap
between the input layout and synthesized images. Then, this paper presents a
method built on Generative Adversarial Networks for the proposed
layout-to-mask-to-image with style control at both image and mask levels.
Object masks are learned from the input layout and iteratively refined along
stages in the generator network. Style control at the image level is the same
as in vanilla GANs, while style control at the object mask level is realized by
a novel feature normalization scheme, Instance-Sensitive and Layout-Aware
Normalization (ISLA-Norm). In experiments, the proposed method is tested on
the COCO-Stuff and Visual Genome datasets, achieving state-of-the-art
performance.
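The core technical contribution, ISLA-Norm, realizes mask-level style control by making the normalization's affine parameters depend on each object's label embedding and style latent, composed spatially through the predicted object masks. Below is a minimal PyTorch sketch of that idea; the module names, tensor shapes, and projection layers are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class ISLANorm(nn.Module):
    """Sketch of Instance-Sensitive and Layout-Aware Normalization.

    Batch statistics are computed as in BatchNorm, but gamma/beta are
    predicted per object instance and scattered into spatial maps via
    the (soft) object masks, so each object gets its own style.
    """
    def __init__(self, num_features, label_dim, style_dim):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)  # stats only
        self.to_gamma = nn.Linear(label_dim + style_dim, num_features)
        self.to_beta = nn.Linear(label_dim + style_dim, num_features)

    def forward(self, x, label_emb, style, masks):
        # x:         (B, C, H, W) feature map
        # label_emb: (B, N, label_dim) per-instance label embeddings
        # style:     (B, N, style_dim) per-instance style latents
        # masks:     (B, N, H, W) soft masks, assumed resized to (H, W)
        h = self.bn(x)
        inst = torch.cat([label_emb, style], dim=-1)   # (B, N, D)
        gamma = self.to_gamma(inst)                    # (B, N, C)
        beta = self.to_beta(inst)                      # (B, N, C)
        # Compose per-instance parameters into spatial maps via masks.
        gamma_map = torch.einsum('bnc,bnhw->bchw', gamma, masks)
        beta_map = torch.einsum('bnc,bnhw->bchw', beta, masks)
        return h * (1 + gamma_map) + beta_map
```

Since gamma and beta vary per instance and per location, resampling one object's style latent restyles only that object while the rest of the layout stays fixed, which is exactly the reconfigurability the abstract describes.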
Related papers
- Automatic Generation of Semantic Parts for Face Image Synthesis [7.728916126705043]
We describe a network architecture to address the problem of automatically manipulating or generating the shape of object classes in semantic segmentation masks.
Our proposed model allows embedding the mask class-wise into a latent space where each class embedding can be independently edited.
We report quantitative and qualitative results on the Celeb-MaskHQ dataset, which show our model can both faithfully reconstruct and modify a segmentation mask at the class level.
arXiv Detail & Related papers (2023-07-11T15:01:42Z)
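The class-wise latent embedding described in the entry above can be sketched as a shared encoder applied independently to each class channel of a one-hot segmentation mask, yielding one editable latent per class. The architecture below is a hypothetical stand-in, not the paper's network.

```python
import torch
import torch.nn as nn

class ClassWiseMaskEncoder(nn.Module):
    """Encode each class channel of a one-hot mask into its own latent."""
    def __init__(self, num_classes, latent_dim, size=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(32 * (size // 4) ** 2, latent_dim)

    def forward(self, mask):           # mask: (B, num_classes, H, W), one-hot
        B, K, H, W = mask.shape
        h = self.conv(mask.reshape(B * K, 1, H, W))  # encode channels separately
        z = self.fc(h.flatten(1))                    # one latent per class channel
        return z.reshape(B, K, -1)                   # (B, num_classes, latent_dim)

# Editing one class: swap its latent with one from another mask, then decode.
# z_a[:, hair_class] = z_b[:, hair_class]   # hypothetical class index
```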
- Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation [78.13793505707952]
Existing autoregressive models follow the two-stage generation paradigm that first learns a codebook in the latent space for image reconstruction and then completes the image generation autoregressively based on the learned codebook.
We propose a novel two-stage framework, consisting of a Masked Quantization VAE (MQ-VAE) and a Stackformer, which relieves the model from modeling redundancy.
arXiv Detail & Related papers (2023-05-23T02:15:53Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
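The Gumbel-Softmax coupling mentioned in the AutoMAE entry above is, in essence, a way to sample discrete per-patch mask decisions while keeping the mask generator differentiable (and hence trainable adversarially). The fragment below is a generic illustration using PyTorch's built-in gumbel_softmax, not AutoMAE's actual generator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchMaskGenerator(nn.Module):
    """Sample a differentiable per-patch keep/mask decision."""
    def __init__(self, embed_dim):
        super().__init__()
        self.score = nn.Linear(embed_dim, 2)  # logits for (keep, mask)

    def forward(self, patch_tokens, tau=1.0):
        # patch_tokens: (B, num_patches, embed_dim)
        logits = self.score(patch_tokens)
        # hard=True yields discrete 0/1 decisions in the forward pass while
        # gradients flow through the soft relaxation (straight-through).
        onehot = F.gumbel_softmax(logits, tau=tau, hard=True)
        return onehot[..., 1]                 # (B, num_patches), 1 = masked
```

An adversarial loss on the resulting mask distribution, as the summary suggests, can then push the generator toward masks that make the reconstruction task appropriately hard.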
- StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training [64.37272287179661]
StrucTexTv2 is an effective document image pre-training framework.
It consists of two self-supervised pre-training tasks: masked image modeling and masked language modeling.
It achieves competitive or even new state-of-the-art performance in various downstream tasks such as image classification, layout analysis, table structure recognition, document OCR, and information extraction.
arXiv Detail & Related papers (2023-03-01T07:32:51Z)
- MaskSketch: Unpaired Structure-guided Masked Image Generation [56.88038469743742]
MaskSketch is an image generation method that allows spatial conditioning of the generation result using a guiding sketch as an extra conditioning signal during sampling.
We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image.
Our results show that MaskSketch achieves high image realism and fidelity to the guiding structure.
arXiv Detail & Related papers (2023-02-10T20:27:02Z)
- CoGS: Controllable Generation and Search from Sketch and Style [35.625940819995996]
We present CoGS, a method for the style-conditioned, sketch-driven synthesis of images.
CoGS enables exploration of diverse appearance possibilities for a given sketched object.
We show that our model, trained on the 125 object classes of our newly created Pseudosketches dataset, is capable of producing a diverse gamut of semantic content and appearance styles.
arXiv Detail & Related papers (2022-03-17T18:36:11Z)
- Interactive Image Synthesis with Panoptic Layout Generation [14.1026819862002]
We propose Panoptic Layout Generative Adversarial Networks (PLGAN) to address this challenge.
PLGAN employs panoptic theory, which distinguishes between "stuff" categories with amorphous boundaries and "thing" categories with well-defined shapes.
We experimentally compare our PLGAN with state-of-the-art layout-based models on the COCO-Stuff, Visual Genome, and Landscape datasets.
arXiv Detail & Related papers (2022-03-04T02:45:27Z)
- SketchEdit: Mask-Free Local Image Manipulation with Partial Sketches [95.45728042499836]
We propose a new paradigm of sketch-based image manipulation: mask-free local image manipulation.
Our model automatically predicts the target modification region and encodes it into a structure-agnostic style vector.
A generator then synthesizes the new image content based on the style vector and sketch.
arXiv Detail & Related papers (2021-11-30T02:42:31Z)
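The SketchEdit entry above describes a three-stage data flow: predict the modification region, encode it into a style vector, and synthesize from the style vector plus the sketch. Below is a hypothetical wiring of that pipeline, with the three sub-networks left as injected placeholders rather than the paper's architectures.

```python
import torch
import torch.nn as nn

class SketchEditPipeline(nn.Module):
    """Illustrative mask-free sketch-editing data flow (placeholders)."""
    def __init__(self, region_predictor, style_encoder, generator):
        super().__init__()
        self.region_predictor = region_predictor  # (image, sketch) -> soft region
        self.style_encoder = style_encoder        # masked region -> style vector
        self.generator = generator                # (style, sketch, context) -> image

    def forward(self, image, sketch):
        region = self.region_predictor(image, sketch)  # (B, 1, H, W) in [0, 1]
        style = self.style_encoder(image * region)     # encode only the target region
        context = image * (1 - region)                 # preserve untouched pixels
        return self.generator(style, sketch, context)
```

The point of the wiring is that no user-drawn mask is needed: the predicted soft region both selects what to encode and protects the surrounding pixels.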
- Improving Visual Quality of Image Synthesis by A Token-based Generator with Transformers [51.581926074686535]
We present a new perspective of achieving image synthesis by viewing this task as a visual token generation problem.
The proposed TokenGAN has achieved state-of-the-art results on several widely-used image synthesis benchmarks.
arXiv Detail & Related papers (2021-11-05T12:57:50Z)
- Few-shot Semantic Image Synthesis Using StyleGAN Prior [8.528384027684192]
We present a training strategy that performs pseudo labeling of semantic masks using the StyleGAN prior.
Our key idea is to construct a simple mapping between the StyleGAN feature and each semantic class from a few examples of semantic masks.
Although the pseudo semantic masks might be too coarse for previous approaches that require pixel-aligned masks, our framework can synthesize high-quality images from not only dense semantic masks but also sparse inputs such as landmarks and scribbles.
arXiv Detail & Related papers (2021-03-27T11:04:22Z)
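The "simple mapping between the StyleGAN feature and each semantic class" in the last entry can be approximated by a per-pixel linear classifier over upsampled generator feature maps, trained on the few annotated examples. The sketch below is an assumption-laden illustration of that idea, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pseudo_label(features, classifier, out_size):
    """Per-pixel semantic pseudo-labels from generator feature maps.

    features:   list of (B, C_i, H_i, W_i) maps from StyleGAN layers.
    classifier: nn.Linear(sum(C_i), num_classes), trained on a few
                annotated masks with a per-pixel cross-entropy loss.
    """
    upsampled = [F.interpolate(f, size=out_size, mode='bilinear',
                               align_corners=False) for f in features]
    feat = torch.cat(upsampled, dim=1)               # (B, sum C_i, H, W)
    logits = classifier(feat.permute(0, 2, 3, 1))    # (B, H, W, num_classes)
    return logits.argmax(-1)                         # (B, H, W) pseudo mask
```

Because the classifier operates on generator features rather than pixels, its pseudo masks are coarse, which matches the entry's remark that they may be too rough for pixel-aligned methods yet still suffice for this framework.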
This list is automatically generated from the titles and abstracts of the papers on this site.