MaskSketch: Unpaired Structure-guided Masked Image Generation
- URL: http://arxiv.org/abs/2302.05496v1
- Date: Fri, 10 Feb 2023 20:27:02 GMT
- Title: MaskSketch: Unpaired Structure-guided Masked Image Generation
- Authors: Dina Bashkirova, Jose Lezama, Kihyuk Sohn, Kate Saenko and Irfan Essa
- Abstract summary: MaskSketch is an image generation method that allows spatial conditioning of the generation result using a guiding sketch as an extra conditioning signal during sampling.
We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image.
Our results show that MaskSketch achieves high image realism and fidelity to the guiding structure.
- Score: 56.88038469743742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent conditional image generation methods produce images of remarkable
diversity, fidelity and realism. However, the majority of these methods allow
conditioning only on labels or text prompts, which limits their level of
control over the generation result. In this paper, we introduce MaskSketch, an
image generation method that allows spatial conditioning of the generation
result using a guiding sketch as an extra conditioning signal during sampling.
MaskSketch utilizes a pre-trained masked generative transformer, requiring no
model training or paired supervision, and works with input sketches of
different levels of abstraction. We show that intermediate self-attention maps
of a masked generative transformer encode important structural information of
the input image, such as scene layout and object shape, and we propose a novel
sampling method based on this observation to enable structure-guided
generation. Our results show that MaskSketch achieves high image realism and
fidelity to the guiding structure. Evaluated on standard benchmark datasets,
MaskSketch outperforms state-of-the-art methods for sketch-to-image
translation, as well as unpaired image-to-image translation approaches.
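The abstract's central mechanism is scoring candidate generations by how closely their intermediate self-attention maps match those of the guiding sketch. Below is a minimal illustration of that idea; the `attention_maps` hook, the tensor shapes, and the plain L1 distance are all assumptions made for the sketch, not the authors' implementation.
```python
# A minimal, hypothetical sketch of attention-based structure scoring,
# based only on the abstract: self-attention maps of a frozen masked
# generative transformer are compared between the guiding sketch and
# candidate samples, and the structurally closest candidate is kept.
# `attention_maps` is an assumed stand-in for a real model hook.

import torch

def structure_distance(attn_a, attn_b):
    """L1 distance between two stacks of self-attention maps,
    each of shape (layers, heads, tokens, tokens)."""
    return (attn_a - attn_b).abs().mean()

def select_by_structure(candidates, sketch_tokens, attention_maps):
    """Keep the candidate token sequence whose attention maps are
    closest to those of the tokenized guiding sketch."""
    target = attention_maps(sketch_tokens)
    dists = torch.stack([structure_distance(attention_maps(c), target)
                         for c in candidates])
    return candidates[int(dists.argmin())]

# Toy demo: random tensors stand in for real attention maps.
if __name__ == "__main__":
    torch.manual_seed(0)
    def fake_attention_maps(tokens):  # hypothetical model hook; ignores input
        return torch.softmax(torch.randn(4, 8, 16, 16), dim=-1)
    sketch = torch.randint(0, 1024, (16,))
    candidates = [torch.randint(0, 1024, (16,)) for _ in range(4)]
    best = select_by_structure(candidates, sketch, fake_attention_maps)
    print(best.shape)  # torch.Size([16])
```
In the actual method this selection would run inside each step of the masked transformer's iterative sampling loop; a sketch of that loop appears after the related-papers list below.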
Related papers
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretrained models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z) - StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training [64.37272287179661]
StrucTexTv2 is an effective document image pre-training framework.
It consists of two self-supervised pre-training tasks: masked image modeling and masked language modeling.
It achieves competitive or even new state-of-the-art performance in various downstream tasks such as image classification, layout analysis, table structure recognition, document OCR, and information extraction.
arXiv Detail & Related papers (2023-03-01T07:32:51Z) - MaskGIT: Masked Generative Image Transformer [49.074967597485475]
MaskGIT learns to predict randomly masked tokens by attending to tokens in all directions (a minimal sketch of this decoding scheme appears after this list).
Experiments demonstrate that MaskGIT significantly outperforms the state-of-the-art transformer model on the ImageNet dataset.
arXiv Detail & Related papers (2022-02-08T23:54:06Z) - Image Generation with Self Pixel-wise Normalization [17.147675335268282]
Region-adaptive normalization (RAN) methods have been widely used in generative adversarial network (GAN)-based image-to-image translation.
This paper presents a novel normalization method, called self pixel-wise normalization (SPN), which effectively boosts the generative performance by performing the pixel-adaptive affine transformation without the mask image.
arXiv Detail & Related papers (2022-01-26T03:14:31Z) - GANSeg: Learning to Segment by Unsupervised Hierarchical Image Generation [16.900404701997502]
We propose a GAN-based approach that generates images conditioned on latent masks.
We show that such mask-conditioned image generation can be learned faithfully when conditioning the masks in a hierarchical manner.
It also lets us generate image-mask pairs for training a segmentation network, which outperforms the state-of-the-art unsupervised segmentation methods on established benchmarks.
arXiv Detail & Related papers (2021-12-02T07:57:56Z) - SketchEdit: Mask-Free Local Image Manipulation with Partial Sketches [95.45728042499836]
We propose a new paradigm of sketch-based image manipulation: mask-free local image manipulation.
Our model automatically predicts the target modification region and encodes it into a structure style vector.
A generator then synthesizes the new image content based on the style vector and sketch.
arXiv Detail & Related papers (2021-11-30T02:42:31Z) - Image Inpainting with Edge-guided Learnable Bidirectional Attention Maps [85.67745220834718]
We present an edge-guided learnable bidirectional attention map (Edge-LBAM) for improving image inpainting of irregular holes.
Our Edge-LBAM method contains dual procedures, including structure-aware mask-updating guided by predicted edges.
Extensive experiments show that our Edge-LBAM is effective in generating coherent image structures and preventing color discrepancy and blurriness.
arXiv Detail & Related papers (2021-04-25T07:25:16Z) - Learning Layout and Style Reconfigurable GANs for Controllable Image Synthesis [12.449076001538552]
This paper focuses on a recently emerged task, layout-to-image: learning generative models capable of synthesizing photo-realistic images from a spatial layout.
Style control at the image level is the same as in vanilla GANs, while style control at the object mask level is realized by a novel feature normalization scheme.
In experiments, the proposed method is tested on the COCO-Stuff and Visual Genome datasets, achieving state-of-the-art performance.
arXiv Detail & Related papers (2020-03-25T18:16:05Z)
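As referenced in the MaskGIT entry above, the following is a compact, hypothetical sketch of iterative parallel decoding with a cosine masking schedule, the scheme MaskSketch builds on; `logits_fn`, the vocabulary size, and the mask-token id are assumptions for illustration, not the authors' code.
```python
# A compact, hypothetical sketch of MaskGIT-style iterative parallel
# decoding: all tokens start masked, a bidirectional transformer predicts
# every position at once, and the least confident predictions are
# re-masked on a shrinking (cosine) schedule until none remain.

import math
import torch

VOCAB = 1024     # assumed codebook size
MASK_ID = VOCAB  # assumed id of the special [MASK] token

def maskgit_decode(logits_fn, seq_len=16, steps=8):
    """`logits_fn(tokens)` is an assumed stand-in for the real model,
    returning (seq_len, VOCAB) logits for a token sequence."""
    tokens = torch.full((seq_len,), MASK_ID, dtype=torch.long)
    for t in range(1, steps + 1):
        probs = torch.softmax(logits_fn(tokens), dim=-1)   # (seq_len, VOCAB)
        sampled = torch.multinomial(probs, 1).squeeze(-1)
        conf = probs.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)
        committed = tokens != MASK_ID
        sampled[committed] = tokens[committed]   # keep committed tokens
        conf[committed] = float("inf")           # never re-mask them
        # Cosine schedule: number of positions left masked after this step.
        n_mask = math.floor(seq_len * math.cos(math.pi / 2 * t / steps))
        tokens = sampled
        tokens[conf.argsort()[:n_mask]] = MASK_ID  # re-mask least confident
    return tokens

# Toy demo: a random "model" over the codebook.
if __name__ == "__main__":
    torch.manual_seed(0)
    fake_logits = lambda toks: torch.randn(toks.shape[0], VOCAB)
    print(maskgit_decode(fake_logits))  # fully decoded after the last step
```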