Few-shot Semantic Image Synthesis Using StyleGAN Prior
- URL: http://arxiv.org/abs/2103.14877v1
- Date: Sat, 27 Mar 2021 11:04:22 GMT
- Title: Few-shot Semantic Image Synthesis Using StyleGAN Prior
- Authors: Yuki Endo and Yoshihiro Kanamori
- Abstract summary: We present a training strategy that performs pseudo labeling of semantic masks using the StyleGAN prior.
Our key idea is to construct a simple mapping between the StyleGAN feature and each semantic class from a few examples of semantic masks.
Although the pseudo semantic masks might be too coarse for previous approaches that require pixel-aligned masks, our framework can synthesize high-quality images from not only dense semantic masks but also sparse inputs such as landmarks and scribbles.
- Score: 8.528384027684192
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper tackles the challenging problem of generating
photorealistic images from semantic layouts in few-shot scenarios, where
annotated training pairs are hardly available because pixel-wise annotation is
quite costly. We present a
training strategy that performs pseudo labeling of semantic masks using the
StyleGAN prior. Our key idea is to construct a simple mapping between the
StyleGAN feature and each semantic class from a few examples of semantic masks.
With such mappings, we can generate an unlimited number of pseudo semantic
masks from random noise to train an encoder for controlling a pre-trained
StyleGAN generator. Although the pseudo semantic masks might be too coarse for
previous approaches that require pixel-aligned masks, our framework can
synthesize high-quality images from not only dense semantic masks but also
sparse inputs such as landmarks and scribbles. Qualitative and quantitative
results with various datasets demonstrate improvement over previous approaches
with respect to layout fidelity and visual quality in as few as one- or
five-shot settings.
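The pseudo-labeling idea can be pictured as nearest-centroid classification over per-pixel StyleGAN feature vectors: average the features under each class in the few annotated masks, then label every pixel of a randomly generated sample by its closest class centroid. The sketch below is an illustrative reconstruction, not the authors' code; the feature-map layout, the function names, and the plain Euclidean nearest-centroid rule are all assumptions.

```python
import numpy as np

def fit_class_centroids(features, mask, num_classes):
    """Average the feature vectors under each label of the few
    annotated masks, giving one centroid per semantic class."""
    # features: (H, W, C) per-pixel feature map; mask: (H, W) int labels
    centroids = np.zeros((num_classes, features.shape[-1]))
    for k in range(num_classes):
        centroids[k] = features[mask == k].mean(axis=0)
    return centroids

def pseudo_label(features, centroids):
    """Assign every pixel to its nearest class centroid, yielding a
    pseudo semantic mask for a randomly generated sample."""
    flat = features.reshape(-1, features.shape[-1])            # (H*W, C)
    d = np.linalg.norm(flat[:, None] - centroids[None], axis=-1)
    return d.argmin(axis=1).reshape(features.shape[:2])        # (H, W)
```

With such a mapping, arbitrarily many (noise, pseudo-mask) pairs can be produced to train the encoder, even though the resulting masks are coarser than hand-drawn annotations.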
Related papers
- Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting [8.572133295533643]
We present a method for large-mask pluralistic image inpainting based on the generative framework of discrete latent codes.
Our method learns latent priors, discretized as tokens, by only performing computations at the visible locations of the image.
arXiv Detail & Related papers (2024-03-27T01:28:36Z)
- Semantic Image Synthesis with Unconditional Generator [8.65146533481257]
We propose to employ a pre-trained unconditional generator and rearrange its feature maps according to proxy masks.
The proxy masks are prepared from the feature maps of random samples in the generator by simple clustering.
Our method is versatile across various applications such as free-form spatial editing of real images, sketch-to-photo, and even scribble-to-photo.
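The "simple clustering" that produces proxy masks from generator feature maps can be sketched as plain k-means over per-pixel feature vectors, with the cluster-index map serving as the mask. This is a hedged illustration only: the paper does not specify the clustering algorithm, and the function name and parameters here are hypothetical.

```python
import numpy as np

def proxy_masks(feature_map, k, iters=10, seed=0):
    """Cluster the per-pixel feature vectors of a generated sample
    with plain k-means; the cluster-index map is the proxy mask."""
    H, W, C = feature_map.shape
    X = feature_map.reshape(-1, C)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)                 # nearest-center label
        for j in range(k):
            pts = X[labels == j]
            if len(pts):                          # keep empty clusters fixed
                centers[j] = pts.mean(axis=0)
    return labels.reshape(H, W)
```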
arXiv Detail & Related papers (2024-02-22T09:10:28Z)
- Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision [87.15580604023555]
Unpair-Seg is a novel weakly-supervised open-vocabulary segmentation framework.
It learns from unpaired image-mask and image-text pairs, which can be independently and efficiently collected.
It achieves 14.6% and 19.5% mIoU on the ADE-847 and PASCAL Context-459 datasets, respectively.
arXiv Detail & Related papers (2024-02-14T06:01:44Z)
- Automatic Generation of Semantic Parts for Face Image Synthesis [7.728916126705043]
We describe a network architecture to address the problem of automatically manipulating or generating the shape of object classes in semantic segmentation masks.
Our proposed model allows embedding the mask class-wise into a latent space where each class embedding can be independently edited.
We report quantitative and qualitative results on the CelebAMask-HQ dataset, which show our model can both faithfully reconstruct and modify a segmentation mask at the class level.
arXiv Detail & Related papers (2023-07-11T15:01:42Z)
- Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
arXiv Detail & Related papers (2023-05-25T17:59:04Z)
- DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models [68.21154597227165]
We show that it is possible to automatically obtain accurate semantic masks of synthetic images generated by an off-the-shelf Stable Diffusion model.
Our approach, called DiffuMask, exploits the potential of the cross-attention map between text and image.
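Turning a cross-attention map into a mask can be sketched as averaging the per-head attention for a class token and thresholding the normalized result. This is a highly simplified illustration of the idea, not the DiffuMask pipeline (which involves further prompt design and mask refinement); the function name and threshold are assumptions.

```python
import numpy as np

def attention_to_mask(attn_maps, threshold=0.5):
    """Average multi-head cross-attention maps for one text token,
    min-max normalize to [0, 1], and threshold to a binary mask."""
    # attn_maps: (num_heads, H, W) attention for a single token
    avg = attn_maps.mean(axis=0)                                  # (H, W)
    avg = (avg - avg.min()) / (avg.max() - avg.min() + 1e-8)
    return (avg >= threshold).astype(np.uint8)
```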
arXiv Detail & Related papers (2023-03-21T08:43:15Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
- StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training [64.37272287179661]
StrucTexTv2 is an effective document image pre-training framework.
It consists of two self-supervised pre-training tasks: masked image modeling and masked language modeling.
It achieves competitive or even new state-of-the-art performance in various downstream tasks such as image classification, layout analysis, table structure recognition, document OCR, and information extraction.
arXiv Detail & Related papers (2023-03-01T07:32:51Z)
- Semantic-guided Multi-Mask Image Harmonization [10.27974860479791]
We propose a new semantic-guided multi-mask image harmonization task.
In this work, inharmonious images are edited by predicting a series of operator masks.
arXiv Detail & Related papers (2022-07-24T11:48:49Z)
- Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling [61.03262873980619]
Open-vocabulary instance segmentation aims at segmenting novel classes without mask annotations.
We propose a cross-modal pseudo-labeling framework, which generates training pseudo masks by aligning word semantics in captions with visual features of object masks in images.
Our framework is capable of labeling novel classes in captions via their word semantics to self-train a student model.
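The word-to-mask alignment behind this pseudo-labeling can be sketched as a cosine-similarity match between object-mask embeddings and caption-word embeddings, with each mask taking the best-matching word as its pseudo label. This is an illustrative sketch under assumed names and shapes, not the paper's implementation.

```python
import numpy as np

def assign_labels(mask_embs, word_embs):
    """Give each object-mask embedding the index of the caption word
    whose embedding has the highest cosine similarity."""
    # mask_embs: (M, D) mask features; word_embs: (W, D) word features
    m = mask_embs / np.linalg.norm(mask_embs, axis=1, keepdims=True)
    w = word_embs / np.linalg.norm(word_embs, axis=1, keepdims=True)
    return (m @ w.T).argmax(axis=1)        # (M,) pseudo-label indices
```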
arXiv Detail & Related papers (2021-11-24T18:50:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.