Semantic Image Synthesis with Unconditional Generator
- URL: http://arxiv.org/abs/2402.14395v1
- Date: Thu, 22 Feb 2024 09:10:28 GMT
- Title: Semantic Image Synthesis with Unconditional Generator
- Authors: Jungwoo Chae, Hyunin Cho, Sooyeon Go, Kyungmook Choi, Youngjung Uh
- Abstract summary: We propose to employ a pre-trained unconditional generator and rearrange its feature maps according to proxy masks.
The proxy masks are prepared from the feature maps of random samples in the generator by simple clustering.
Our method is versatile across various applications such as free-form spatial editing of real images, sketch-to-photo, and even scribble-to-photo.
- Score: 8.65146533481257
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Semantic image synthesis (SIS) aims to generate realistic images that match
given semantic masks. Although recent advances allow high-quality results and
precise spatial control, existing methods require a massive semantic
segmentation dataset for training. Instead, we propose to employ a pre-trained
unconditional generator and rearrange its feature maps according to proxy
masks. The proxy masks are prepared from the feature maps of random samples in
the generator by simple clustering. The feature rearranger learns to rearrange
original feature maps to match the shape of the proxy masks that are either
from the original sample itself or from random samples. Then we introduce a
semantic mapper that produces the proxy masks from various input conditions
including semantic masks. Our method is versatile across various applications
such as free-form spatial editing of real images, sketch-to-photo, and even
scribble-to-photo. Experiments validate the advantages of our method on a range
of datasets: human faces, animal faces, and buildings.
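As a rough illustration of the proxy-mask step described in the abstract, the sketch below builds a proxy mask by k-means clustering of per-pixel feature vectors from one intermediate generator layer. `get_feature_maps` is a hypothetical hook into a pretrained unconditional generator (e.g., StyleGAN); the paper's actual layer choice and interface may differ.

```python
# Rough sketch: cluster per-pixel generator features into a proxy segmentation.
import torch
from sklearn.cluster import KMeans

def proxy_mask(features: torch.Tensor, n_clusters: int = 8) -> torch.Tensor:
    """features: (C, H, W) feature map -> (H, W) integer proxy mask."""
    c, h, w = features.shape
    pixels = features.permute(1, 2, 0).reshape(-1, c).cpu().numpy()  # (H*W, C)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(pixels)
    return torch.from_numpy(labels).reshape(h, w)

# z = torch.randn(1, 512)                    # random latent sample
# feats = get_feature_maps(generator, z)[0]  # hypothetical feature hook
# mask = proxy_mask(feats)                   # proxy "segmentation" of the sample
```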
Related papers
- MaskInversion: Localized Embeddings via Optimization of Explainability Maps [49.50785637749757]
MaskInversion generates a context-aware embedding for a query image region specified by a mask at test time.
It can be used for a broad range of tasks, including open-vocabulary class retrieval, referring expression comprehension, localized captioning, and image generation.
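Reading the summary literally, the mechanism is test-time optimization of an embedding against an explainability map. A schematic sketch under that reading; `encode_image` and `relevance_map` are hypothetical stand-ins for a frozen vision backbone and a differentiable explainability computation, not MaskInversion's actual API.

```python
# Schematic: optimize an embedding so its explainability map matches the mask.
import torch
import torch.nn.functional as F

def mask_inversion(image, mask, encode_image, relevance_map, steps=100, lr=0.05):
    emb = encode_image(image).detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([emb], lr=lr)
    for _ in range(steps):
        rel = relevance_map(image, emb).clamp(1e-6, 1 - 1e-6)  # (H, W) in (0, 1)
        loss = F.binary_cross_entropy(rel, mask.float())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return emb.detach()  # localized, context-aware embedding for the region
```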
arXiv Detail & Related papers (2024-07-29T14:21:07Z)
- Pre-training with Random Orthogonal Projection Image Modeling [32.667183132025094]
Masked Image Modeling (MIM) is a powerful self-supervised strategy for visual pre-training without the use of labels.
We propose an image modeling framework based on Random Orthogonal Projection Image Modeling (ROPIM).
ROPIM reduces spatial token information under a guaranteed bound on the noise variance, and can be seen as masking entire spatial image areas with locally varying masking degrees.
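As a toy illustration of masking by random orthogonal projection: projecting token rows through a random rank-k projector removes information in the discarded subspace, and applying it along the spatial axis gives locally varying masking degrees. The construction below is an assumption, not ROPIM's exact formulation.

```python
# Toy sketch: corrupt patch tokens with a random rank-k orthogonal projection
# along the spatial axis, instead of binary masking.
import torch

def random_orthogonal_projection(tokens: torch.Tensor, keep_rank: int) -> torch.Tensor:
    """tokens: (N, D) patch embeddings; keep_rank: k < N."""
    n, _ = tokens.shape
    q, _ = torch.linalg.qr(torch.randn(n, keep_rank))  # (N, k), orthonormal columns
    p = q @ q.T                                        # rank-k projector, (N, N)
    return p @ tokens                                  # attenuates tokens spatially

# tokens = torch.randn(196, 768)  # e.g., 14x14 ViT patches
# corrupted = random_orthogonal_projection(tokens, keep_rank=98)
```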
arXiv Detail & Related papers (2023-10-28T15:42:07Z)
- Automatic Generation of Semantic Parts for Face Image Synthesis [7.728916126705043]
We describe a network architecture to address the problem of automatically manipulating or generating the shape of object classes in semantic segmentation masks.
Our proposed model allows embedding the mask class-wise into a latent space where each class embedding can be independently edited.
We report quantitative and qualitative results on the CelebAMask-HQ dataset, which show that our model can both faithfully reconstruct and modify a segmentation mask at the class level.
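A toy sketch of the class-wise embedding idea: each semantic class channel is encoded to its own latent vector, so a single class can be edited independently before decoding. Sizes and module structure are illustrative assumptions, not the paper's architecture.

```python
# Toy class-wise mask autoencoder: one latent per semantic class.
import torch
import torch.nn as nn

class ClassWiseMaskAE(nn.Module):
    def __init__(self, n_classes=19, z_dim=64, size=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(size * size, z_dim))
        self.dec = nn.Linear(n_classes * z_dim, n_classes * size * size)
        self.n_classes, self.size = n_classes, size

    def forward(self, mask_onehot):  # mask_onehot: (B, K, H, W), one channel per class
        zs = [self.enc(mask_onehot[:, k:k + 1]) for k in range(self.n_classes)]
        z = torch.cat(zs, dim=1)     # (B, K*z_dim); edit one slice to reshape one class
        logits = self.dec(z).view(-1, self.n_classes, self.size, self.size)
        return logits, z
```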
arXiv Detail & Related papers (2023-07-11T15:01:42Z)
- Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation [78.13793505707952]
Existing autoregressive models follow the two-stage generation paradigm that first learns a codebook in the latent space for image reconstruction and then completes the image generation autoregressively based on the learned codebook.
We propose a novel two-stage framework consisting of a Masked Quantization VAE (MQ-VAE) and a Stackformer; MQ-VAE masks redundant region features before quantization to keep the model from modeling redundancy.
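A minimal sketch of the masked-quantization idea: score regions by an importance measure, keep the top fraction, and quantize only those with a standard nearest-neighbour codebook lookup. The norm-based scoring below is a stand-in assumption, not MQ-VAE's learned mask module.

```python
# Sketch: quantize only the "important" region features.
import torch

def masked_vector_quantize(feats, codebook, keep_ratio=0.5):
    """feats: (N, D) region features; codebook: (K, D) code vectors."""
    scores = feats.norm(dim=1)                  # stand-in importance score
    n_keep = max(1, int(keep_ratio * feats.shape[0]))
    keep = torch.zeros(feats.shape[0], dtype=torch.bool)
    keep[scores.topk(n_keep).indices] = True
    dists = torch.cdist(feats[keep], codebook)  # (n_keep, K)
    return dists.argmin(dim=1), keep            # code indices + kept-region mask
```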
arXiv Detail & Related papers (2023-05-23T02:15:53Z)
- DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models [68.21154597227165]
We show that it is possible to automatically obtain accurate semantic masks of synthetic images generated by the off-the-shelf Stable Diffusion model.
Our approach, called DiffuMask, exploits the potential of the cross-attention map between text and image.
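In that spirit, a mask might be distilled from the cross-attention maps roughly as below: aggregate the maps for the class token across layers, upsample, normalize, and threshold. Recording `attn_maps` from the Stable Diffusion U-Net is assumed and implementation-specific.

```python
# Sketch: turn text-image cross-attention into a binary mask.
import torch
import torch.nn.functional as F

def attention_to_mask(attn_maps, token_idx, out_size=512, threshold=0.5):
    """attn_maps: list of (heads, H*W, n_tokens) cross-attention tensors."""
    acc = 0.0
    for a in attn_maps:
        _, hw, _ = a.shape
        side = int(hw ** 0.5)
        m = a[:, :, token_idx].mean(0).reshape(1, 1, side, side)
        acc = acc + F.interpolate(m, size=out_size, mode="bilinear",
                                  align_corners=False)
    acc = acc / len(attn_maps)
    acc = (acc - acc.min()) / (acc.max() - acc.min() + 1e-8)  # -> [0, 1]
    return (acc[0, 0] > threshold).float()                    # (out_size, out_size)
```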
arXiv Detail & Related papers (2023-03-21T08:43:15Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
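The Gumbel-Softmax link can be sketched as straight-through sampling of a hard patch mask that still passes gradients back to the mask generator; the adversarial generator network itself is elided here.

```python
# Sketch: differentiable hard masking via straight-through Gumbel-Softmax.
import torch
import torch.nn.functional as F

def sample_patch_mask(mask_logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """mask_logits: (N, 2) per-patch logits for (visible, masked).
    Returns a hard 0/1 mask that still carries gradients."""
    y = F.gumbel_softmax(mask_logits, tau=tau, hard=True)  # (N, 2), one-hot rows
    return y[:, 1]                                         # 1.0 where patch is masked

# logits = mask_generator(patch_embeddings)  # hypothetical adversarial generator
# mask = sample_patch_mask(logits, tau=0.5)
```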
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
- MaskSketch: Unpaired Structure-guided Masked Image Generation [56.88038469743742]
MaskSketch is an image generation method that allows spatial conditioning of the result by using a guiding sketch as an extra signal during sampling.
We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image.
Our results show that MaskSketch achieves high image realism and fidelity to the guiding structure.
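Under that observation, structure guidance can be sketched as scoring candidate samples by how closely their self-attention maps match the sketch's; extracting `attn(x)` from the masked generative transformer is assumed, not shown.

```python
# Sketch: rank candidate samples by attention-map similarity to the sketch.
import torch

def structure_distance(attn_a: torch.Tensor, attn_b: torch.Tensor) -> torch.Tensor:
    """attn_*: (layers, N, N) self-attention maps; lower = closer structure."""
    return (attn_a - attn_b).abs().mean()

def pick_most_structured(candidates, sketch_attn, attn):
    dists = [structure_distance(attn(x), sketch_attn) for x in candidates]
    return candidates[int(torch.stack(dists).argmin())]
```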
arXiv Detail & Related papers (2023-02-10T20:27:02Z)
- Semantic-guided Multi-Mask Image Harmonization [10.27974860479791]
We propose a new semantic-guided multi-mask image harmonization task.
To address it, we introduce a novel way to edit inharmonious images by predicting a series of operator masks.
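A toy illustration of operator masks: each predicted soft mask gates one editing operator, and the edits compose into the harmonized result. The operators below are simple stand-ins for whatever the model actually learns.

```python
# Toy: compose edits, each gated by its own soft operator mask.
import torch

def apply_operator_masks(image, masks, operators):
    """image: (3, H, W) in [0, 1]; masks: list of (1, H, W) in [0, 1];
    operators: one callable per mask."""
    out = image
    for m, op in zip(masks, operators):
        out = m * op(out) + (1 - m) * out  # edit only inside the soft mask
    return out.clamp(0, 1)

# Stand-in operators: brightness and contrast adjustments.
# ops = [lambda x: x * 1.2, lambda x: (x - 0.5) * 1.5 + 0.5]
```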
arXiv Detail & Related papers (2022-07-24T11:48:49Z)
- SketchEdit: Mask-Free Local Image Manipulation with Partial Sketches [95.45728042499836]
We propose a new paradigm of sketch-based image manipulation: mask-free local image manipulation.
Our model automatically predicts the target modification region and encodes it into a structure style vector.
A generator then synthesizes the new image content based on the style vector and sketch.
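At a high level, that pipeline reads as below; all three modules are hypothetical placeholders for the paper's networks.

```python
# High-level sketch of mask-free, sketch-based local editing.
def sketch_edit(image, sketch, region_predictor, style_encoder, generator):
    region = region_predictor(image, sketch)  # predicted soft modification region
    style = style_encoder(image * region)     # structure style vector
    return generator(image * (1 - region), sketch, style)
```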
arXiv Detail & Related papers (2021-11-30T02:42:31Z)
- Few-shot Semantic Image Synthesis Using StyleGAN Prior [8.528384027684192]
We present a training strategy that performs pseudo labeling of semantic masks using the StyleGAN prior.
Our key idea is to construct a simple mapping between the StyleGAN feature and each semantic class from a few examples of semantic masks.
Although the pseudo semantic masks might be too coarse for previous approaches that require pixel-aligned masks, our framework can synthesize high-quality images from not only dense semantic masks but also sparse inputs such as landmarks and scribbles.
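A minimal sketch of that mapping: average per-pixel StyleGAN feature vectors into class centroids from the few annotated examples, then pseudo-label new samples by nearest centroid. The feature-extraction hook and the Euclidean distance are assumptions.

```python
# Sketch: few-shot pseudo-labeling via nearest-centroid StyleGAN features.
import torch

def class_centroids(feat_list, mask_list, n_classes):
    """feat_list: (C, H, W) features; mask_list: (H, W) int labels (few examples)."""
    c = feat_list[0].shape[0]
    sums, counts = torch.zeros(n_classes, c), torch.zeros(n_classes)
    for f, m in zip(feat_list, mask_list):
        flat_f, flat_m = f.reshape(c, -1).T, m.reshape(-1)  # (H*W, C), (H*W,)
        for k in range(n_classes):
            sel = flat_f[flat_m == k]
            sums[k] += sel.sum(0)
            counts[k] += sel.shape[0]
    return sums / counts.clamp(min=1).unsqueeze(1)          # (n_classes, C)

def pseudo_label(features, centroids):
    c, h, w = features.shape
    dists = torch.cdist(features.reshape(c, -1).T, centroids)  # (H*W, n_classes)
    return dists.argmin(dim=1).reshape(h, w)                   # pseudo semantic mask
```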
arXiv Detail & Related papers (2021-03-27T11:04:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.