Automatic Generation of Semantic Parts for Face Image Synthesis
- URL: http://arxiv.org/abs/2307.05317v1
- Date: Tue, 11 Jul 2023 15:01:42 GMT
- Title: Automatic Generation of Semantic Parts for Face Image Synthesis
- Authors: Tomaso Fontanini, Claudio Ferrari, Massimo Bertozzi, Andrea Prati
- Abstract summary: We describe a network architecture to address the problem of automatically manipulating or generating the shape of object classes in semantic segmentation masks.
Our proposed model allows embedding the mask class-wise into a latent space where each class embedding can be independently edited.
- We report quantitative and qualitative results on the CelebAMask-HQ dataset, which show our model can both faithfully reconstruct and modify a segmentation mask at the class level.
- Score: 7.728916126705043
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Semantic image synthesis (SIS) refers to the problem of generating realistic
imagery given a semantic segmentation mask that defines the spatial layout of
object classes. Most approaches in the literature, beyond the quality of the
generated images, put effort into finding solutions to increase generation
diversity in terms of style, i.e., texture. However, they all neglect a
different feature: the possibility of manipulating the layout provided by the
mask. Currently, the only way to do so is manually, by means of graphical user
interfaces. In this paper, we describe a network
architecture to address the problem of automatically manipulating or generating
the shape of object classes in semantic segmentation masks, with specific focus
on human faces. Our proposed model allows embedding the mask class-wise into a
latent space where each class embedding can be independently edited. Then, a
bi-directional LSTM block and a convolutional decoder output a new, locally
manipulated mask. We report quantitative and qualitative results on the
CelebAMask-HQ dataset, which show our model can both faithfully reconstruct and
modify a segmentation mask at the class level. Also, we show that our model can
be placed before a SIS generator, opening the way to fully automatic control of
both shape and texture in the generation. Code is available at
https://github.com/TFonta/Semantic-VAE.
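A minimal sketch of the pipeline described above (class-wise mask encoder, bi-directional LSTM over the per-class embeddings, convolutional decoder); all module choices, names, and sizes here are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class ClasswiseMaskAutoencoder(nn.Module):
    """Hypothetical sketch: embed each semantic class of a segmentation mask
    into its own latent vector, mix the per-class embeddings with a BiLSTM,
    and decode them back into a mask. All sizes are illustrative."""

    def __init__(self, num_classes=19, latent_dim=128, mask_size=64):
        super().__init__()
        self.mask_size = mask_size
        # One shared encoder applied to each one-hot class channel independently.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * (mask_size // 4) ** 2, latent_dim),
        )
        # BiLSTM treats the per-class embeddings as a sequence so that
        # editing one class can influence the others coherently.
        self.lstm = nn.LSTM(latent_dim, latent_dim, batch_first=True, bidirectional=True)
        # Convolutional decoder maps the mixed embeddings back to class logits.
        self.decoder = nn.Sequential(
            nn.Linear(2 * latent_dim, 64 * (mask_size // 4) ** 2),
            nn.Unflatten(1, (64, mask_size // 4, mask_size // 4)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def encode(self, mask_onehot):
        # mask_onehot: (B, C, H, W) one-hot segmentation mask.
        b, c, h, w = mask_onehot.shape
        per_class = mask_onehot.reshape(b * c, 1, h, w)
        return self.encoder(per_class).reshape(b, c, -1)     # (B, C, latent_dim)

    def decode(self, z):
        b, c, _ = z.shape
        mixed, _ = self.lstm(z)                               # (B, C, 2*latent_dim)
        logits = self.decoder(mixed.reshape(b * c, -1))       # (B*C, 1, H, W)
        return logits.reshape(b, c, self.mask_size, self.mask_size)

    def forward(self, mask_onehot):
        return self.decode(self.encode(mask_onehot))


# Editing a single class: perturb only its embedding, then decode.
model = ClasswiseMaskAutoencoder()
mask = torch.zeros(1, 19, 64, 64)
mask[:, 0] = 1.0                                    # dummy one-hot mask
with torch.no_grad():
    z = model.encode(mask)                          # (1, 19, 128) per-class embeddings
    z[:, 5] += 0.1 * torch.randn_like(z[:, 5])      # edit one class embedding only
    edited = model.decode(z).argmax(dim=1)          # (1, 64, 64) edited label map
```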
Related papers
- MaskInversion: Localized Embeddings via Optimization of Explainability Maps [49.50785637749757]
MaskInversion generates a context-aware embedding for a query image region specified by a mask at test time.
It can be used for a broad range of tasks, including open-vocabulary class retrieval, referring expression comprehension, as well as for localized captioning and image generation.
arXiv Detail & Related papers (2024-07-29T14:21:07Z)
- Semantic Image Synthesis with Unconditional Generator [8.65146533481257]
We propose to employ a pre-trained unconditional generator and rearrange its feature maps according to proxy masks.
The proxy masks are prepared from the feature maps of random samples in the generator by simple clustering.
Our method is versatile across various applications such as free-form spatial editing of real images, sketch-to-photo, and even scribble-to-photo.
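A rough, hypothetical illustration of the proxy-mask step: cluster the spatial feature vectors of a generator activation so that the cluster index at each position acts as a proxy semantic label (the feature shape and the choice of k-means are assumptions):

```python
import torch
from sklearn.cluster import KMeans

def proxy_mask_from_features(feat, n_clusters=8):
    """Cluster the spatial positions of a (C, H, W) feature map into
    n_clusters groups; the cluster index per position is the proxy mask."""
    c, h, w = feat.shape
    pixels = feat.reshape(c, h * w).T.cpu().numpy()        # (H*W, C) feature vectors
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(pixels)
    return torch.from_numpy(labels).reshape(h, w)          # (H, W) proxy mask

# Dummy feature map standing in for a real generator activation.
proxy = proxy_mask_from_features(torch.randn(512, 16, 16))
```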
arXiv Detail & Related papers (2024-02-22T09:10:28Z)
- DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models [68.21154597227165]
We show that it is possible to automatically obtain accurate semantic masks of synthetic images generated by an off-the-shelf Stable Diffusion model.
Our approach, called DiffuMask, exploits the potential of the cross-attention map between text and image.
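A simplified, hypothetical sketch of turning a text-to-image cross-attention map into a binary mask for one prompt token; the attention tensor layout and the threshold are assumptions, not DiffuMask's actual procedure:

```python
import torch

def mask_from_cross_attention(attn, token_index, threshold=0.5):
    """Average a cross-attention map over heads, normalize it, and threshold
    it to get a binary mask for one prompt token (layout is assumed)."""
    # attn: (heads, H*W, num_tokens) cross-attention weights.
    heat = attn.mean(dim=0)[:, token_index]                 # (H*W,)
    heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)
    side = int(heat.numel() ** 0.5)
    return (heat.reshape(side, side) > threshold).float()

# Dummy attention map: 8 heads, 64x64 latent positions, 77 text tokens.
mask = mask_from_cross_attention(torch.rand(8, 64 * 64, 77), token_index=5)
```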
arXiv Detail & Related papers (2023-03-21T08:43:15Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
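A minimal sketch of the Gumbel-Softmax idea mentioned above: sample a differentiable keep/mask decision per patch from a mask generator's logits, so the generator can be trained end-to-end (shapes and the straight-through setting are assumptions):

```python
import torch
import torch.nn.functional as F

def sample_patch_mask(mask_logits, tau=1.0):
    """Turn per-patch logits from a learned mask generator into a (nearly)
    discrete keep/mask decision while keeping gradients for the generator."""
    # mask_logits: (B, num_patches, 2) -> scores for [keep, mask].
    soft = F.gumbel_softmax(mask_logits, tau=tau, hard=True)   # straight-through
    return soft[..., 1]                                        # 1.0 where the patch is masked

logits = torch.randn(4, 196, 2)            # e.g. 14x14 patches of a ViT
patch_mask = sample_patch_mask(logits)     # (4, 196) binary mask, gradients flow to logits
```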
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
- Towards Improved Input Masking for Convolutional Neural Networks [66.99060157800403]
We propose a new masking method for CNNs, which we call layer masking.
We show that our method is able to eliminate or minimize the influence of the mask shape or color on the output of the model.
We also demonstrate how the shape of the mask may leak information about the class, thus affecting estimates of model reliance on class-relevant features.
arXiv Detail & Related papers (2022-11-26T19:31:49Z)
- Stare at What You See: Masked Image Modeling without Reconstruction [154.74533119863864]
Masked Autoencoders (MAE) have become a prevailing paradigm for large-scale vision representation pre-training.
Recent approaches apply semantic-rich teacher models to extract image features as the reconstruction target, leading to better performance.
We argue the features extracted by powerful teacher models already encode rich semantic correlation across regions in an intact image.
arXiv Detail & Related papers (2022-11-16T12:48:52Z)
- Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling [61.03262873980619]
Open-vocabulary instance segmentation aims at segmenting novel classes without mask annotations.
We propose a cross-modal pseudo-labeling framework, which generates training pseudo masks by aligning word semantics in captions with visual features of object masks in images.
Our framework is capable of labeling novel classes in captions via their word semantics to self-train a student model.
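A schematic, hypothetical version of this pseudo-labeling step: match each mask's visual feature to the most similar caption-word embedding and keep only confident matches (the embedding models and the threshold are assumptions):

```python
import torch
import torch.nn.functional as F

def pseudo_label_masks(mask_feats, word_feats, words, threshold=0.3):
    """Assign each object mask the caption word whose embedding is most
    similar to the mask's visual feature; drop low-confidence matches."""
    # mask_feats: (M, D) mask features, word_feats: (W, D) word embeddings.
    sim = F.cosine_similarity(mask_feats.unsqueeze(1), word_feats.unsqueeze(0), dim=-1)
    scores, idx = sim.max(dim=1)                 # best word per mask
    return [(words[i], s.item()) if s > threshold else (None, s.item())
            for i, s in zip(idx.tolist(), scores)]

# Dummy 512-d features for 3 masks and 4 caption words.
labels = pseudo_label_masks(torch.randn(3, 512), torch.randn(4, 512),
                            ["dog", "frisbee", "grass", "sky"])
```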
arXiv Detail & Related papers (2021-11-24T18:50:47Z)
- Few-shot Semantic Image Synthesis Using StyleGAN Prior [8.528384027684192]
We present a training strategy that performs pseudo labeling of semantic masks using the StyleGAN prior.
Our key idea is to construct a simple mapping between the StyleGAN feature and each semantic class from a few examples of semantic masks.
Although the pseudo semantic masks might be too coarse for previous approaches that require pixel-aligned masks, our framework can synthesize high-quality images from not only dense semantic masks but also sparse inputs such as landmarks and scribbles.
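A toy sketch of the few-shot mapping idea: average per-pixel features per class from a few annotated masks, then label new pixels by their nearest class centroid (the feature source, shapes, and nearest-centroid rule are assumptions):

```python
import torch

def fit_class_centroids(feats, masks, num_classes):
    """Average per-pixel features belonging to each class over a few
    annotated examples (assumes every class appears at least once)."""
    # feats: (N, C, H, W) features, masks: (N, H, W) integer class labels.
    c = feats.shape[1]
    flat_feats = feats.permute(0, 2, 3, 1).reshape(-1, c)
    flat_labels = masks.reshape(-1)
    return torch.stack([flat_feats[flat_labels == k].mean(dim=0)
                        for k in range(num_classes)])

def pseudo_label(feats, centroids):
    """Label every pixel with the class whose centroid is nearest."""
    c = feats.shape[1]
    flat = feats.permute(0, 2, 3, 1).reshape(-1, c)
    dists = torch.cdist(flat, centroids)                    # (pixels, num_classes)
    return dists.argmin(dim=1).reshape(feats.shape[0], *feats.shape[2:])

centroids = fit_class_centroids(torch.randn(2, 64, 32, 32),
                                torch.randint(0, 5, (2, 32, 32)), num_classes=5)
labels = pseudo_label(torch.randn(1, 64, 32, 32), centroids)   # (1, 32, 32)
```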
arXiv Detail & Related papers (2021-03-27T11:04:22Z)
- Learning Layout and Style Reconfigurable GANs for Controllable Image Synthesis [12.449076001538552]
This paper focuses on a recently emerged task, layout-to-image, learning generative models capable of synthesizing photo-realistic images from a spatial layout.
Style control at the image level is the same as in vanilla GANs, while style control at the object mask level is realized by a novel feature normalization scheme.
In experiments, the proposed method is tested on the COCO-Stuff dataset and the Visual Genome dataset, obtaining state-of-the-art performance.
arXiv Detail & Related papers (2020-03-25T18:16:05Z)