StackGAN: Facial Image Generation Optimizations
- URL: http://arxiv.org/abs/2108.13290v1
- Date: Mon, 30 Aug 2021 15:04:47 GMT
- Title: StackGAN: Facial Image Generation Optimizations
- Authors: Badr Belhiti, Justin Milushev, Avinash Gupta, John Breedis, Johnson
Dinh, Jesse Pisel, and Michael Pyrcz
- Abstract summary: Current state-of-the-art photorealistic generators are computationally expensive, involve unstable training processes, and have real and synthetic distributions that are dissimilar in higher-dimensional spaces.
We propose a variant of the StackGAN architecture, which incorporates conditional generators to construct an image in many stages.
Our model is trained with the CelebA facial image dataset and achieved a Fréchet Inception Distance (FID) score of 73 for edge images and a score of 59 for grayscale images generated using the synthetic edge images.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current state-of-the-art photorealistic generators are computationally
expensive, involve unstable training processes, and have real and synthetic
distributions that are dissimilar in higher-dimensional spaces. To solve these
issues, we propose a variant of the StackGAN architecture. The new architecture
incorporates conditional generators to construct an image in many stages. In
our model, we generate grayscale facial images in two different stages: noise
to edges (stage one) and edges to grayscale (stage two). Our model is trained
with the CelebA facial image dataset and achieved a Fréchet Inception
Distance (FID) score of 73 for edge images and a score of 59 for grayscale
images generated using the synthetic edge images. Although our model achieved
subpar results in relation to state-of-the-art models, dropout layers could
reduce the overfitting in our conditional mapping. Additionally, since most
images can be broken down into important features, improvements to our model
can generalize to other datasets. Therefore, our model can potentially serve as
a superior alternative to traditional means of generating photorealistic
images.
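The two-stage pipeline described above (noise to edges, then edges to grayscale) can be summarized in code. The sketch below is illustrative only: the module names, layer widths, 64x64 resolution, and dropout placement are assumptions made for exposition, not the authors' released implementation.

    import torch
    import torch.nn as nn

    class EdgeGenerator(nn.Module):
        """Stage one (hypothetical layout): maps a noise vector to a 64x64 edge map."""
        def __init__(self, z_dim=100):
            super().__init__()
            self.net = nn.Sequential(
                nn.ConvTranspose2d(z_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),  # 4x4
                nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),    # 8x8
                nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),      # 16x16
                nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(True),       # 32x32
                nn.ConvTranspose2d(32, 1, 4, 2, 1), nn.Tanh(),                                 # 64x64
            )

        def forward(self, z):
            return self.net(z.view(z.size(0), -1, 1, 1))

    class GrayscaleGenerator(nn.Module):
        """Stage two (hypothetical layout): conditioned on an edge map, produces a grayscale face."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, 64, 3, 1, 1), nn.ReLU(True),
                nn.Conv2d(64, 64, 3, 1, 1), nn.ReLU(True),
                nn.Dropout2d(0.3),  # dropout; the abstract suggests this could reduce overfitting in the conditional mapping
                nn.Conv2d(64, 1, 3, 1, 1), nn.Tanh(),
            )

        def forward(self, edges):
            return self.net(edges)

    # Sampling: noise -> edges (stage one) -> grayscale (stage two).
    stage1, stage2 = EdgeGenerator(), GrayscaleGenerator()
    z = torch.randn(8, 100)
    edges = stage1(z)       # synthetic edge images
    faces = stage2(edges)   # grayscale faces generated from the synthetic edges

In a StackGAN-style setup, each stage would be trained adversarially against its own discriminator, with stage two conditioned on edge maps (real during training, synthetic at sampling time).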
Related papers
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that generates highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
- WavePaint: Resource-efficient Token-mixer for Self-supervised Inpainting [2.3014300466616078]
This paper diverges from vision transformers by using a computationally-efficient WaveMix-based fully convolutional architecture -- WavePaint.
It uses a 2D-discrete wavelet transform (DWT) for spatial and multi-resolution token-mixing along with convolutional layers.
Our model even outperforms current GAN-based architectures on the CelebA-HQ dataset without using an adversarially trainable discriminator.
arXiv Detail & Related papers (2023-07-01T18:41:34Z)
- Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion [54.151979979158085]
We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available.
We leverage an unconditional 3D-aware generator, to which we apply a hybrid inversion scheme where a model produces a first guess of the solution.
Our framework can de-render an image in as few as 10 steps, enabling its use in practical scenarios.
arXiv Detail & Related papers (2022-11-21T17:42:42Z)
- A Shared Representation for Photorealistic Driving Simulators [83.5985178314263]
We propose to improve the quality of generated images by rethinking the discriminator architecture.
The focus is on the class of problems where images are generated given semantic inputs, such as scene segmentation maps or human body poses.
We aim to learn a shared latent representation that encodes enough information to jointly perform semantic segmentation and content reconstruction, along with coarse-to-fine-grained adversarial reasoning.
arXiv Detail & Related papers (2021-12-09T18:59:21Z)
- InvGAN: Invertible GANs [88.58338626299837]
InvGAN, short for Invertible GAN, successfully embeds real images to the latent space of a high quality generative model.
This allows us to perform image inpainting, merging, and online data augmentation.
arXiv Detail & Related papers (2021-12-08T21:39:00Z)
- Face sketch to photo translation using generative adversarial networks [1.0312968200748118]
We use a pre-trained face photo generating model to synthesize high-quality natural face photos.
We train a network to map the facial features extracted from the input sketch to a vector in the latent space of the face generating model.
The proposed model achieved an SSIM of 0.655 and a rank-1 face recognition rate of 97.59%.
arXiv Detail & Related papers (2021-10-23T20:01:20Z)
- Aggregated Contextual Transformations for High-Resolution Image Inpainting [57.241749273816374]
We propose an enhanced GAN-based model, named Aggregated COntextual-Transformation GAN (AOT-GAN) for high-resolution image inpainting.
To enhance context reasoning, we construct the generator of AOT-GAN by stacking multiple layers of a proposed AOT block.
For improving texture synthesis, we enhance the discriminator of AOT-GAN by training it with a tailored mask-prediction task.
arXiv Detail & Related papers (2021-04-03T15:50:17Z)
- OSTeC: One-Shot Texture Completion [86.23018402732748]
We propose an unsupervised approach for one-shot 3D facial texture completion.
The proposed approach rotates an input image in 3D and fills in the unseen regions by reconstructing the rotated image in a 2D face generator.
We frontalize the target image by projecting the completed texture into the generator.
arXiv Detail & Related papers (2020-12-30T23:53:26Z)
- Locally Masked Convolution for Autoregressive Models [107.4635841204146]
LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image.
We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
arXiv Detail & Related papers (2020-06-22T17:59:07Z)
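The last entry describes locally masked convolution, in which an arbitrary mask is applied to the convolution's receptive field at every image location. A minimal sketch of that idea follows, implemented with an unfold-based convolution; the function name and tensor layouts are assumptions for illustration, not the authors' code.

    import torch
    import torch.nn.functional as F

    def locally_masked_conv2d(x, weight, mask):
        """
        x:      (B, C_in, H, W) input image
        weight: (C_out, C_in, k, k) shared convolution weights
        mask:   (B, 1, k*k, H*W) binary mask choosing, per output location,
                which positions in the receptive field may be used
        Returns (B, C_out, H, W).
        """
        B, C_in, H, W = x.shape
        C_out, _, k, _ = weight.shape
        # Extract a k*k patch around every location: (B, C_in*k*k, H*W).
        patches = F.unfold(x, kernel_size=k, padding=k // 2)
        patches = patches.view(B, C_in, k * k, H * W)
        # Zero out masked positions independently at each spatial location.
        patches = patches * mask                  # broadcasts over C_in
        patches = patches.view(B, C_in * k * k, H * W)
        # Apply the shared weights as a batched matrix multiply.
        w = weight.view(C_out, C_in * k * k)
        out = torch.einsum('oc,bcl->bol', w, patches)
        return out.view(B, C_out, H, W)

Because the mask varies per location while the weights are shared, different generation orders can be realized with the same parameters, which is what allows the parameter-sharing ensemble of distribution estimators mentioned above.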
This list is automatically generated from the titles and abstracts of the papers on this site.