InvGAN: Invertible GANs
- URL: http://arxiv.org/abs/2112.04598v2
- Date: Fri, 10 Dec 2021 14:13:38 GMT
- Title: InvGAN: Invertible GANs
- Authors: Partha Ghosh, Dominik Zietlow, Michael J. Black, Larry S. Davis,
Xiaochen Hu
- Abstract summary: InvGAN, short for Invertible GAN, successfully embeds real images to the latent space of a high quality generative model.
This allows us to perform image inpainting, merging, and online data augmentation.
- Score: 88.58338626299837
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generation of photo-realistic images, semantic editing and representation
learning are a few of many potential applications of high resolution generative
models. Recent progress in GANs has established them as an excellent choice
for such tasks. However, since they do not provide an inference model, image
editing or downstream tasks such as classification cannot be done on real
images using the GAN latent space. Despite numerous efforts to train an
inference model or design an iterative method to invert a pre-trained
generator, previous methods are dataset (e.g. human face images) and
architecture (e.g. StyleGAN) specific. These methods are nontrivial to extend
to novel datasets or architectures. We propose a general framework that is
agnostic to architecture and datasets. Our key insight is that, by training the
inference and the generative model together, we allow them to adapt to each
other and to converge to a better quality model. Our InvGAN, short for
Invertible GAN, successfully embeds real images to the latent space of a high
quality generative model. This allows us to perform image inpainting, merging,
interpolation and online data augmentation. We demonstrate this with extensive
qualitative and quantitative experiments.
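The key insight above, jointly training the inference (encoder) network with the generator so the two adapt to each other, can be illustrated with a minimal sketch. The PyTorch code below is a toy illustration under assumed architectures, loss terms, and data; it is not the authors' InvGAN implementation.

```python
# Minimal, self-contained PyTorch sketch of jointly training an encoder (inference
# model) with a GAN generator and discriminator. Architectures, image size, and loss
# weights are illustrative placeholders, not the authors' InvGAN implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, img_dim = 64, 32 * 32 * 3   # toy sizes for the sketch

G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, img_dim), nn.Tanh())
E = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(list(G.parameters()) + list(E.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def d_step(real):
    """Discriminator sees real images, samples from G, and reconstructions G(E(real))."""
    z = torch.randn(real.size(0), latent_dim)
    fake = G(z).detach()
    recon = G(E(real)).detach()
    loss = (F.softplus(-D(real)).mean()
            + F.softplus(D(fake)).mean()
            + F.softplus(D(recon)).mean())
    opt_d.zero_grad(); loss.backward(); opt_d.step()

def g_step(real):
    """Generator and encoder are updated together: adversarial, pixel, and latent losses."""
    z = torch.randn(real.size(0), latent_dim)
    recon = G(E(real))
    z_rec = E(G(z))
    loss = (F.softplus(-D(G(z))).mean()        # fool D with random samples
            + F.softplus(-D(recon)).mean()     # fool D with reconstructions of real images
            + F.l1_loss(recon, real)           # pixel-space reconstruction
            + F.mse_loss(z_rec, z))            # latent-space consistency
    opt_g.zero_grad(); loss.backward(); opt_g.step()

# Toy training loop on random tensors, standing in for a real image loader.
for step in range(100):
    real = torch.rand(16, img_dim) * 2 - 1     # stand-in "real" batch in [-1, 1]
    d_step(real)
    g_step(real)
```

Once such an encoder is trained alongside the generator, editing a real image reduces to encoding it, manipulating the latent code (for example, interpolating between the codes of two encoded images), and decoding the result with the generator.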
Related papers
- Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models [36.59260354292177]
Recent advancements in text-to-image generation have inspired researchers to generate datasets tailored for perception models using generative models.
We aim to fine-tune vision-language models to a specific classification model without access to any real images.
Despite the high fidelity of generated images, we observed a significant performance degradation when fine-tuning the model using the generated datasets.
arXiv Detail & Related papers (2024-06-08T10:43:49Z)
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that generates highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation [24.694298869398033]
Our method trains efficiently and generates images with both high perceptual quality and layout alignment.
It significantly outperforms 10 other generative models based on GANs, VQ-VAE, and diffusion models.
arXiv Detail & Related papers (2023-02-16T14:20:25Z)
- Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training a text-to-image generation model on image-only datasets.
It uses a retrieval-then-optimization procedure to synthesize pseudo text features.
It can benefit a wide range of settings, including few-shot, semi-supervised, and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z)
- Implementing and Experimenting with Diffusion Models for Text-to-Image Generation [0.0]
Two models, DALL-E 2 and Imagen, have demonstrated that highly photorealistic images could be generated from a simple textual description of an image.
Text-to-image models require exceptionally large amounts of computational resources to train, as well as huge datasets collected from the internet.
This thesis contributes by reviewing the different approaches and techniques used by these models, and then by proposing our own implementation of a text-to-image model.
arXiv Detail & Related papers (2022-09-22T12:03:33Z)
- A Shared Representation for Photorealistic Driving Simulators [83.5985178314263]
We propose to improve the quality of generated images by rethinking the discriminator architecture.
The focus is on the class of problems where images are generated given semantic inputs, such as scene segmentation maps or human body poses.
We aim to learn a shared latent representation that encodes enough information to jointly perform semantic segmentation, content reconstruction, and coarse-to-fine adversarial reasoning.
arXiv Detail & Related papers (2021-12-09T18:59:21Z)
- A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
arXiv Detail & Related papers (2021-12-07T05:22:50Z)
- Meta Internal Learning [88.68276505511922]
Internal learning for single-image generation is a framework in which a generator is trained to produce novel images based on a single image.
We propose a meta-learning approach that enables training over a collection of images, in order to model the internal statistics of the sample image more effectively.
Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
arXiv Detail & Related papers (2021-10-06T16:27:38Z)
- StackGAN: Facial Image Generation Optimizations [0.0]
Current state-of-the-art photorealistic generators are computationally expensive, involve unstable training processes, and have real and synthetic distributions that are dissimilar in higher-dimensional spaces.
We propose a variant of the StackGAN architecture, which incorporates conditional generators to construct an image in many stages.
Our model was trained on the CelebA facial image dataset and achieved a Fréchet Inception Distance (FID) score of 73 for edge images and a score of 59 for grayscale images generated using the synthetic edge images.
arXiv Detail & Related papers (2021-08-30T15:04:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.