Training End-to-end Single Image Generators without GANs
- URL: http://arxiv.org/abs/2004.06014v1
- Date: Tue, 7 Apr 2020 17:58:03 GMT
- Title: Training End-to-end Single Image Generators without GANs
- Authors: Yael Vinker and Nir Zabari and Yedid Hoshen
- Abstract summary: AugurOne is a novel approach for training single image generative models.
Our approach trains an upscaling neural network using non-affine augmentations of the (single) input image.
A compact latent space is jointly learned allowing for controlled image synthesis.
- Score: 27.393821783237186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present AugurOne, a novel approach for training single image generative
models. Our approach trains an upscaling neural network using non-affine
augmentations of the (single) input image, particularly including non-rigid
thin plate spline image warps. The extensive augmentations significantly
increase the in-sample distribution for the upsampling network enabling the
upscaling of highly variable inputs. A compact latent space is jointly learned
allowing for controlled image synthesis. Unlike Single Image GAN, our
approach does not require GAN training and takes place in an end-to-end fashion,
allowing fast and stable training. We experimentally evaluate our method and
show that it obtains compelling novel animations of a single image, as well as
state-of-the-art performance on conditional generation tasks, e.g.,
paint-to-image and edges-to-image.
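The abstract's central augmentation is the non-rigid thin plate spline (TPS) warp. A minimal NumPy sketch of such a warp follows. It is not the authors' implementation. The 3×3 control grid, the Gaussian jitter `sigma`, and nearest-neighbour resampling are assumptions. So is the hypothetical `make_training_pair` helper, which pairs a downscaled warped image with its full-resolution warped version as an upscaling input and target.

```python
import numpy as np


def _tps_kernel(r2):
    # TPS radial basis U = r^2 log(r^2), with U(0) defined as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        k = r2 * np.log(r2)
    return np.nan_to_num(k)


def fit_tps(src, dst):
    """Solve for TPS parameters mapping control points dst -> src, both (n, 2)."""
    n = dst.shape[0]
    K = _tps_kernel(((dst[:, None, :] - dst[None, :, :]) ** 2).sum(-1))
    P = np.hstack([np.ones((n, 1)), dst])
    L = np.zeros((n + 3, n + 3))
    L[:n, :n], L[:n, n:], L[n:, :n] = K, P, P.T
    Y = np.zeros((n + 3, 2))
    Y[:n] = src
    return np.linalg.solve(L, Y)  # n kernel weights followed by 3 affine rows


def apply_tps(params, ctrl, pts):
    """Map query points pts (m, 2) through the fitted TPS."""
    n = ctrl.shape[0]
    U = _tps_kernel(((pts[:, None, :] - ctrl[None, :, :]) ** 2).sum(-1))
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return U @ params[:n] + P @ params[n:]


def tps_warp(img, rng, grid=3, sigma=0.05):
    """Randomly warp an (H, W) or (H, W, C) image with a TPS (assumed settings)."""
    h, w = img.shape[:2]
    g = np.linspace(0.0, 1.0, grid)
    ctrl = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)  # (x, y) in [0, 1]
    src = ctrl + rng.normal(0.0, sigma, ctrl.shape)  # jittered source positions
    params = fit_tps(src, ctrl)
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel() / (w - 1), ys.ravel() / (h - 1)], axis=1)
    mapped = apply_tps(params, ctrl, pts)  # backward map: output pixel -> source
    sx = np.clip(np.rint(mapped[:, 0] * (w - 1)), 0, w - 1).astype(int)
    sy = np.clip(np.rint(mapped[:, 1] * (h - 1)), 0, h - 1).astype(int)
    return img[sy, sx].reshape(img.shape)  # nearest-neighbour resampling


def make_training_pair(img, rng, factor=2):
    """Hypothetical helper: (downscaled warped input, full-resolution warped target)."""
    target = tps_warp(img, rng)
    h, w = target.shape[:2]
    target = target[: h - h % factor, : w - w % factor]
    h, w = target.shape[:2]
    low = target.reshape(h // factor, factor, w // factor, factor, *target.shape[2:])
    return low.mean(axis=(1, 3)), target  # average-pool by `factor`
```

With `sigma=0.0` the fitted TPS reduces to the identity map, which is a convenient sanity check. Larger `sigma` values give stronger non-rigid deformations.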
Related papers
- You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs [13.133574069588896]
YOSO is a novel generative model designed for rapid, scalable, and high-fidelity one-step image synthesis with high training stability and mode coverage.
We show that our method can be trained from scratch as a one-step generation model with competitive performance.
In particular, we show that YOSO-PixArt-$\alpha$ can generate images in one step when trained at 512 resolution, and can adapt to 1024 resolution without extra explicit training, requiring only 10 A800 days for fine-tuning.
Date: 2024-03-19T17:34:27Z
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
Date: 2023-12-24T08:42:37Z
- CleftGAN: Adapting A Style-Based Generative Adversarial Network To Create Images Depicting Cleft Lip Deformity [2.1647227058902336]
We have built a deep learning-based cleft lip generator designed to produce an almost unlimited number of artificial images exhibiting high-fidelity facsimiles of cleft lip.
We undertook a transfer learning protocol testing different versions of StyleGAN-ADA.
Training images depicting a variety of cleft deformities were pre-processed with rotation and scaling correction, color adjustment, and background blurring.
Date: 2023-10-12T01:25:21Z
- TcGAN: Semantic-Aware and Structure-Preserved GANs with Individual Vision Transformer for Fast Arbitrary One-Shot Image Generation [11.207512995742999]
One-shot image generation (OSG) with generative adversarial networks that learn from the internal patches of a given image has attracted worldwide attention.
We propose a novel structure-preserved method TcGAN with individual vision transformer to overcome the shortcomings of the existing one-shot image generation methods.
Date: 2023-02-16T03:05:59Z
- FewGAN: Generating from the Joint Distribution of a Few Images [95.6635227371479]
We introduce FewGAN, a generative model for generating novel, high-quality and diverse images.
FewGAN is a hierarchical patch-GAN that applies quantization at the first coarse scale, followed by a pyramid of residual fully convolutional GANs at finer scales.
In an extensive set of experiments, it is shown that FewGAN outperforms baselines both quantitatively and qualitatively.
Date: 2022-07-18T07:11:28Z
- DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation [56.514462874501675]
We propose a dynamic sparse attention based Transformer model to achieve fine-level matching with favorable efficiency.
The heart of our approach is a novel dynamic-attention unit that adapts to variation in the optimal number of tokens each position should attend to.
Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
Date: 2022-07-13T11:12:03Z
- Meta Internal Learning [88.68276505511922]
Internal learning for single-image generation is a framework in which a generator is trained to produce novel images based on a single image.
We propose a meta-learning approach that enables training over a collection of images, in order to model the internal statistics of the sample image more effectively.
Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
Date: 2021-10-06T16:27:38Z
- Enhance Images as You Like with Unpaired Learning [8.104571453311442]
We propose a lightweight one-path conditional generative adversarial network (cGAN) to learn a one-to-many relation from low-light to normal-light image space.
Our network learns to generate a collection of enhanced images from a given input conditioned on various reference images.
Our model achieves competitive visual and quantitative results on par with fully supervised methods on both noisy and clean datasets.
Date: 2021-10-04T03:00:44Z
- StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis [68.3787368024951]
We propose a novel approach for multi-modal image-to-image (I2I) translation.
We learn a latent embedding, jointly with the generator, that models the variability of the output domain.
Specifically, we pre-train a generic style encoder using a novel proxy task to learn an embedding of images, from arbitrary domains, into a low-dimensional style latent space.
Date: 2021-04-14T19:58:24Z
- Encoding Robustness to Image Style via Adversarial Feature Perturbations [72.81911076841408]
We adapt adversarial training by directly perturbing feature statistics, rather than image pixels, to produce robust models.
Our proposed method, Adversarial Batch Normalization (AdvBN), is a single network layer that generates worst-case feature perturbations during training.
Date: 2020-09-18T17:52:34Z
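The AdvBN summary describes perturbing feature statistics rather than image pixels. A minimal NumPy sketch of that idea follows. It assumes per-channel multiplicative perturbations of the mean and standard deviation, bounded by a hypothetical budget `eps`. The sketch is not the AdvBN implementation. In particular, it omits the adversarial search for worst-case perturbations during training, so the deltas are simply passed in.

```python
import numpy as np


def perturb_feature_stats(x, delta_mean, delta_std, eps=0.2, tol=1e-5):
    """Re-scale the per-channel statistics of x with bounded perturbations.

    x: features of shape (N, C, H, W).
    delta_mean, delta_std: (N, C) multiplicative perturbations, clipped to [-eps, eps].
    """
    mu = x.mean(axis=(2, 3), keepdims=True)
    sigma = x.std(axis=(2, 3), keepdims=True) + tol  # tol avoids division by zero
    x_hat = (x - mu) / sigma  # normalized features (zero mean per channel)
    dm = np.clip(delta_mean, -eps, eps)[:, :, None, None]
    ds = np.clip(delta_std, -eps, eps)[:, :, None, None]
    # Output per-channel mean is mu * (1 + dm); its std is std(x) * (1 + ds).
    return x_hat * sigma * (1.0 + ds) + mu * (1.0 + dm)
```

Zero deltas return the input unchanged. In an adversarial setup, the deltas would be chosen to increase the task loss within the `eps` bound rather than fixed by hand.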