not-so-BigGAN: Generating High-Fidelity Images on Small Compute with
Wavelet-based Super-Resolution
- URL: http://arxiv.org/abs/2009.04433v2
- Date: Sun, 25 Oct 2020 18:41:09 GMT
- Title: not-so-BigGAN: Generating High-Fidelity Images on Small Compute with
Wavelet-based Super-Resolution
- Authors: Seungwook Han, Akash Srivastava, Cole Hurwitz, Prasanna Sattigeri and
David D. Cox
- Abstract summary: nsb-GAN is a simple yet cost-effective two-step training framework for deep generative models.
The wavelet-based down-sampling method preserves more structural information than pixel-based methods.
On ImageNet 512x512, our model achieves a Fréchet Inception Distance (FID) of 10.59 -- beating the baseline BigGAN model.
- Score: 23.15896056344987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art models for high-resolution image generation, such as BigGAN
and VQVAE-2, require an incredible amount of compute resources and/or time (512
TPU-v3 cores) to train, putting them out of reach for the larger research
community. On the other hand, GAN-based image super-resolution models, such as
ESRGAN, can not only upscale images to high dimensions, but also are efficient
to train. In this paper, we present not-so-big-GAN (nsb-GAN), a simple yet
cost-effective two-step training framework for deep generative models (DGMs) of
high-dimensional natural images. First, we generate images in low-frequency
bands by training a sampler in the wavelet domain. Then, we super-resolve these
images from the wavelet domain back to the pixel-space with our novel wavelet
super-resolution decoder network. The wavelet-based down-sampling method preserves
more structural information than pixel-based methods, leading to significantly
better generative quality of the low-resolution sampler (e.g., 64x64). Since
the sampler and decoder can be trained in parallel and operate on much lower
dimensional spaces than end-to-end models, the training cost is substantially
reduced. On ImageNet 512x512, our model achieves a Fréchet Inception Distance
(FID) of 10.59 -- beating the baseline BigGAN model -- at half the compute (256
TPU-v3 cores).
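The idea of keeping only the low-frequency band of a wavelet decomposition can be illustrated with a minimal NumPy sketch. It makes several assumptions the abstract does not confirm: a Haar wavelet, three decomposition levels (chosen only because 512 / 2^3 = 64), a single-channel image, and the hypothetical helper names `haar_dwt2`, `haar_idwt2`, `decompose`, and `reconstruct`. It is not the authors' implementation; it only shows that the low-frequency (LL) band is a down-sampled image and that the discarded detail bands are exactly what a super-resolution decoder would have to predict.

```python
import numpy as np


def haar_dwt2(img):
    """One level of an orthonormal 2D Haar transform (even-sized 2D array)."""
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency band (scaled block average)
    # Three detail bands; their naming order varies between libraries.
    d1 = (a - b + c - d) / 2.0
    d2 = (a + b - c - d) / 2.0
    d3 = (a - b - c + d) / 2.0
    return ll, (d1, d2, d3)


def haar_idwt2(ll, highs):
    """Exact inverse of haar_dwt2."""
    d1, d2, d3 = highs
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    out[0::2, 0::2] = (ll + d1 + d2 + d3) / 2.0
    out[0::2, 1::2] = (ll - d1 + d2 - d3) / 2.0
    out[1::2, 0::2] = (ll + d1 - d2 - d3) / 2.0
    out[1::2, 1::2] = (ll - d1 - d2 + d3) / 2.0
    return out


def decompose(img, levels):
    """Apply haar_dwt2 repeatedly to the LL band; return final LL and details."""
    ll = np.asarray(img, dtype=np.float64)
    details = []
    for _ in range(levels):
        ll, highs = haar_dwt2(ll)
        details.append(highs)
    return ll, details


def reconstruct(ll, details):
    """Invert decompose, starting from the coarsest level."""
    for highs in reversed(details):
        ll = haar_idwt2(ll, highs)
    return ll


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((512, 512))  # stand-in for one image channel

    ll, details = decompose(img, levels=3)
    print("LL band shape:", ll.shape)

    # With the true detail bands, reconstruction is lossless.
    print("exact:", np.allclose(reconstruct(ll, details), img))

    # With the detail bands zeroed, only a smoothed image comes back; the
    # missing detail is what a learned decoder would need to fill in.
    zeros = [tuple(np.zeros_like(x) for x in highs) for highs in details]
    coarse = reconstruct(ll, zeros)
    print("mean abs error without details:", np.abs(coarse - img).mean())
```

In this sketch a sampler would only need to model the 64x64 LL band, while a separate decoder maps it back to 512x512, which is the division of labour the abstract describes.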
Related papers
- Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings [15.2983201224858]
Large-scale 3D generative models require substantial computational resources yet often fall short in capturing fine details and complex geometries at high resolutions.
We introduce a novel approach called Wavelet Latent Diffusion, or WaLa, that encodes 3D shapes into compact latent encodings.
Specifically, we compress a $256^3$ signed distance field into a $12^3 \times 4$ latent grid, achieving an impressive 2427x compression ratio with minimal loss of detail.
Our models, both conditional and unconditional, contain approximately one billion parameters and successfully generate high-quality 3D shapes at $256^3$ resolution.
arXiv Detail & Related papers (2024-11-12T18:49:06Z)
- Simpler Diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion [34.70370851239368]
We show that pixel-space models can in fact be very competitive with latent approaches in both quality and efficiency.
We present a simple recipe for scaling end-to-end pixel-space diffusion models to high resolutions.
arXiv Detail & Related papers (2024-10-25T06:20:06Z)
- Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models [41.67994377132345]
We propose a greedy algorithm that grows the architecture into high-resolution end-to-end models.
This enables a single-stage model capable of generating high-resolution images without the need for a super-resolution cascade.
Our results rely on public datasets and show that we are able to train non-cascaded models up to 8B parameters with no further regularization schemes.
arXiv Detail & Related papers (2024-05-27T02:12:39Z)
- LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation [51.19871052619077]
We introduce Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images.
We maintain the fast speed to generate 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation.
arXiv Detail & Related papers (2024-02-07T17:57:03Z)
- Make-A-Shape: a Ten-Million-scale 3D Shape Model [52.701745578415796]
This paper introduces Make-A-Shape, a new 3D generative model designed for efficient training on a vast scale.
We first innovate a wavelet-tree representation to compactly encode shapes by formulating the subband coefficient filtering scheme.
We derive the subband adaptive training strategy to train our model to effectively learn to generate coarse and detail wavelet coefficients.
arXiv Detail & Related papers (2024-01-20T00:21:58Z)
- ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models [126.35334860896373]
We investigate the capability of generating images from pre-trained diffusion models at much higher resolutions than the training image sizes.
Existing works for higher-resolution generation, such as attention-based and joint-diffusion approaches, do not address these issues well.
We propose a simple yet effective re-dilation that can dynamically adjust the convolutional perception field during inference.
arXiv Detail & Related papers (2023-10-11T17:52:39Z)
- High-Resolution Volumetric Reconstruction for Clothed Humans [27.900514732877827]
We present a novel method for reconstructing clothed humans from a sparse set of, e.g., 1 to 6 RGB images.
Our method significantly reduces the mean point-to-surface (P2S) precision of state-of-the-art methods by more than 50% to achieve approximately 2mm accuracy with a 512 volume resolution.
arXiv Detail & Related papers (2023-07-25T06:37:50Z)
- A Three-Player GAN for Super-Resolution in Magnetic Resonance Imaging [8.254662744916171]
Current SISR methods for 3D volumetric images are based on Generative Adversarial Networks (GANs).
Here, we propose a new method for 3D SR based on the GAN framework. Specifically, we use instance noise to balance the GAN training. Furthermore, we use a relativistic GAN loss function and an updating feature extractor during the training process.
arXiv Detail & Related papers (2023-03-24T10:19:34Z)
- Improved Transformer for High-Resolution GANs [69.42469272015481]
We introduce two key ingredients to Transformer to address this challenge.
We show in the experiments that the proposed HiT achieves state-of-the-art FID scores of 31.87 and 2.95 on unconditional ImageNet $128 \times 128$ and FFHQ $256 \times 256$, respectively.
arXiv Detail & Related papers (2021-06-14T17:39:49Z)
- Cascaded Diffusion Models for High Fidelity Image Generation [53.57766722279425]
We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation challenge.
A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution.
We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation.
arXiv Detail & Related papers (2021-05-30T17:14:52Z)
- Hierarchical Amortized Training for Memory-efficient High Resolution 3D GAN [52.851990439671475]
We propose a novel end-to-end GAN architecture that can generate high-resolution 3D images.
We achieve this goal by using different configurations between training and inference.
Experiments on 3D thorax CT and brain MRI demonstrate that our approach outperforms state of the art in image generation.
arXiv Detail & Related papers (2020-08-05T02:33:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.