One-shot Ultra-high-Resolution Generative Adversarial Network That
Synthesizes 16K Images On A Single GPU
- URL: http://arxiv.org/abs/2202.13799v3
- Date: Mon, 28 Aug 2023 04:52:53 GMT
- Title: One-shot Ultra-high-Resolution Generative Adversarial Network That
Synthesizes 16K Images On A Single GPU
- Authors: Junseok Oh, Donghwee Yoon and Injung Kim
- Abstract summary: OUR-GAN is a one-shot generative adversarial network framework that generates non-repetitive 16K images from a single training image.
OUR-GAN can synthesize high-quality 16K images with 12.5 GB of GPU memory and 4K images with only 4.29 GB.
OUR-GAN is the first one-shot image synthesizer that generates non-repetitive UHR images on a single consumer GPU.
- Score: 1.9060575156739825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a one-shot ultra-high-resolution generative adversarial network
(OUR-GAN) framework that generates non-repetitive 16K (16,384 x 8,640) images
from a single training image and is trainable on a single consumer GPU. OUR-GAN
generates an initial image that is visually plausible and varied in shape at
low resolution, and then gradually increases the resolution by adding detail
through super-resolution. Since OUR-GAN learns from a real
ultra-high-resolution (UHR) image, it can synthesize large shapes with fine
details and long-range coherence, which is difficult to achieve with
conventional generative models that rely on the patch distribution learned from
relatively small images. OUR-GAN can synthesize high-quality 16K images with
12.5 GB of GPU memory and 4K images with only 4.29 GB as it synthesizes a UHR
image part by part through seamless subregion-wise super-resolution.
Additionally, OUR-GAN improves visual coherence while maintaining diversity by
applying vertical positional convolution. In experiments on the ST4K and RAISE
datasets, OUR-GAN exhibited improved fidelity, visual coherence, and diversity
compared with the baseline one-shot synthesis models. To the best of our
knowledge, OUR-GAN is the first one-shot image synthesizer that generates
non-repetitive UHR images on a single consumer GPU. The synthesized image
samples are presented at https://our-gan.github.io.
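The reported memory figures follow from the part-by-part synthesis: only one subregion's activations are resident at a time, so peak GPU memory scales with the tile size rather than with the full 16K output. Below is a minimal sketch of the general idea of tiled, seam-free super-resolution, assuming a generic sr_model callable; the function name, tile size, and context-padding rule are illustrative assumptions, not the authors' implementation.

```python
import torch

@torch.no_grad()
def tiled_super_resolve(sr_model, image, scale=2, tile=256, pad=16):
    """Upscale a (1, C, H, W) image subregion by subregion.

    Each tile is processed with a `pad`-pixel context border so the
    network sees the same neighborhood it would see on the full image;
    the border is cropped before writing, which hides the tile seams.
    Peak memory scales with the tile size, not the output resolution.
    """
    _, c, h, w = image.shape
    out = torch.zeros(1, c, h * scale, w * scale)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            # Tile plus surrounding context, clamped to the image bounds.
            y0, x0 = max(y - pad, 0), max(x - pad, 0)
            y1, x1 = min(y + tile + pad, h), min(x + tile + pad, w)
            sr = sr_model(image[:, :, y0:y1, x0:x1])
            # Crop the upscaled context; keep only the tile interior.
            ty, tx = (y - y0) * scale, (x - x0) * scale
            th = (min(y + tile, h) - y) * scale
            tw = (min(x + tile, w) - x) * scale
            out[:, :, y * scale:y * scale + th, x * scale:x * scale + tw] = \
                sr[:, :, ty:ty + th, tx:tx + tw]
    return out

# Stand-in "super-resolution model": plain 2x bilinear upsampling.
sr = lambda t: torch.nn.functional.interpolate(t, scale_factor=2, mode="bilinear")
up = tiled_super_resolve(sr, torch.rand(1, 3, 512, 512), scale=2, tile=128)
assert up.shape == (1, 3, 1024, 1024)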
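```

Similarly, the vertical positional convolution can be read as a vertical-only variant of coordinate conditioning: the convolution additionally sees a channel encoding normalized vertical position, so filters can specialize by height (e.g., sky textures near the top of a landscape, ground near the bottom). A minimal sketch under that assumption; the layer name and placement are hypothetical, not the paper's exact design.

```python
import torch
import torch.nn as nn

class VerticalPositionalConv2d(nn.Module):
    """Convolution that also sees a normalized vertical-coordinate map
    (a vertical-only CoordConv), so filters can condition on height.
    Hypothetical reading of the abstract, not the paper's exact layer.
    """
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + 1, out_ch, kernel_size, padding=padding)

    def forward(self, x):
        b, _, h, w = x.shape
        # y-coordinate channel in [-1, 1], constant along each row.
        ypos = torch.linspace(-1.0, 1.0, h, device=x.device)
        ypos = ypos.view(1, 1, h, 1).expand(b, 1, h, w)
        return self.conv(torch.cat([x, ypos], dim=1))

layer = VerticalPositionalConv2d(3, 16)
out = layer(torch.rand(2, 3, 64, 64))  # -> (2, 16, 64, 64)
```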
Related papers
- REDUCIO! Generating 1024$\times$1024 Video within 16 Seconds using Extremely Compressed Motion Latents [110.41795676048835]
One crucial obstacle to large-scale application is the expensive training and inference cost.
In this paper, we argue that videos contain much more redundant information than images and can thus be encoded with very few motion latents.
We train Reducio-DiT in around 3.2K training hours in total and generate a 16-frame 1024$\times$1024 video clip within 15.5 seconds on a single A100 GPU.
arXiv Detail & Related papers (2024-11-20T18:59:52Z) - HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization [21.1691961979094]
In digital pathology, deep learning-based image segmentation traditionally involves a two-stage process.
We propose the holistic histopathology (HoloHisto) segmentation method to achieve end-to-end segmentation on gigapixel WSIs.
Under the HoloHisto platform, we unveil a random 4K sampler that operates at ultra-high resolution, delivering 31 and 10 times more pixels than standard 2D and 3D patches, respectively.
arXiv Detail & Related papers (2024-07-03T17:49:31Z) - Is One GPU Enough? Pushing Image Generation at Higher-Resolutions with Foundation Models [4.257210316104905]
We introduce Pixelsmith, a zero-shot text-to-image generative framework to sample images at higher resolutions with a single GPU.
We are the first to show that it is possible to scale the output of a pre-trained diffusion model by a factor of 1000, opening the way for gigapixel image generation at no additional cost.
arXiv Detail & Related papers (2024-06-11T13:33:33Z) - 4K4D: Real-Time 4D View Synthesis at 4K Resolution [86.6582179227016]
This paper targets high-fidelity, real-time view synthesis of dynamic 3D scenes at 4K resolution.
We propose a 4D point cloud representation that supports hardware rasterization and enables unprecedented rendering speed.
Our representation can be rendered at over 400 FPS on the DNA-Rendering dataset at 1080p resolution and 80 FPS on the ENeRF-Outdoor dataset at 4K resolution using an RTX 4090 GPU.
arXiv Detail & Related papers (2023-10-17T17:57:38Z) - ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with
Diffusion Models [126.35334860896373]
We investigate the capability of generating images from pre-trained diffusion models at much higher resolutions than the training image sizes.
Existing works for higher-resolution generation, such as attention-based and joint-diffusion approaches, cannot fully address issues such as object repetition.
We propose a simple yet effective re-dilation that dynamically adjusts the convolutional receptive field during inference.
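As a rough illustration of the re-dilation idea: enlarge each 3x3 convolution's dilation, and its padding to match, so the receptive field grows with the target resolution without any retraining. The uniform factor below is an illustrative assumption; the paper's actual scheme is more elaborate.

```python
import torch.nn as nn

def redilate_(model, factor=2):
    """Grow each 3x3 convolution's dilation (and padding, to preserve
    the output size of stride-1 convs that used padding=1) so the
    receptive field matches a `factor`-times larger input. Weights are
    untouched; apply at inference only. Sketch, not the paper's rule.
    """
    for m in model.modules():
        if isinstance(m, nn.Conv2d) and m.kernel_size == (3, 3):
            m.dilation = (factor, factor)
            m.padding = (factor, factor)
    return model

# e.g. unet = redilate_(unet, factor=2) before running the sampling loop
```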
arXiv Detail & Related papers (2023-10-11T17:52:39Z) - Towards Efficient and Scale-Robust Ultra-High-Definition Image
Demoireing [71.62289021118983]
We present ESDNet, an efficient baseline model for tackling 4K moiré images, wherein we build a semantic-aligned scale-aware module to address the scale variation of moiré patterns.
Our approach outperforms state-of-the-art methods by a large margin while being much more lightweight.
arXiv Detail & Related papers (2022-07-20T14:20:52Z) - Projected GANs Converge Faster [50.23237734403834]
Generative Adversarial Networks (GANs) produce high-quality images but are challenging to train.
We make significant headway on these issues by projecting generated and real samples into a fixed, pretrained feature space.
Our Projected GAN improves image quality, sample efficiency, and convergence speed.
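The projection step can be pictured as a discriminator that never sees pixels directly: real and generated images are first mapped through a frozen, pretrained feature network, and only a small head is trained on top. Below is a single-scale sketch; the EfficientNet-B0 backbone and 1x1-conv head are illustrative assumptions (the paper combines multi-scale features with fixed random projections), and the pretrained weights are downloaded by torchvision.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class ProjectedDiscriminator(nn.Module):
    """Discriminates in a fixed, pretrained feature space instead of
    pixel space. The backbone's weights are frozen, but gradients still
    flow through its activations, so the generator gets a training signal.
    """
    def __init__(self):
        super().__init__()
        self.backbone = efficientnet_b0(weights="IMAGENET1K_V1").features
        self.backbone.requires_grad_(False)  # fixed projection
        self.head = nn.Conv2d(1280, 1, kernel_size=1)  # trainable logits

    def forward(self, img):
        # Deliberately no torch.no_grad(): the generator needs gradients
        # through the frozen feature network.
        return self.head(self.backbone(img))

disc = ProjectedDiscriminator().eval()
logits = disc(torch.rand(2, 3, 224, 224))  # -> (2, 1, 7, 7) patch logits
```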
arXiv Detail & Related papers (2021-11-01T15:11:01Z) - Spatial-Separated Curve Rendering Network for Efficient and
High-Resolution Image Harmonization [59.19214040221055]
We propose a novel spatial-separated curve rendering network (S$^2$CRNet) for efficient and high-resolution image harmonization.
The proposed method reduces parameters by more than 90% compared with previous methods.
Our method works smoothly on higher-resolution images in real time, more than 10$\times$ faster than existing methods.
arXiv Detail & Related papers (2021-09-13T07:20:16Z) - ORStereo: Occlusion-Aware Recurrent Stereo Matching for 4K-Resolution
Images [13.508624751092654]
We present the Occlusion-aware Recurrent binocular Stereo matching (ORStereo) model.
ORStereo generalizes to unseen high-resolution images with large disparity ranges by formulating the task as residual updates and refinements of an initial prediction.
We test the model's capability on both synthetic and real-world high-resolution images.
arXiv Detail & Related papers (2021-03-13T21:46:06Z) - Towards Faster and Stabilized GAN Training for High-fidelity Few-shot
Image Synthesis [21.40315235087551]
We propose a lightweight GAN structure that achieves superior quality at 1024$\times$1024 resolution.
We show our model's superior performance compared to the state-of-the-art StyleGAN2 when data and computing budgets are limited.
arXiv Detail & Related papers (2021-01-12T22:02:54Z) - GAN Compression: Efficient Architectures for Interactive Conditional
GANs [45.012173624111185]
Recent Conditional Generative Adversarial Networks (cGANs) are 1-2 orders of magnitude more compute-intensive than modern recognition CNNs.
We propose a general-purpose compression framework for reducing the inference time and model size of the generator in cGANs.
arXiv Detail & Related papers (2020-03-19T17:59:05Z)