Learning Images Across Scales Using Adversarial Training
- URL: http://arxiv.org/abs/2406.08924v1
- Date: Thu, 13 Jun 2024 08:44:12 GMT
- Title: Learning Images Across Scales Using Adversarial Training
- Authors: Krzysztof Wolski, Adarsh Djeacoumar, Alireza Javanmardi, Hans-Peter Seidel, Christian Theobalt, Guillaume Cordonnier, Karol Myszkowski, George Drettakis, Xingang Pan, Thomas Leimkühler
- Abstract summary: We devise a novel paradigm for learning a representation that captures an orders-of-magnitude variety of scales from an unstructured collection of ordinary images.
We show that our generator can be used as a multiscale generative model, and for reconstructions of scale spaces from unstructured patches.
- Score: 64.59447233902735
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The real world exhibits rich structure and detail across many scales of observation. It is difficult, however, to capture and represent a broad spectrum of scales using ordinary images. We devise a novel paradigm for learning a representation that captures an orders-of-magnitude variety of scales from an unstructured collection of ordinary images. We treat this collection as a distribution of scale-space slices to be learned using adversarial training, and additionally enforce coherency across slices. Our approach relies on a multiscale generator with carefully injected procedural frequency content, which allows the emerging continuous scale space to be explored interactively. Training across vastly different scales poses challenges regarding stability, which we tackle using a supervision scheme that involves careful sampling of scales. We show that our generator can be used as a multiscale generative model, and for reconstructions of scale spaces from unstructured patches. Significantly outperforming the state of the art, we demonstrate zoom-in factors of up to 256x at high quality and scale consistency.
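The abstract names three ingredients: a distribution of scale-space slices learned adversarially, a multiscale generator with injected procedural frequency content, and careful sampling of scales for stable supervision. Below is a minimal sketch of how such scale-space slices could be sampled as real examples for a discriminator; the log-uniform scale distribution, the fixed patch size, and the function name are assumptions, not the authors' exact scheme.

```python
import math
import random

import torch
import torch.nn.functional as F

def sample_scale_space_slice(image, patch_size=256, max_zoom=256.0):
    """Sample a fixed-size patch observing `image` at a random scale.

    Scales are drawn log-uniformly so every octave of the scale space is
    supervised equally often -- an assumption; the paper only states
    that scales are sampled carefully.
    """
    _, _, h, w = image.shape
    zoom = math.exp(random.uniform(0.0, math.log(max_zoom)))
    crop = max(1, int(min(h, w) / zoom))   # side length of observed region
    top = random.randint(0, h - crop)
    left = random.randint(0, w - crop)
    patch = image[:, :, top:top + crop, left:left + crop]
    # Resample to the fixed training resolution (antialias acts as the
    # prefilter a proper scale-space slice requires).
    patch = F.interpolate(patch, size=(patch_size, patch_size),
                          mode='bilinear', align_corners=False,
                          antialias=True)
    return patch, zoom

# Usage: real patches for the discriminator, tagged with their scale.
photo = torch.rand(1, 3, 4096, 4096)
patch, zoom = sample_scale_space_slice(photo)
```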
Related papers
- FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis [48.9652334528436]
We introduce FouriScale, an innovative, training-free approach based on frequency-domain analysis.
We replace the original convolutional layers in pre-trained diffusion models by incorporating a dilation technique along with a low-pass operation.
Our method balances the structural integrity and fidelity of generated images, enabling arbitrary-size, high-resolution, high-quality generation.
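The two operations the summary names (kernel dilation plus a low-pass operation) can be sketched as follows; the ideal low-pass mask, the factor-2 scale, and the padding arithmetic are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def lowpass(x, keep_frac):
    """Ideal low-pass filter via FFT: zero out high spatial frequencies."""
    Xf = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    _, _, h, w = x.shape
    mask = torch.zeros_like(Xf)
    kh, kw = int(h * keep_frac / 2), int(w * keep_frac / 2)
    mask[..., h // 2 - kh:h // 2 + kh, w // 2 - kw:w // 2 + kw] = 1
    return torch.fft.ifft2(torch.fft.ifftshift(Xf * mask, dim=(-2, -1))).real

def fouriscale_conv(x, weight, bias=None, scale=2):
    """Reuse a pre-trained conv at `scale`x the training resolution:
    dilate the kernel and low-pass the input so the layer sees the same
    frequency band it was trained on (a sketch, not the exact method)."""
    x = lowpass(x, keep_frac=1.0 / scale)
    pad = (weight.shape[-1] // 2) * scale   # preserve spatial size
    return F.conv2d(x, weight, bias, padding=pad, dilation=scale)
```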
arXiv Detail & Related papers (2024-03-19T17:59:33Z)
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
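The summary leaves open how the SSL embedding conditions the denoiser; one common mechanism is feature-wise modulation (FiLM), sketched below with hypothetical dimensions. Cross-attention would be an equally plausible choice.

```python
import torch
import torch.nn as nn

class SSLConditionedBlock(nn.Module):
    """Denoiser block modulated by a frozen SSL embedding (e.g. a DINO
    feature vector). FiLM conditioning and all dimensions here are
    assumptions for illustration."""

    def __init__(self, channels=256, ssl_dim=768):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.norm = nn.GroupNorm(32, channels)
        # Project the SSL embedding to a per-channel scale and shift.
        self.film = nn.Linear(ssl_dim, 2 * channels)

    def forward(self, x, ssl_emb):
        scale, shift = self.film(ssl_emb).chunk(2, dim=-1)
        h = self.norm(self.conv(x))
        return h * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
```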
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- Generative Powers of Ten [60.6740997942711]
We present a method that uses a text-to-image model to generate consistent content across multiple image scales.
We achieve this through a joint multi-scale diffusion sampling approach.
Our method enables deeper levels of zoom than traditional super-resolution methods.
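The joint multi-scale sampling rests on a consistency constraint between adjacent zoom levels; a rough sketch of that constraint alone (assuming factor-2 zooms centered in the frame) is given below. The full method would interleave this with diffusion denoising of every level.

```python
import torch
import torch.nn.functional as F

def enforce_zoom_consistency(stack):
    """Make each zoom level agree with the next, more magnified one.

    Assumes `stack[i+1]` depicts the central half of `stack[i]` at twice
    the magnification, so its 2x-downsampled version is pasted back into
    the center of `stack[i]`. Iterating from the deepest level outward
    propagates the finest content through the whole stack.
    """
    for i in reversed(range(len(stack) - 1)):
        _, _, h, w = stack[i].shape
        inner = F.avg_pool2d(stack[i + 1], kernel_size=2)
        stack[i][:, :, h // 4:h // 4 + h // 2,
                       w // 4:w // 4 + w // 2] = inner
    return stack
```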
arXiv Detail & Related papers (2023-12-04T18:59:25Z)
- Dual Pyramid Generative Adversarial Networks for Semantic Image Synthesis [94.76988562653845]
The goal of semantic image synthesis is to generate photo-realistic images from semantic label maps.
Current state-of-the-art approaches, however, still struggle to generate realistic objects in images at various scales.
We propose a Dual Pyramid Generative Adversarial Network (DP-GAN) that learns the conditioning of spatially-adaptive normalization blocks at all scales jointly.
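For reference, spatially-adaptive normalization (SPADE-style) predicts a per-pixel scale and shift from the label map; a minimal sketch follows. DP-GAN's contribution, per the summary, is learning this conditioning jointly across all scales, which the sketch does not capture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatiallyAdaptiveNorm(nn.Module):
    """Normalization whose scale and shift vary per pixel, predicted
    from the semantic label map (a SPADE-style sketch)."""

    def __init__(self, channels, label_channels):
        super().__init__()
        self.norm = nn.BatchNorm2d(channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels, 128, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(128, channels, 3, padding=1)
        self.beta = nn.Conv2d(128, channels, 3, padding=1)

    def forward(self, x, segmap):
        # Resize the label map to this scale's feature resolution.
        segmap = F.interpolate(segmap, size=x.shape[-2:], mode='nearest')
        h = self.shared(segmap)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)
```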
arXiv Detail & Related papers (2022-10-08T18:45:44Z)
- Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation [69.45176408639483]
We reformulate the convolution layer by drawing on scale-space theory.
We build a novel network style named the SCale AttentioN Conv Neural Network (SCAN-CNN).
As a single-shot scheme, inference is more efficient than multi-shot fusion.
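One speculative reading of "reforming the conv layer via scale-space theory" is to evaluate the convolution over a small blur pyramid and attend over the scale axis, as sketched below; the layer structure and the box-blur approximation of Gaussian smoothing are assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAttentionConv(nn.Module):
    """Convolution evaluated on several blurred copies of the input,
    fused by learned attention over the scale axis (speculative)."""

    def __init__(self, in_ch, out_ch, scales=(0, 1, 2)):
        super().__init__()
        self.scales = scales                 # number of blur steps each
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.attn = nn.Conv2d(out_ch, len(scales), 1)

    def blur(self, x, steps):
        # Repeated 3x3 box blur as a cheap Gaussian approximation.
        for _ in range(steps):
            x = F.avg_pool2d(x, 3, stride=1, padding=1)
        return x

    def forward(self, x):
        responses = torch.stack(
            [self.conv(self.blur(x, s)) for s in self.scales], dim=1)
        weights = F.softmax(self.attn(responses.mean(1)), dim=1)
        return (responses * weights.unsqueeze(2)).sum(1)
```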
arXiv Detail & Related papers (2022-09-19T06:35:04Z)
- Arbitrary-Scale Image Synthesis [149.0290830305808]
Positional encodings have enabled recent works to train a single adversarial network that can generate images of different scales.
We propose scale-consistent positional encodings that are invariant to the generator's transformation layers.
We show competitive results for a continuum of scales on various commonly used datasets for image synthesis.
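Scale consistency here means a pixel at the same world location receives the same encoding regardless of rendering resolution; a sketch using assumed Fourier features on a fixed world frame:

```python
import torch

def scale_consistent_encoding(height, width, num_freqs=8):
    """Fourier positional encoding on a fixed world frame [0, 1]^2.

    Coordinates are tied to world position rather than pixel index, so
    rendering at twice the resolution just samples the same smooth
    encoding more densely and content stays aligned across scales
    (a sketch of the idea; the paper's exact encoding differs).
    """
    ys = torch.linspace(0.0, 1.0, height)
    xs = torch.linspace(0.0, 1.0, width)
    grid = torch.stack(torch.meshgrid(ys, xs, indexing='ij'))  # (2, H, W)
    freqs = (2.0 ** torch.arange(num_freqs)) * torch.pi        # per octave
    angles = grid[None] * freqs[:, None, None, None]           # (F, 2, H, W)
    enc = torch.cat([angles.sin(), angles.cos()], dim=0)
    return enc.reshape(-1, height, width)                      # (4F, H, W)
```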
arXiv Detail & Related papers (2022-04-05T15:10:43Z)
- Nested Scale Editing for Conditional Image Synthesis [19.245119912119947]
We propose an image synthesis approach that provides stratified navigation in the latent code space.
Given only a partial or very low-resolution image, our approach consistently outperforms state-of-the-art counterparts.
arXiv Detail & Related papers (2020-06-03T04:29:21Z)