Generating Images with Sparse Representations
- URL: http://arxiv.org/abs/2103.03841v1
- Date: Fri, 5 Mar 2021 17:56:03 GMT
- Title: Generating Images with Sparse Representations
- Authors: Charlie Nash, Jacob Menick, Sander Dieleman, Peter W. Battaglia
- Abstract summary: High dimensionality of images presents architecture and sampling-efficiency challenges for likelihood-based generative models.
We present an alternative approach, inspired by common image compression methods like JPEG, and convert images to quantized discrete cosine transform (DCT) blocks.
We propose a Transformer-based autoregressive architecture, which is trained to sequentially predict the conditional distribution of the next element in such sequences.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The high dimensionality of images presents architecture and
sampling-efficiency challenges for likelihood-based generative models. Previous
approaches such as VQ-VAE use deep autoencoders to obtain compact
representations, which are more practical as inputs for likelihood-based
models. We present an alternative approach, inspired by common image
compression methods like JPEG, and convert images to quantized discrete cosine
transform (DCT) blocks, which are represented sparsely as a sequence of DCT
channel, spatial location, and DCT coefficient triples. We propose a
Transformer-based autoregressive architecture, which is trained to sequentially
predict the conditional distribution of the next element in such sequences, and
which scales effectively to high resolution images. On a range of image
datasets, we demonstrate that our approach can generate high quality, diverse
images, with sample metric scores competitive with state of the art methods. We
additionally show that simple modifications to our method yield effective image
colorization and super-resolution models.
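The block-DCT sparse representation described in the abstract can be sketched in a few lines. The block size, uniform quantization step, and (coefficient, position, value) triple layout below are illustrative choices under simple assumptions, not the paper's exact settings:

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis: C[k, m] = a_k * cos(pi * (2m + 1) * k / (2n))
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0] /= np.sqrt(2.0)
    return c

def image_to_dct_triples(img, block=8, q=16):
    """Convert a grayscale image (2D float array) to a sparse list of
    (coefficient_index, block_index, quantized_value) triples."""
    C = dct_matrix(block)
    h = img.shape[0] - img.shape[0] % block   # crop to a multiple of the block size
    w = img.shape[1] - img.shape[1] % block
    triples, block_index = [], 0
    for y in range(0, h, block):
        for x in range(0, w, block):
            coeffs = C @ img[y:y+block, x:x+block] @ C.T   # 2D DCT of one block
            quant = np.round(coeffs / q).astype(int)       # uniform quantization
            for idx, v in enumerate(quant.ravel()):
                if v != 0:                                  # keep only nonzeros
                    triples.append((idx, block_index, int(v)))
            block_index += 1
    return triples
```

A flat (constant) block yields a single DC triple, which is what makes this representation sparse over smooth image regions and keeps the autoregressive sequences short.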
Related papers
- Efficient Visual State Space Model for Image Deblurring
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration.
We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
Submitted: 2024-05-23
- Image Inpainting via Tractable Steering of Diffusion Models
This paper proposes to exploit the ability of Tractable Probabilistic Models (TPMs) to exactly and efficiently compute the constrained posterior.
Specifically, this paper adopts a class of expressive TPMs termed Probabilistic Circuits (PCs).
We show that our approach can consistently improve the overall quality and semantic coherence of inpainted images with only 10% additional computational overhead.
Submitted: 2023-11-28
- Distance Weighted Trans Network for Image Completion
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
Submitted: 2023-10-11
- AICT: An Adaptive Image Compression Transformer
We propose a more straightforward yet effective Transformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT).
The proposed ICT can capture both global and local contexts from the latent representations.
We leverage a learnable scaling module with a sandwich ConvNeXt-based pre/post-processor to accurately extract more compact latent representation.
Submitted: 2023-07-12
- Beyond Learned Metadata-based Raw Image Reconstruction
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
However, they are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
Submitted: 2023-06-21
- High-Perceptual Quality JPEG Decoding via Posterior Sampling
We propose a different paradigm for JPEG artifact correction.
We aim to obtain sharp, detailed, and visually pleasing reconstructions that remain consistent with the compressed input.
Our solution offers a diverse set of plausible and fast reconstructions for a given input with perfect consistency.
Submitted: 2022-11-21
- MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation
Two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images.
Our proposed modulated VQGAN is able to greatly improve the reconstructed image quality as well as provide high-fidelity image generation.
Submitted: 2022-09-19
- Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform
We propose a versatile deep image compression network based on Spatial Feature Transform (SFT, arXiv:1804.02815).
Our model covers a wide range of compression rates using a single model, which is controlled by arbitrary pixel-wise quality maps.
The proposed framework allows us to perform task-aware image compressions for various tasks.
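The Spatial Feature Transform referenced here modulates intermediate features with per-pixel scale and shift parameters predicted from a conditioning signal (in this paper, a quality map). A minimal sketch, with random projection weights standing in for the learned condition networks (the `w_gamma`/`w_beta` 1x1-style weights are illustrative assumptions, not this paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def sft_modulate(features, quality_map, w_gamma, w_beta):
    """Apply a Spatial Feature Transform: a per-pixel affine modulation.
    features:    (C, H, W) feature tensor
    quality_map: (H, W) conditioning map, e.g. desired quality in [0, 1]
    w_gamma, w_beta: (C,) weights standing in for learned condition nets."""
    # Scale is parameterized around 1 so a zero-valued map is the identity.
    gamma = 1.0 + w_gamma[:, None, None] * quality_map   # per-pixel scale
    beta = w_beta[:, None, None] * quality_map           # per-pixel shift
    return features * gamma + beta

feat = np.ones((4, 2, 2))
out = sft_modulate(feat, np.zeros((2, 2)), rng.normal(size=4), rng.normal(size=4))
```

Because the modulation is pixel-wise, a single trained model can trade rate for quality differently in different image regions simply by changing the map.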
Submitted: 2021-08-21
- FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning
We propose a novel learned feature-based refinement and augmentation method that produces a varied set of complex transformations.
These transformations also use information from both within-class and across-class representations that we extract through clustering.
We demonstrate that our method is comparable to the current state of the art on smaller datasets while being able to scale up to larger datasets.
Submitted: 2020-07-16
- A Multiparametric Class of Low-complexity Transforms for Image and Video Coding
We introduce a new class of low-complexity 8-point DCT approximations based on a series of works published by Bouguezel, Ahmed and Swamy.
We show that the optimal DCT approximations present compelling results in terms of coding efficiency and image quality metrics.
Submitted: 2020-06-19
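As a concrete example of this family, the classic signed DCT (Haweel, 2001) replaces every entry of the 8-point DCT matrix with its sign, so the forward transform needs only additions and subtractions; the Bouguezel-Ahmed-Swamy transforms refine the idea with entries drawn from small dyadic sets such as {0, ±1/2, ±1}. A minimal sketch of the signed-DCT idea, not any specific transform from this paper:

```python
import numpy as np

def dct_matrix(n=8):
    # Exact orthonormal DCT-II matrix, for reference
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0] /= np.sqrt(2.0)
    return c

def signed_dct_matrix(n=8):
    # Multiplication-free approximation: keep only the sign of each entry.
    # In practice a diagonal scaling matrix restores near-orthogonality and
    # is folded into the codec's quantization step.
    return np.sign(dct_matrix(n))

x = np.arange(8, dtype=float)          # a simple test signal
exact = dct_matrix() @ x               # reference transform
approx = signed_dct_matrix() @ x       # additions/subtractions only
```

The appeal for video coding hardware is that every matrix entry is ±1, so the whole transform reduces to a fixed pattern of adds, at the cost of some coding efficiency relative to the exact DCT.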
This list is automatically generated from the titles and abstracts of the papers in this site.