Generating Images with Sparse Representations
- URL: http://arxiv.org/abs/2103.03841v1
- Date: Fri, 5 Mar 2021 17:56:03 GMT
- Title: Generating Images with Sparse Representations
- Authors: Charlie Nash, Jacob Menick, Sander Dieleman, Peter W. Battaglia
- Abstract summary: High dimensionality of images presents architecture and sampling-efficiency challenges for likelihood-based generative models.
We present an alternative approach, inspired by common image compression methods like JPEG, and convert images to quantized discrete cosine transform (DCT) blocks.
We propose a Transformer-based autoregressive architecture, which is trained to sequentially predict the conditional distribution of the next element in such sequences.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The high dimensionality of images presents architecture and
sampling-efficiency challenges for likelihood-based generative models. Previous
approaches such as VQ-VAE use deep autoencoders to obtain compact
representations, which are more practical as inputs for likelihood-based
models. We present an alternative approach, inspired by common image
compression methods like JPEG, and convert images to quantized discrete cosine
transform (DCT) blocks, which are represented sparsely as a sequence of DCT
channel, spatial location, and DCT coefficient triples. We propose a
Transformer-based autoregressive architecture, which is trained to sequentially
predict the conditional distribution of the next element in such sequences, and
which scales effectively to high resolution images. On a range of image
datasets, we demonstrate that our approach can generate high quality, diverse
images, with sample metric scores competitive with state of the art methods. We
additionally show that simple modifications to our method yield effective image
colorization and super-resolution models.
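The block-DCT sparse representation described in the abstract can be sketched in a few lines. The block size, uniform quantization step, and (coefficient, position, value) triple layout below are illustrative choices under simple assumptions, not the paper's exact settings:

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis: C[k, m] = a_k * cos(pi * (2m + 1) * k / (2n))
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0] /= np.sqrt(2.0)
    return c

def image_to_dct_triples(img, block=8, q=16):
    """Convert a grayscale image (2D float array) to a sparse list of
    (coefficient_index, block_index, quantized_value) triples."""
    C = dct_matrix(block)
    h = img.shape[0] - img.shape[0] % block   # crop to a multiple of the block size
    w = img.shape[1] - img.shape[1] % block
    triples, block_index = [], 0
    for y in range(0, h, block):
        for x in range(0, w, block):
            coeffs = C @ img[y:y+block, x:x+block] @ C.T   # 2D DCT of one block
            quant = np.round(coeffs / q).astype(int)       # uniform quantization
            for idx, v in enumerate(quant.ravel()):
                if v != 0:                                  # keep only nonzeros
                    triples.append((idx, block_index, int(v)))
            block_index += 1
    return triples
```

A flat (constant) block yields a single DC triple, which is what makes this representation sparse over smooth image regions and keeps the autoregressive sequences short.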
Related papers
- Efficient Visual State Space Model for Image Deblurring
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration.
We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
Submitted: 2024-05-23
- Image Inpainting via Tractable Steering of Diffusion Models
This paper proposes to exploit the ability of Tractable Probabilistic Models (TPMs) to exactly and efficiently compute the constrained posterior.
Specifically, this paper adopts a class of expressive TPMs termed Probabilistic Circuits (PCs).
We show that our approach can consistently improve the overall quality and semantic coherence of inpainted images with only 10% additional computational overhead.
Submitted: 2023-11-28
- Distance Weighted Trans Network for Image Completion
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
Submitted: 2023-10-11
- AICT: An Adaptive Image Compression Transformer
We propose a more straightforward yet effective Transformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT).
The proposed ICT can capture both global and local contexts from the latent representations.
We leverage a learnable scaling module with a sandwich ConvNeXt-based pre/post-processor to accurately extract more compact latent representation.
Submitted: 2023-07-12
- Beyond Learned Metadata-based Raw Image Reconstruction
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
However, they are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
Submitted: 2023-06-21
- High-Perceptual Quality JPEG Decoding via Posterior Sampling
We propose a different paradigm for JPEG artifact correction.
We aim to obtain sharp, detailed, and visually pleasing reconstructions that remain consistent with the compressed input.
Our solution offers a diverse set of plausible and fast reconstructions for a given input with perfect consistency.
Submitted: 2022-11-21
- MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation
Two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images.
Our proposed modulated VQGAN is able to greatly improve the reconstructed image quality as well as provide high-fidelity image generation.
Submitted: 2022-09-19
- Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform
We propose a versatile deep image compression network based on Spatial Feature Transform (SFT, arXiv:1804.02815).
Our model covers a wide range of compression rates using a single model, which is controlled by arbitrary pixel-wise quality maps.
The proposed framework allows us to perform task-aware image compressions for various tasks.
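The Spatial Feature Transform referenced here modulates intermediate features with per-pixel scale and shift parameters predicted from a conditioning signal (in this paper, a quality map). A minimal sketch, with random projection weights standing in for the learned condition networks (the `w_gamma`/`w_beta` 1x1-style weights are illustrative assumptions, not this paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def sft_modulate(features, quality_map, w_gamma, w_beta):
    """Apply a Spatial Feature Transform: a per-pixel affine modulation.
    features:    (C, H, W) feature tensor
    quality_map: (H, W) conditioning map, e.g. desired quality in [0, 1]
    w_gamma, w_beta: (C,) weights standing in for learned condition nets."""
    # Scale is parameterized around 1 so a zero-valued map is the identity.
    gamma = 1.0 + w_gamma[:, None, None] * quality_map   # per-pixel scale
    beta = w_beta[:, None, None] * quality_map           # per-pixel shift
    return features * gamma + beta

feat = np.ones((4, 2, 2))
out = sft_modulate(feat, np.zeros((2, 2)), rng.normal(size=4), rng.normal(size=4))
```

Because the modulation is pixel-wise, a single trained model can trade rate for quality differently in different image regions simply by changing the map.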
Submitted: 2021-08-21
- FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning
We propose a novel learned feature-based refinement and augmentation method that produces a varied set of complex transformations.
These transformations also use information from both within-class and across-class representations that we extract through clustering.
We demonstrate that our method is comparable to the current state of the art on smaller datasets while being able to scale up to larger datasets.
Submitted: 2020-07-16
- A Multiparametric Class of Low-complexity Transforms for Image and Video Coding
We introduce a new class of low-complexity 8-point DCT approximations based on a series of works published by Bouguezel, Ahmed and Swamy.
We show that the optimal DCT approximations present compelling results in terms of coding efficiency and image quality metrics.
Submitted: 2020-06-19
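As a concrete example of this family, the classic signed DCT (Haweel, 2001) replaces every entry of the 8-point DCT matrix with its sign, so the forward transform needs only additions and subtractions; the Bouguezel-Ahmed-Swamy transforms refine the idea with entries drawn from small dyadic sets such as {0, ±1/2, ±1}. A minimal sketch of the signed-DCT idea, not any specific transform from this paper:

```python
import numpy as np

def dct_matrix(n=8):
    # Exact orthonormal DCT-II matrix, for reference
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0] /= np.sqrt(2.0)
    return c

def signed_dct_matrix(n=8):
    # Multiplication-free approximation: keep only the sign of each entry.
    # In practice a diagonal scaling matrix restores near-orthogonality and
    # is folded into the codec's quantization step.
    return np.sign(dct_matrix(n))

x = np.arange(8, dtype=float)          # a simple test signal
exact = dct_matrix() @ x               # reference transform
approx = signed_dct_matrix() @ x       # additions/subtractions only
```

The appeal for video coding hardware is that every matrix entry is ±1, so the whole transform reduces to a fixed pattern of adds, at the cost of some coding efficiency relative to the exact DCT.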
This list is automatically generated from the titles and abstracts of the papers in this site.