Image Compression with Product Quantized Masked Image Modeling
- URL: http://arxiv.org/abs/2212.07372v2
- Date: Mon, 6 Nov 2023 13:16:00 GMT
- Title: Image Compression with Product Quantized Masked Image Modeling
- Authors: Alaaeldin El-Nouby, Matthew J. Muckley, Karen Ullrich, Ivan Laptev,
Jakob Verbeek, Hervé Jégou
- Abstract summary: Recent neural compression methods have been based on the popular hyperprior framework.
It relies on Scalar Quantization and offers very strong compression performance.
This contrasts with recent advances in image generation and representation learning, where Vector Quantization is more commonly employed.
- Score: 44.15706119017024
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent neural compression methods have been based on the popular hyperprior
framework. It relies on Scalar Quantization and offers very strong
compression performance. This contrasts with recent advances in image
generation and representation learning, where Vector Quantization is more
commonly employed. In this work, we attempt to bring these lines of research
closer by revisiting vector quantization for image compression. We build upon
the VQ-VAE framework and introduce several modifications. First, we replace the
vanilla vector quantizer by a product quantizer. This intermediate solution
between vector and scalar quantization allows for a much wider set of
rate-distortion points: It implicitly defines high-quality quantizers that
would otherwise require intractably large codebooks. Second, inspired by the
success of Masked Image Modeling (MIM) in the context of self-supervised
learning and generative image models, we propose a novel conditional entropy
model which improves entropy coding by modelling the co-dependencies of the
quantized latent codes. The resulting PQ-MIM model is surprisingly effective:
its compression performance is on par with recent hyperprior methods. It also
outperforms HiFiC in terms of FID and KID metrics when optimized with
perceptual losses (e.g. adversarial). Finally, since PQ-MIM is compatible with
image generation frameworks, we show qualitatively that it can operate under a
hybrid mode between compression and generation, with no further training or
finetuning. As a result, we explore the extreme compression regime where an
image is compressed into 200 bytes, i.e., less than a tweet.
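To make the product quantization step concrete, here is a minimal sketch (not the paper's implementation; the dimensions, codebook sizes, and random codebooks are illustrative assumptions). Each latent vector is split into M sub-vectors, each quantized against its own small codebook of K entries, so the effective codebook has K^M entries while only M*K centroids are stored and M*log2(K) bits are transmitted per vector.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16   # latent vector dimension (illustrative assumption)
M = 4    # number of sub-vectors / product-quantizer factors
K = 256  # entries per sub-codebook -> effective codebook size K**M = 2**32
d = D // M

# One codebook per sub-space; a real model would learn these jointly with the autoencoder.
codebooks = rng.normal(size=(M, K, d))

def pq_encode(x):
    """Map a (D,)-vector to M codebook indices, one per sub-vector."""
    codes = np.empty(M, dtype=np.int64)
    for m in range(M):
        sub = x[m * d:(m + 1) * d]
        dists = np.sum((codebooks[m] - sub) ** 2, axis=1)  # distance to each centroid
        codes[m] = np.argmin(dists)
    return codes

def pq_decode(codes):
    """Reconstruct the vector by concatenating the selected centroids."""
    return np.concatenate([codebooks[m, codes[m]] for m in range(M)])

x = rng.normal(size=D)
codes = pq_encode(x)   # M indices of log2(K) = 8 bits each -> 32 bits per vector
x_hat = pq_decode(codes)
print(codes, np.mean((x - x_hat) ** 2))
```

With these toy settings each latent vector costs M * log2(K) = 32 bits, a rate that a single vector quantizer could only reach with a 2^32-entry codebook; this is what the abstract means by quantizers that would otherwise require intractably large codebooks.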
Related papers
- Unifying Generation and Compression: Ultra-low bitrate Image Coding Via
Multi-stage Transformer [35.500720262253054]
This paper introduces a novel Unified Image Generation-Compression (UIGC) paradigm, merging the processes of generation and compression.
A key feature of the UIGC framework is the adoption of vector-quantized (VQ) image models for tokenization.
Experiments demonstrate the superiority of the proposed UIGC framework over existing codecs in perceptual quality and human perception.
arXiv Detail & Related papers (2024-03-06T14:27:02Z)
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference; a minimal sketch of TopK compression appears after this list.
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- Multiscale Augmented Normalizing Flows for Image Compression [17.441496966834933]
We present a novel concept, which adapts the hierarchical latent space for augmented normalizing flows, an invertible latent variable model.
Our best performing model achieved average rate savings of more than 7% over comparable single-scale models.
arXiv Detail & Related papers (2023-05-09T13:42:43Z)
- High-Fidelity Variable-Rate Image Compression via Invertible Activation Transformation [24.379052026260034]
We propose the Invertible Activation Transformation (IAT) module to tackle the issue of high-fidelity fine variable-rate image compression.
IAT and QLevel together give the image compression model fine variable-rate control while better maintaining image fidelity.
Our method outperforms the state-of-the-art variable-rate image compression method by a large margin, especially after multiple re-encodings.
arXiv Detail & Related papers (2022-09-12T07:14:07Z)
- Lossy Image Compression with Quantized Hierarchical VAEs [33.173021636656465]
ResNet VAEs were originally designed for data (image) distribution modeling.
We present a powerful and efficient model that outperforms previous methods on natural image lossy compression.
Our model compresses images in a coarse-to-fine fashion and supports parallel encoding and decoding.
arXiv Detail & Related papers (2022-08-27T17:15:38Z)
- Estimating the Resize Parameter in End-to-end Learned Image Compression [50.20567320015102]
We describe a search-free resizing framework that can further improve the rate-distortion tradeoff of recent learned image compression models.
Our results show that our new resizing parameter estimation framework can provide Bjontegaard-Delta rate (BD-rate) improvement of about 10% against leading perceptual quality engines.
arXiv Detail & Related papers (2022-04-26T01:35:02Z)
- Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression [151.3826781154146]
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression.
We find that inter-correlations and intra-correlations exist among latent variables when viewed from a vectorized perspective.
Our model has better rate-distortion performance and an impressive 3.18× compression speedup.
arXiv Detail & Related papers (2022-03-21T11:44:17Z)
- Entroformer: A Transformer-based Entropy Model for Learned Image Compression [17.51693464943102]
We propose a novel transformer-based entropy model, termed Entroformer, to capture long-range dependencies in probability distribution estimation.
The experiments show that the Entroformer achieves state-of-the-art performance on image compression while being time-efficient.
arXiv Detail & Related papers (2022-02-11T08:03:31Z)
- Implicit Neural Representations for Image Compression [103.78615661013623]
Implicit Neural Representations (INRs) have gained attention as a novel and effective representation for various data types.
We propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding.
We find that our approach to source compression with INRs vastly outperforms similar prior work.
arXiv Detail & Related papers (2021-12-08T13:02:53Z)
- Quantization Guided JPEG Artifact Correction [69.04777875711646]
We develop a novel architecture for artifact correction using the JPEG file's quantization matrix.
This allows our single model to achieve state-of-the-art performance over models trained for specific quality settings.
arXiv Detail & Related papers (2020-04-17T00:10:08Z)
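As referenced in the "Activations and Gradients Compression for Model-Parallel Training" entry above, TopK compression keeps only the largest-magnitude entries of a tensor. The snippet below is a minimal sketch under that reading (not that paper's code; the tensor shape and value of k are illustrative assumptions):

```python
import numpy as np

def topk_compress(t, k):
    """Keep the k largest-magnitude entries of a tensor, zero out the rest."""
    flat = t.ravel().copy()
    if k < flat.size:
        # Indices of everything except the k largest magnitudes.
        drop = np.argpartition(np.abs(flat), flat.size - k)[: flat.size - k]
        flat[drop] = 0.0
    return flat.reshape(t.shape)

# Toy activation tensor: after compression only k = 5 of 32 entries survive,
# so only those values and their indices would need to be communicated.
x = np.random.default_rng(1).normal(size=(4, 8))
x_sparse = topk_compress(x, k=5)
print(np.count_nonzero(x_sparse))  # -> 5
```

The finding quoted in that entry suggests that if this sparsification is used during training, the same operation should also be applied at inference time.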
This list is automatically generated from the titles and abstracts of the papers on this site.