Image Compression with Product Quantized Masked Image Modeling
- URL: http://arxiv.org/abs/2212.07372v2
- Date: Mon, 6 Nov 2023 13:16:00 GMT
- Title: Image Compression with Product Quantized Masked Image Modeling
- Authors: Alaaeldin El-Nouby, Matthew J. Muckley, Karen Ullrich, Ivan Laptev,
Jakob Verbeek, Hervé Jégou
- Abstract summary: Recent neural compression methods have been based on the popular hyperprior framework.
It relies on Scalar Quantization and offers very strong compression performance.
This contrasts with recent advances in image generation and representation learning, where Vector Quantization is more commonly employed.
- Score: 44.15706119017024
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent neural compression methods have been based on the popular hyperprior
framework. It relies on Scalar Quantization and offers very strong
compression performance. This contrasts with recent advances in image
generation and representation learning, where Vector Quantization is more
commonly employed. In this work, we attempt to bring these lines of research
closer by revisiting vector quantization for image compression. We build upon
the VQ-VAE framework and introduce several modifications. First, we replace the
vanilla vector quantizer by a product quantizer. This intermediate solution
between vector and scalar quantization allows for a much wider set of
rate-distortion points: It implicitly defines high-quality quantizers that
would otherwise require intractably large codebooks. Second, inspired by the
success of Masked Image Modeling (MIM) in the context of self-supervised
learning and generative image models, we propose a novel conditional entropy
model which improves entropy coding by modelling the co-dependencies of the
quantized latent codes. The resulting PQ-MIM model is surprisingly effective:
its compression performance is on par with recent hyperprior methods. It also
outperforms HiFiC in terms of FID and KID metrics when optimized with
perceptual losses (e.g. adversarial). Finally, since PQ-MIM is compatible with
image generation frameworks, we show qualitatively that it can operate under a
hybrid mode between compression and generation, with no further training or
finetuning. As a result, we explore the extreme compression regime where an
image is compressed into 200 bytes, i.e., less than a tweet.
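To make the product quantization step concrete, here is a minimal sketch (not the paper's implementation; the dimensions, codebook sizes, and random codebooks are illustrative assumptions). Each latent vector is split into M sub-vectors, each quantized against its own small codebook of K entries, so the effective codebook has K^M entries while only M*K centroids are stored and M*log2(K) bits are transmitted per vector.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16   # latent vector dimension (illustrative assumption)
M = 4    # number of sub-vectors / product-quantizer factors
K = 256  # entries per sub-codebook -> effective codebook size K**M = 2**32
d = D // M

# One codebook per sub-space; a real model would learn these jointly with the autoencoder.
codebooks = rng.normal(size=(M, K, d))

def pq_encode(x):
    """Map a (D,)-vector to M codebook indices, one per sub-vector."""
    codes = np.empty(M, dtype=np.int64)
    for m in range(M):
        sub = x[m * d:(m + 1) * d]
        dists = np.sum((codebooks[m] - sub) ** 2, axis=1)  # distance to each centroid
        codes[m] = np.argmin(dists)
    return codes

def pq_decode(codes):
    """Reconstruct the vector by concatenating the selected centroids."""
    return np.concatenate([codebooks[m, codes[m]] for m in range(M)])

x = rng.normal(size=D)
codes = pq_encode(x)   # M indices of log2(K) = 8 bits each -> 32 bits per vector
x_hat = pq_decode(codes)
print(codes, np.mean((x - x_hat) ** 2))
```

With these toy settings each latent vector costs M * log2(K) = 32 bits, a rate that a single vector quantizer could only reach with a 2^32-entry codebook; this is what the abstract means by quantizers that would otherwise require intractably large codebooks.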
Related papers
- Unifying Generation and Compression: Ultra-low bitrate Image Coding Via
Multi-stage Transformer [35.500720262253054]
This paper introduces a novel Unified Image Generation-Compression (UIGC) paradigm, merging the processes of generation and compression.
A key feature of the UIGC framework is the adoption of vector-quantized (VQ) image models for tokenization.
Experiments demonstrate the superiority of the proposed UIGC framework over existing codecs in perceptual quality and human perception.
arXiv Detail & Related papers (2024-03-06T14:27:02Z)
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference; a minimal sketch of TopK compression appears after this list.
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- Multiscale Augmented Normalizing Flows for Image Compression [17.441496966834933]
We present a novel concept, which adapts the hierarchical latent space for augmented normalizing flows, an invertible latent variable model.
Our best performing model achieved average rate savings of more than 7% over comparable single-scale models.
arXiv Detail & Related papers (2023-05-09T13:42:43Z)
- High-Fidelity Variable-Rate Image Compression via Invertible Activation Transformation [24.379052026260034]
We propose the Invertible Activation Transformation (IAT) module to tackle the issue of high-fidelity fine variable-rate image compression.
IAT and QLevel together give the image compression model fine variable-rate control while better maintaining image fidelity.
Our method outperforms the state-of-the-art variable-rate image compression method by a large margin, especially after multiple re-encodings.
arXiv Detail & Related papers (2022-09-12T07:14:07Z)
- Lossy Image Compression with Quantized Hierarchical VAEs [33.173021636656465]
ResNet VAEs were originally designed for data (image) distribution modeling.
We present a powerful and efficient model that outperforms previous methods on natural image lossy compression.
Our model compresses images in a coarse-to-fine fashion and supports parallel encoding and decoding.
arXiv Detail & Related papers (2022-08-27T17:15:38Z)
- Estimating the Resize Parameter in End-to-end Learned Image Compression [50.20567320015102]
We describe a search-free resizing framework that can further improve the rate-distortion tradeoff of recent learned image compression models.
Our results show that our new resizing parameter estimation framework can provide Bjontegaard-Delta rate (BD-rate) improvement of about 10% against leading perceptual quality engines.
arXiv Detail & Related papers (2022-04-26T01:35:02Z)
- Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression [151.3826781154146]
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression.
We find that inter-correlations and intra-correlations exist among latent variables when viewed from a vectorized perspective.
Our model has better rate-distortion performance and an impressive 3.18× compression speedup.
arXiv Detail & Related papers (2022-03-21T11:44:17Z)
- Entroformer: A Transformer-based Entropy Model for Learned Image Compression [17.51693464943102]
We propose a novel transformer-based entropy model, termed Entroformer, to capture long-range dependencies in probability distribution estimation.
The experiments show that the Entroformer achieves state-of-the-art performance on image compression while being time-efficient.
arXiv Detail & Related papers (2022-02-11T08:03:31Z)
- Implicit Neural Representations for Image Compression [103.78615661013623]
Implicit Neural Representations (INRs) have gained attention as a novel and effective representation for various data types.
We propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding.
We find that our approach to source compression with INRs vastly outperforms similar prior work.
arXiv Detail & Related papers (2021-12-08T13:02:53Z)
- Quantization Guided JPEG Artifact Correction [69.04777875711646]
We develop a novel architecture for artifact correction using the JPEG file's quantization matrix.
This allows our single model to achieve state-of-the-art performance over models trained for specific quality settings.
arXiv Detail & Related papers (2020-04-17T00:10:08Z)
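As referenced in the "Activations and Gradients Compression for Model-Parallel Training" entry above, TopK compression keeps only the largest-magnitude entries of a tensor. The snippet below is a minimal sketch under that reading (not that paper's code; the tensor shape and value of k are illustrative assumptions):

```python
import numpy as np

def topk_compress(t, k):
    """Keep the k largest-magnitude entries of a tensor, zero out the rest."""
    flat = t.ravel().copy()
    if k < flat.size:
        # Indices of everything except the k largest magnitudes.
        drop = np.argpartition(np.abs(flat), flat.size - k)[: flat.size - k]
        flat[drop] = 0.0
    return flat.reshape(t.shape)

# Toy activation tensor: after compression only k = 5 of 32 entries survive,
# so only those values and their indices would need to be communicated.
x = np.random.default_rng(1).normal(size=(4, 8))
x_sparse = topk_compress(x, k=5)
print(np.count_nonzero(x_sparse))  # -> 5
```

The finding quoted in that entry suggests that if this sparsification is used during training, the same operation should also be applied at inference time.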
This list is automatically generated from the titles and abstracts of the papers on this site.