Improving Statistical Fidelity for Neural Image Compression with
Implicit Local Likelihood Models
- URL: http://arxiv.org/abs/2301.11189v3
- Date: Fri, 11 Aug 2023 02:21:27 GMT
- Title: Improving Statistical Fidelity for Neural Image Compression with
Implicit Local Likelihood Models
- Authors: Matthew J. Muckley, Alaaeldin El-Nouby, Karen Ullrich, Hervé Jégou, Jakob Verbeek
- Abstract summary: Lossy image compression aims to represent images in as few bits as possible while maintaining fidelity to the original.
We introduce a non-binary discriminator that is conditioned on quantized local image representations obtained via VQ-VAE autoencoders.
- Score: 31.308949268401047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lossy image compression aims to represent images in as few bits as possible
while maintaining fidelity to the original. Theoretical results indicate that
optimizing distortion metrics such as PSNR or MS-SSIM necessarily leads to a
discrepancy between the statistics of original images and those of their
reconstructions, in particular at low bitrates, often manifested as blurring of the
compressed images. Previous work has leveraged adversarial discriminators to
improve statistical fidelity. Yet these binary discriminators adopted from
generative modeling tasks may not be ideal for image compression. In this
paper, we introduce a non-binary discriminator that is conditioned on quantized
local image representations obtained via VQ-VAE autoencoders. Our evaluations
on the CLIC2020, DIV2K and Kodak datasets show that our discriminator is more
effective for jointly optimizing distortion (e.g., PSNR) and statistical
fidelity (e.g., FID) than the PatchGAN of the state-of-the-art HiFiC model. On
CLIC2020, we obtain the same FID as HiFiC with 30-40% fewer bits.
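As a concrete illustration of the idea, the following is a minimal PyTorch-style sketch of a non-binary discriminator conditioned on quantized local representations. It is one plausible instantiation under stated assumptions, not the paper's exact architecture or objective: the class names, the K+1 "fake"-class formulation, the layer sizes, and the assumption that the VQ-VAE code grid matches the discriminator's output resolution are all illustrative.

```python
# Sketch only: a non-binary discriminator whose per-location targets are
# VQ-VAE codebook indices of the original image rather than a real/fake bit.
# All names, sizes, and the K+1 "fake"-class formulation are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonBinaryDiscriminator(nn.Module):
    """Fully convolutional net producing per-location logits over K codebook
    entries plus one extra "fake" class (K + 1 outputs per location)."""

    def __init__(self, num_codes: int = 512):
        super().__init__()
        self.num_codes = num_codes
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(256, num_codes + 1, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, K + 1, H/8, W/8): one categorical prediction per local region;
        # assumes the VQ-VAE code grid has the same H/8 x W/8 resolution.
        return self.net(x)


def discriminator_loss(disc, original, reconstruction, codes):
    """`codes`: (B, H/8, W/8) long tensor of VQ-VAE indices for `original`."""
    fake_label = torch.full_like(codes, disc.num_codes)  # index of the extra class
    loss_real = F.cross_entropy(disc(original), codes)
    loss_fake = F.cross_entropy(disc(reconstruction.detach()), fake_label)
    return loss_real + loss_fake


def generator_adversarial_loss(disc, reconstruction, codes):
    # The decoder is rewarded when its reconstruction is classified into the
    # same local codes as the original image, not merely as "not fake".
    return F.cross_entropy(disc(reconstruction), codes)
```

Compared with a binary PatchGAN, the real/fake decision is replaced by a per-location classification over quantized codes, which is the sense in which the discriminator above is "non-binary".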
Related papers
- Good, Cheap, and Fast: Overfitted Image Compression with Wasserstein Distortion [13.196774986841469]
We show that by focusing on modeling visual perception rather than the data distribution, we can achieve a good trade-off between visual quality and bit rate.
We do this by optimizing C3, an overfitted image codec, for Wasserstein Distortion (WD) and evaluating the image reconstructions with a human rater study.
arXiv Detail & Related papers (2024-11-30T15:05:01Z)
- A Rate-Distortion-Classification Approach for Lossy Image Compression [0.0]
In lossy image compression, the objective is to achieve minimal signal distortion while compressing images to a specified bit rate.
To bridge the gap between image compression and visual analysis, we propose a Rate-Distortion-Classification (RDC) model for lossy image compression.
arXiv Detail & Related papers (2024-05-06T14:11:36Z)
- Semantic Ensemble Loss and Latent Refinement for High-Fidelity Neural Image Compression [58.618625678054826]
This study presents an enhanced neural compression method designed for optimal visual fidelity.
We have trained our model with a semantic ensemble loss, integrating Charbonnier loss, perceptual loss, style loss, and a non-binary adversarial loss (an illustrative sketch of such a weighted combination appears after this list).
Our empirical findings demonstrate that this approach significantly improves the statistical fidelity of neural image compression.
arXiv Detail & Related papers (2024-01-25T08:11:27Z)
- Transferable Learned Image Compression-Resistant Adversarial Perturbations [66.46470251521947]
Adversarial attacks can readily disrupt the image classification system, revealing the vulnerability of DNN-based recognition tasks.
We introduce a new pipeline that targets image classification models that utilize learned image compressors as pre-processing modules.
arXiv Detail & Related papers (2024-01-06T03:03:28Z)
- Extreme Image Compression using Fine-tuned VQGANs [43.43014096929809]
We introduce vector quantization (VQ)-based generative models into the image compression domain.
The codebook learned by the VQGAN model yields a strong expressive capacity.
The proposed framework outperforms state-of-the-art codecs in terms of perceptual quality-oriented metrics.
arXiv Detail & Related papers (2023-07-17T06:14:19Z)
- Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
They are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
arXiv Detail & Related papers (2023-06-21T06:59:07Z)
- Machine Perception-Driven Image Compression: A Layered Generative Approach [32.23554195427311]
A layered generative image compression model is proposed to achieve high reconstruction quality oriented toward human vision.
A task-agnostic learning-based compression model is proposed, which effectively supports various compressed-domain analysis tasks.
A joint optimization schedule is adopted to find the best balance among compression ratio, reconstructed image quality, and downstream perception performance.
arXiv Detail & Related papers (2023-04-14T02:12:38Z)
- Implicit Neural Representations for Image Compression [103.78615661013623]
Implicit Neural Representations (INRs) have gained attention as a novel and effective representation for various data types.
We propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding.
We find that our approach to source compression with INRs vastly outperforms similar prior work.
arXiv Detail & Related papers (2021-12-08T13:02:53Z)
- Perceptually Optimizing Deep Image Compression [53.705543593594285]
Mean squared error (MSE) and $\ell_p$ norms have largely dominated the measurement of loss in neural networks.
We propose a different proxy approach to optimize image analysis networks against quantitative perceptual models.
arXiv Detail & Related papers (2020-07-03T14:33:28Z)
- Discernible Image Compression [124.08063151879173]
This paper aims to produce compressed images by pursuing both appearance and perceptual consistency.
Based on the encoder-decoder framework, we propose using a pre-trained CNN to extract features of the original and compressed images.
Experiments on benchmarks demonstrate that images compressed by using the proposed method can also be well recognized by subsequent visual recognition and detection models.
arXiv Detail & Related papers (2020-02-17T07:35:08Z)
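Below is an illustrative sketch of the weighted loss combination referenced in the Semantic Ensemble Loss entry above: Charbonnier, perceptual (VGG-feature), style (Gram-matrix), and adversarial terms summed with scalar weights. The weights, the choice of VGG-16 features, and the externally supplied `adv_term` are assumptions for illustration, not that paper's actual configuration.

```python
# Sketch only: a weighted ensemble of Charbonnier, perceptual, style, and
# adversarial losses. Weights and feature extractor are illustrative choices.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Frozen VGG-16 features (up to an intermediate ReLU) as a perceptual backbone.
_vgg = vgg16(weights="DEFAULT").features[:16].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)


def charbonnier(x, y, eps: float = 1e-3):
    # Smooth L1-like pixel loss.
    return torch.sqrt((x - y) ** 2 + eps ** 2).mean()


def gram(feat):
    # Channel-by-channel feature correlations used for the style term.
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)


def ensemble_loss(recon, target, adv_term, w=(1.0, 0.1, 10.0, 0.01)):
    """`adv_term` is the generator-side adversarial loss computed elsewhere."""
    feat_r, feat_t = _vgg(recon), _vgg(target)
    return (
        w[0] * charbonnier(recon, target)                # pixel fidelity
        + w[1] * F.l1_loss(feat_r, feat_t)               # perceptual
        + w[2] * F.l1_loss(gram(feat_r), gram(feat_t))   # style
        + w[3] * adv_term                                # adversarial
    )
```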