MLIC: Multi-Reference Entropy Model for Learned Image Compression
- URL: http://arxiv.org/abs/2211.07273v9
- Date: Tue, 16 Jan 2024 15:24:53 GMT
- Title: MLIC: Multi-Reference Entropy Model for Learned Image Compression
- Authors: Wei Jiang, Jiayu Yang, Yongqi Zhai, Peirong Ning, Feng Gao, Ronggang Wang
- Abstract summary: We propose the Multi-Reference Entropy Model (MEM) and the advanced version, MEM$^+$, to capture the different types of correlations present in the latent representation.
Based on MEM and MEM$^+$, we propose the image compression models MLIC and MLIC$^+$.
Our MLIC and MLIC$^+$ models achieve state-of-the-art performance, reducing BD-rate by $8.05\%$ and $11.39\%$ on the Kodak dataset compared to VTM-17.0 when measured in PSNR.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, learned image compression has achieved remarkable performance. The
entropy model, which estimates the distribution of the latent representation,
plays a crucial role in boosting rate-distortion performance. However, most
entropy models only capture correlations in one dimension, while the latent
representation contains channel-wise, local spatial, and global spatial
correlations. To tackle this issue, we propose the Multi-Reference Entropy
Model (MEM) and the advanced version, MEM$^+$. These models capture the
different types of correlations present in the latent representation.
Specifically, we first divide the latent representation into slices. When
decoding the current slice, we use previously decoded slices as context and
employ the attention map of the previously decoded slice to predict global
correlations in the current slice. To capture local contexts, we introduce two
enhanced checkerboard context capturing techniques that avoid performance
degradation. Based on MEM and MEM$^+$, we propose the image compression models
MLIC and MLIC$^+$. Extensive experimental evaluations demonstrate that our MLIC
and MLIC$^+$ models achieve state-of-the-art performance, reducing BD-rate by
$8.05\%$ and $11.39\%$ on the Kodak dataset compared to VTM-17.0 when measured
in PSNR. Our code is available at https://github.com/JiangWeibeta/MLIC.
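To make the decoding order concrete, here is a minimal PyTorch sketch of the slice-wise, two-pass checkerboard context described in the abstract. All names and shapes are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import torch

def checkerboard_masks(h, w):
    """Two-pass spatial context: 'anchor' positions are coded first with
    no spatial context; 'non-anchor' positions are coded second, using
    the already-decoded anchors as local context."""
    ij = torch.arange(h).view(-1, 1) + torch.arange(w).view(1, -1)
    anchor = (ij % 2 == 0).float()
    return anchor, 1.0 - anchor

anchor, non_anchor = checkerboard_masks(16, 16)
y = torch.randn(1, 32, 16, 16)          # toy latent
slices = y.chunk(4, dim=1)              # channel-wise slices
decoded = []
for s in slices:
    # Channel context: all previously decoded slices (in MLIC, the
    # attention map of the previous slice also supplies global context).
    ch_ctx = torch.cat(decoded, dim=1) if decoded else None
    s_anchor = s * anchor               # pass 1: anchors only
    s_full = s_anchor + s * non_anchor  # pass 2: conditioned on anchors
    decoded.append(s_full)
```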
Related papers
- Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective [52.778766190479374]
Latent-based image generative models have achieved notable success in image generation tasks.
Despite sharing the same latent space, autoregressive models significantly lag behind LDMs and MIMs in image generation.
We propose a simple but effective discrete image tokenizer to stabilize the latent space for image generative modeling.
arXiv Detail & Related papers (2024-10-16T12:13:17Z) - SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modeling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
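For readers unfamiliar with the even cluster assignment mentioned above, the following is a generic Sinkhorn-Knopp sketch in the style popularized by SwAV; the feature and prototype tensors are toy placeholders, not SIGMA's actual pipeline.

```python
import torch
import torch.nn.functional as F

def sinkhorn(scores, n_iters=3, eps=0.05):
    """Balance a (num_features x num_clusters) score matrix so the total
    assignment mass is spread evenly across clusters."""
    q = torch.exp(scores / eps)
    q = q / q.sum()
    n, k = q.shape
    for _ in range(n_iters):
        q = q / q.sum(dim=0, keepdim=True) / k   # equalize cluster mass
        q = q / q.sum(dim=1, keepdim=True) / n   # one unit per feature
    return q / q.sum(dim=1, keepdim=True)        # rows: soft assignments

feats = F.normalize(torch.randn(256, 64), dim=1)      # toy tube features
prototypes = F.normalize(torch.randn(64, 10), dim=0)  # 10 clusters
assign = sinkhorn(feats @ prototypes)                 # balanced targets
```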
arXiv Detail & Related papers (2024-07-22T08:04:09Z) - MLIC++: Linear Complexity Multi-Reference Entropy Modeling for Learned Image Compression [30.71965784982577]
We introduce MEM++, which captures a diverse range of correlations inherent in the latent representation.
MEM++ achieves state-of-the-art performance, reducing BD-rate by 13.39% on the Kodak dataset compared to VTM-17.0 in PSNR.
MLIC++ exhibits linear GPU memory consumption with resolution, making it highly suitable for high-resolution image coding.
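MLIC++'s precise attention design is given in the paper and repository; purely as an illustration of why memory can become linear in resolution, the sketch below uses generic kernelized linear attention (Katharopoulos et al., 2020), which replaces the N x N softmax map with an O(N) factorization.

```python
import torch

def linear_attention(q, k, v):
    """O(N) attention: phi(q) @ (phi(k)^T v) instead of softmax(q k^T) v.
    phi = elu + 1 keeps features positive."""
    phi = lambda x: torch.nn.functional.elu(x) + 1.0
    q, k = phi(q), phi(k)                       # (B, N, D)
    kv = torch.einsum('bnd,bne->bde', k, v)     # D x E summary, O(N)
    z = 1.0 / (torch.einsum('bnd,bd->bn', q, k.sum(dim=1)) + 1e-6)
    return torch.einsum('bnd,bde,bn->bne', q, kv, z)

# For an H x W latent, N = H * W, so memory grows linearly with resolution.
q = k = v = torch.randn(1, 64 * 64, 32)
out = linear_attention(q, k, v)                 # (1, 4096, 32)
```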
arXiv Detail & Related papers (2023-07-28T09:11:37Z) - You Can Mask More For Extremely Low-Bitrate Image Compression [80.7692466922499]
Learned image compression (LIC) methods have experienced significant progress during recent years.
LIC methods fail to explicitly explore the image structure and texture components crucial for image compression.
We present DA-Mask that samples visible patches based on the structure and texture of original images.
We propose a simple yet effective masked compression model (MCM), the first framework that unifies masked image modeling (MIM) and LIC end-to-end for extremely low-bitrate compression.
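DA-Mask's actual sampling rule is defined in the paper; purely as an illustration of content-adaptive masking, the sketch below keeps the patches with the highest pixel variance, a crude texture proxy of our own choosing.

```python
import torch

def keep_textured_patches(img, patch=16, keep_ratio=0.25):
    """Score patches by pixel variance (a rough texture proxy) and keep
    the top fraction visible; the rest are masked before encoding."""
    b, c, h, w = img.shape
    patches = img.unfold(2, patch, patch).unfold(3, patch, patch)
    patches = patches.reshape(b, c, -1, patch * patch)   # (B, C, P, p*p)
    score = patches.var(dim=-1).mean(dim=1)              # (B, P)
    k = max(1, int(keep_ratio * score.shape[1]))
    return score.topk(k, dim=1).indices                  # visible patches

img = torch.rand(1, 3, 256, 256)
visible = keep_textured_patches(img)   # indices of patches to encode
```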
arXiv Detail & Related papers (2023-06-27T15:36:22Z) - Lossy Image Compression with Conditional Diffusion Models [25.158390422252097]
This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models.
In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model.
Our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics.
arXiv Detail & Related papers (2022-09-14T21:53:27Z) - Estimating the Resize Parameter in End-to-end Learned Image Compression [50.20567320015102]
We describe a search-free resizing framework that can further improve the rate-distortion tradeoff of recent learned image compression models.
Our results show that our new resizing parameter estimation framework can provide Bjontegaard-Delta rate (BD-rate) improvement of about 10% against leading perceptual quality engines.
arXiv Detail & Related papers (2022-04-26T01:35:02Z) - Lossless Compression with Latent Variable Models [4.289574109162585]
We use an approach to lossless compression with latent variable models called 'bits back with asymmetric numeral systems' (BB-ANS)
The method involves interleaving encode and decode steps, and achieves an optimal rate when compressing batches of data.
We describe 'Craystack', a modular software framework which we have developed for rapid prototyping of compression using deep generative models.
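The interleaved decode/encode pattern is the heart of bits-back coding. Below is a self-contained toy sketch with an unbounded-precision rANS coder and tiny hand-fixed distributions standing in for the learned model; it is illustrative only, not the paper's Craystack code.

```python
# Toy rANS over Python big ints; each distribution is (freqs, cum), freqs sum to M.
def ans_push(state, s, freqs, cum, M):          # encode one symbol
    return (state // freqs[s]) * M + cum[s] + state % freqs[s]

def ans_pop(state, freqs, cum, M):              # decode one symbol
    r = state % M
    s = next(i for i, c in enumerate(cum) if c <= r < c + freqs[i])
    return s, freqs[s] * (state // M) + r - cum[s]

M = 16
p_z   = ([8, 8], [0, 8])                                  # prior p(z)
p_x_z = {0: ([12, 4], [0, 12]), 1: ([4, 12], [0, 4])}     # likelihood p(x|z)
q_z_x = {0: ([10, 6], [0, 10]), 1: ([6, 10], [0, 6])}     # posterior q(z|x)

def bb_encode(state, x):
    z, state = ans_pop(state, *q_z_x[x], M)     # "borrow" bits to sample z
    state = ans_push(state, x, *p_x_z[z], M)    # encode x under p(x|z)
    return ans_push(state, z, *p_z, M)          # encode z under p(z)

def bb_decode(state):
    z, state = ans_pop(state, *p_z, M)
    x, state = ans_pop(state, *p_x_z[z], M)
    return x, ans_push(state, z, *q_z_x[x], M)  # return the borrowed bits

state = 1 << 64                                 # initial bits for the first pop
for x in [0, 1, 1]:
    state = bb_encode(state, x)
dec = []
for _ in range(3):
    x, state = bb_decode(state)
    dec.append(x)
print(dec[::-1])                                # [0, 1, 1]: stack (LIFO) order
```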
arXiv Detail & Related papers (2021-04-21T14:03:05Z) - Overfitting for Fun and Profit: Instance-Adaptive Data Compression [20.764189960709164]
Neural data compression has been shown to outperform classical methods in terms of rate-distortion (RD) performance.
In this paper we take this concept to the extreme, adapting the full model to a single video, and sending model updates along with the latent representation.
We demonstrate that full-model adaptation improves RD performance by 1 dB, with respect to encoder-only finetuning.
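A schematic of the full-model adaptation loop, assuming a hypothetical `model(x)` that returns a reconstruction and a bit estimate; the real method must also quantize and entropy-code the weight updates and count them against the bitstream.

```python
import torch

def finetune_on_instance(model, x, steps=100, lam=0.01, lr=1e-4):
    """Overfit the whole compression model to one video/image, trading
    transmitted weight deltas for a better rate-distortion point."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        out = model(x)                      # assumed dict: "x_hat", "bits"
        rate = out["bits"] / x.numel()      # bits per pixel (placeholder)
        dist = torch.mean((out["x_hat"] - x) ** 2)
        loss = rate + lam * dist            # R + lambda * D objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    # In the paper's setting, the weight *updates* are quantized and sent
    # alongside the latents, so their rate is added to the bitstream.
    return model
```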
arXiv Detail & Related papers (2021-01-21T15:58:58Z) - Causal Contextual Prediction for Learned Image Compression [36.08393281509613]
We propose the concept of separate entropy coding to leverage a serial decoding process for causal contextual entropy prediction in the latent space.
A causal context model is proposed that separates the latents across channels and makes use of cross-channel relationships to generate highly informative contexts.
We also propose a causal global prediction model, which is able to find global reference points for accurate predictions of unknown points.
arXiv Detail & Related papers (2020-11-19T08:15:10Z) - Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
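ACMNet's exact operators are specified in the paper; the following generic sketch shows one attention-weighted propagation step over a toy k-NN graph, which is the general shape of the mechanism the summary describes.

```python
import torch

def attentive_propagation(feat, nbr_idx):
    """One propagation step: each node attends over its neighbors and
    aggregates their features with dot-product attention weights."""
    # feat: (N, D) node features, nbr_idx: (N, K) neighbor indices
    nbrs = feat[nbr_idx]                          # (N, K, D)
    att = torch.einsum('nd,nkd->nk', feat, nbrs)  # dot-product scores
    att = torch.softmax(att / feat.shape[-1] ** 0.5, dim=-1)
    return feat + torch.einsum('nk,nkd->nd', att, nbrs)  # residual update

feat = torch.randn(100, 32)                   # e.g., sparse depth points
nbr_idx = torch.randint(0, 100, (100, 8))     # toy k-NN graph (k = 8)
out = attentive_propagation(feat, nbr_idx)
```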
arXiv Detail & Related papers (2020-08-25T06:00:06Z) - Prototype Mixture Models for Few-shot Semantic Segmentation [50.866870384596446]
Few-shot segmentation is challenging because objects within the support and query images could significantly differ in appearance and pose.
We propose prototype mixture models (PMMs), which correlate diverse image regions with multiple prototypes to enforce the prototype-based semantic representation.
PMMs improve 5-shot segmentation performance on MS-COCO by up to 5.82% with only a moderate cost for model size and inference speed.
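As a rough illustration of correlating image regions with multiple prototypes, here is an EM-style soft k-means sketch over toy support features; PMM's full formulation (mixture weights, duplex prototypes) differs.

```python
import torch

def fit_prototypes(feats, n_proto=3, n_iters=10):
    """EM-style soft k-means: alternate soft assignment of support
    features to prototypes (E-step) and prototype re-estimation (M-step)."""
    protos = feats[torch.randperm(feats.shape[0])[:n_proto]]  # init
    for _ in range(n_iters):
        sim = feats @ protos.t()                  # (N, P) similarities
        resp = torch.softmax(sim, dim=1)          # soft assignments
        protos = (resp.t() @ feats) / resp.sum(0, keepdim=True).t()
    return protos

sup_feats = torch.randn(500, 64)    # masked support-image features (toy)
protos = fit_prototypes(sup_feats)  # each prototype covers one region/part
```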
arXiv Detail & Related papers (2020-08-10T04:33:17Z)