MLIC++: Linear Complexity Multi-Reference Entropy Modeling for Learned
Image Compression
- URL: http://arxiv.org/abs/2307.15421v9
- Date: Tue, 20 Feb 2024 03:25:43 GMT
- Title: MLIC++: Linear Complexity Multi-Reference Entropy Modeling for Learned
Image Compression
- Authors: Wei Jiang, Jiayu Yang, Yongqi Zhai, Feng Gao, Ronggang Wang
- Abstract summary: We introduce MEM++, which captures the diverse range of correlations inherent in the latent representation.
MEM++ achieves state-of-the-art performance, reducing BD-rate by 13.39% on the Kodak dataset compared to VTM-17.0 in PSNR.
MLIC++ exhibits linear GPU memory consumption with resolution, making it highly suitable for high-resolution image coding.
- Score: 30.71965784982577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, learned image compression has achieved impressive performance. The
entropy model, which estimates the distribution of the latent representation,
plays a crucial role in enhancing rate-distortion performance. However,
existing global context modules rely on computationally intensive quadratic
complexity computations to capture global correlations. This quadratic
complexity imposes limitations on the potential of high-resolution image
coding. Moreover, effectively capturing local, global, and channel-wise
contexts with acceptable, or even linear, complexity within a single entropy model
remains a challenge. To address these limitations, we propose the Linear
Complexity Multi-Reference Entropy Model (MEM++). MEM++ effectively captures
the diverse range of correlations inherent in the latent representation.
Specifically, the latent representation is first divided into multiple slices.
When compressing a particular slice, the previously compressed slices serve as
its channel-wise contexts. To capture local contexts without sacrificing
performance, we introduce a novel checkerboard attention module. Additionally,
to capture global contexts, we propose linear-complexity attention-based
global correlation capturing, which leverages a decomposition of the softmax
operation. The attention map of the previously decoded slice is implicitly
computed and employed to predict global correlations in the current slice.
Based on MEM++, we propose the image compression model MLIC++. Extensive
experimental evaluations demonstrate that our MLIC++ achieves state-of-the-art
performance, reducing BD-rate by 13.39% on the Kodak dataset compared to
VTM-17.0 when measured in PSNR. Furthermore, MLIC++'s GPU memory consumption grows
linearly with resolution, making it highly suitable for high-resolution image coding.
Code and pre-trained models are available at
https://github.com/JiangWeibeta/MLIC.
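To make the channel-wise conditioning concrete, here is a minimal sketch (the function name and the choice of ten slices are illustrative assumptions, not taken from the paper) of how a latent can be split into slices so that each slice sees the previously coded slices as its channel-wise context:

```python
import torch

def channel_context_schedule(y: torch.Tensor, num_slices: int = 10):
    """Split latent y (B, C, H, W) into channel slices and pair each slice
    with the concatenation of the slices coded before it, which serves as its
    channel-wise context. Illustrative only: in the actual model this context
    feeds an entropy-parameter network together with local and global contexts."""
    slices = torch.chunk(y, num_slices, dim=1)
    schedule = []
    for i, current in enumerate(slices):
        ctx = torch.cat(slices[:i], dim=1) if i > 0 else None  # previously coded slices
        schedule.append((current, ctx))
    return schedule
```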
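The checkerboard attention module is described only at a high level in the abstract; the sketch below shows just the checkerboard partition such a module operates on (a generic anchor / non-anchor split; the attention layered on top of it is the paper's contribution and is not reproduced here):

```python
import torch

def checkerboard_masks(h: int, w: int, device=None):
    """Return boolean anchor / non-anchor masks of shape (h, w).
    Anchor positions are coded first (without spatial context) and then
    serve as the local context for the interleaved non-anchor positions."""
    yy, xx = torch.meshgrid(
        torch.arange(h, device=device),
        torch.arange(w, device=device),
        indexing="ij",
    )
    anchor = (yy + xx) % 2 == 0
    return anchor, ~anchor
```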
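The linear-complexity global context rests on decomposing the softmax so the N x N attention map is never formed explicitly. Below is a minimal kernelized-attention sketch of that idea (the elu+1 feature map and the tensor shapes are assumptions for illustration, not necessarily the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def linear_global_attention(q, k, v, eps: float = 1e-6):
    """O(N) attention in the number of spatial positions N.
    q, k, v: (B, heads, N, d). A positive feature map phi replaces the softmax
    kernel, so phi(K)^T V (a per-head d x d summary) is computed once and reused
    for every query, instead of materializing an N x N attention map."""
    phi_q = F.elu(q) + 1.0  # assumed feature map (linear-transformer style)
    phi_k = F.elu(k) + 1.0
    kv = torch.einsum("bhnd,bhne->bhde", phi_k, v)               # d x d key/value summary
    norm = torch.einsum("bhnd,bhd->bhn", phi_q, phi_k.sum(dim=2)) + eps
    return torch.einsum("bhnd,bhde->bhne", phi_q, kv) / norm.unsqueeze(-1)
```

In the paper's setting, the keys and values would come from the previously decoded slice, so the small summary can be formed while that slice is decoded and then reused to predict global correlations for the current slice.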
Related papers
- LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba [54.85262314960038]
Local Attentional Mamba blocks capture both global contexts and local details with linear complexity.
Our model exhibits exceptional scalability and surpasses the performance of DiT across various model scales on ImageNet at 256x256 resolution.
Compared to state-of-the-art diffusion models on ImageNet 256x256 and 512x512, our largest model presents notable advantages, such as a reduction of up to 62% in GFLOPs.
arXiv Detail & Related papers (2024-08-05T16:39:39Z)
- SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modelling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
arXiv Detail & Related papers (2024-07-22T08:04:09Z)
- Low-Resolution Self-Attention for Semantic Segmentation [96.81482872022237]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
- MLIC: Multi-Reference Entropy Model for Learned Image Compression [28.63380127598021]
We propose the Multi-Reference Entropy Model (MEM) and the advanced version, MEM+, to capture different types of correlations present in the latent representation.
Based on MEM and MEM+, we propose the image compression models MLIC and MLIC+.
Our MLIC and MLIC+ models achieve state-of-the-art performance, reducing BD-rate by 8.05% and 11.39% on the Kodak dataset compared to VTM-17.0 when measured in PSNR.
arXiv Detail & Related papers (2022-11-14T11:07:18Z)
- GOLLIC: Learning Global Context beyond Patches for Lossless High-Resolution Image Compression [10.065286986365697]
We propose a hierarchical latent variable model with a global context to capture the long-term dependencies of high-resolution images.
We show that our global context model improves the compression ratio compared to engineered codecs and deep learning models.
arXiv Detail & Related papers (2022-10-07T03:15:02Z)
- Effective Invertible Arbitrary Image Rescaling [77.46732646918936]
Invertible Neural Networks (INN) are able to increase upscaling accuracy significantly by optimizing the downscaling and upscaling cycle jointly.
In this work, a simple and effective invertible arbitrary rescaling network (IARN) is proposed to achieve arbitrary image rescaling by training only one model.
It is shown to achieve state-of-the-art (SOTA) performance in bidirectional arbitrary rescaling without compromising perceptual quality in LR outputs.
arXiv Detail & Related papers (2022-09-26T22:22:30Z)
- Joint Global and Local Hierarchical Priors for Learned Image Compression [30.44884350320053]
Recently, learned image compression methods have shown superior performance compared to the traditional hand-crafted image codecs.
We propose a novel entropy model called Information Transformer (Informer) that exploits both local and global information in a content-dependent manner.
Our experiments demonstrate that Informer improves rate-distortion performance over the state-of-the-art methods on the Kodak and Tecnick datasets.
arXiv Detail & Related papers (2021-12-08T06:17:37Z)
- Learning Context-Based Non-local Entropy Modeling for Image Compression [140.64888994506313]
In this paper, we propose a non-local operation for context modeling by employing the global similarity within the context.
The entropy model is further adopted as the rate loss in a joint rate-distortion optimization.
Considering that the width of the transforms is essential in training low distortion models, we finally produce a U-Net block in the transforms to increase the width with manageable memory consumption and time complexity.
arXiv Detail & Related papers (2020-05-10T13:28:18Z)