MLIC++: Linear Complexity Multi-Reference Entropy Modeling for Learned
Image Compression
- URL: http://arxiv.org/abs/2307.15421v9
- Date: Tue, 20 Feb 2024 03:25:43 GMT
- Title: MLIC++: Linear Complexity Multi-Reference Entropy Modeling for Learned
Image Compression
- Authors: Wei Jiang, Jiayu Yang, Yongqi Zhai, Feng Gao, Ronggang Wang
- Abstract summary: We introduce MEM++, which captures the diverse range of correlations inherent in the latent representation.
MEM++ achieves state-of-the-art performance, reducing BD-rate by 13.39% on the Kodak dataset compared to VTM-17.0 in PSNR.
MLIC++ exhibits linear GPU memory consumption with resolution, making it highly suitable for high-resolution image coding.
- Score: 30.71965784982577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, learned image compression has achieved impressive performance. The
entropy model, which estimates the distribution of the latent representation,
plays a crucial role in enhancing rate-distortion performance. However,
existing global context modules rely on computationally intensive quadratic
complexity computations to capture global correlations. This quadratic
complexity imposes limitations on the potential of high-resolution image
coding. Moreover, effectively capturing local, global, and channel-wise
contexts with acceptable, or even linear, complexity within a single entropy model
remains a challenge. To address these limitations, we propose the Linear
Complexity Multi-Reference Entropy Model (MEM++). MEM++ effectively captures
the diverse range of correlations inherent in the latent representation.
Specifically, the latent representation is first divided into multiple slices.
When compressing a particular slice, the previously compressed slices serve as
its channel-wise contexts. To capture local contexts without sacrificing
performance, we introduce a novel checkerboard attention module. Additionally,
to capture global contexts, we propose linear-complexity attention-based
global correlation capturing, which leverages a decomposition of the softmax
operation. The attention map of the previously decoded slice is implicitly
computed and employed to predict global correlations in the current slice.
Based on MEM++, we propose the image compression model MLIC++. Extensive
experimental evaluations demonstrate that our MLIC++ achieves state-of-the-art
performance, reducing BD-rate by 13.39% on the Kodak dataset compared to
VTM-17.0 when measured in PSNR. Furthermore, MLIC++'s GPU memory consumption grows
linearly with resolution, making it highly suitable for high-resolution image coding.
Code and pre-trained models are available at
https://github.com/JiangWeibeta/MLIC.
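To make the channel-wise conditioning concrete, here is a minimal sketch (the function name and the choice of ten slices are illustrative assumptions, not taken from the paper) of how a latent can be split into slices so that each slice sees the previously coded slices as its channel-wise context:

```python
import torch

def channel_context_schedule(y: torch.Tensor, num_slices: int = 10):
    """Split latent y (B, C, H, W) into channel slices and pair each slice
    with the concatenation of the slices coded before it, which serves as its
    channel-wise context. Illustrative only: in the actual model this context
    feeds an entropy-parameter network together with local and global contexts."""
    slices = torch.chunk(y, num_slices, dim=1)
    schedule = []
    for i, current in enumerate(slices):
        ctx = torch.cat(slices[:i], dim=1) if i > 0 else None  # previously coded slices
        schedule.append((current, ctx))
    return schedule
```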
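The checkerboard attention module is described only at a high level in the abstract; the sketch below shows just the checkerboard partition such a module operates on (a generic anchor / non-anchor split; the attention layered on top of it is the paper's contribution and is not reproduced here):

```python
import torch

def checkerboard_masks(h: int, w: int, device=None):
    """Return boolean anchor / non-anchor masks of shape (h, w).
    Anchor positions are coded first (without spatial context) and then
    serve as the local context for the interleaved non-anchor positions."""
    yy, xx = torch.meshgrid(
        torch.arange(h, device=device),
        torch.arange(w, device=device),
        indexing="ij",
    )
    anchor = (yy + xx) % 2 == 0
    return anchor, ~anchor
```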
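The linear-complexity global context rests on decomposing the softmax so the N x N attention map is never formed explicitly. Below is a minimal kernelized-attention sketch of that idea (the elu+1 feature map and the tensor shapes are assumptions for illustration, not necessarily the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def linear_global_attention(q, k, v, eps: float = 1e-6):
    """O(N) attention in the number of spatial positions N.
    q, k, v: (B, heads, N, d). A positive feature map phi replaces the softmax
    kernel, so phi(K)^T V (a per-head d x d summary) is computed once and reused
    for every query, instead of materializing an N x N attention map."""
    phi_q = F.elu(q) + 1.0  # assumed feature map (linear-transformer style)
    phi_k = F.elu(k) + 1.0
    kv = torch.einsum("bhnd,bhne->bhde", phi_k, v)               # d x d key/value summary
    norm = torch.einsum("bhnd,bhd->bhn", phi_q, phi_k.sum(dim=2)) + eps
    return torch.einsum("bhnd,bhde->bhne", phi_q, kv) / norm.unsqueeze(-1)
```

In the paper's setting, the keys and values would come from the previously decoded slice, so the small summary can be formed while that slice is decoded and then reused to predict global correlations for the current slice.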
Related papers
- LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba [54.85262314960038]
Local Attentional Mamba blocks capture both global contexts and local details with linear complexity.
Our model exhibits exceptional scalability and surpasses the performance of DiT across various model scales on ImageNet at 256x256 resolution.
Compared to state-of-the-art diffusion models on ImageNet 256x256 and 512x512, our largest model presents notable advantages, such as a reduction of up to 62% in GFLOPs.
arXiv Detail & Related papers (2024-08-05T16:39:39Z)
- SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modelling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
arXiv Detail & Related papers (2024-07-22T08:04:09Z)
- Low-Resolution Self-Attention for Semantic Segmentation [96.81482872022237]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
- MLIC: Multi-Reference Entropy Model for Learned Image Compression [28.63380127598021]
We propose the Multi-Reference Entropy Model (MEM) and the advanced version, MEM+, to capture different types of correlations present in the latent representation.
Based on MEM and MEM+, we propose the image compression models MLIC and MLIC+.
Our MLIC and MLIC+ models achieve state-of-the-art performance, reducing BD-rate by 8.05% and 11.39% on the Kodak dataset compared to VTM-17.0 when measured in PSNR.
arXiv Detail & Related papers (2022-11-14T11:07:18Z)
- GOLLIC: Learning Global Context beyond Patches for Lossless High-Resolution Image Compression [10.065286986365697]
We propose a hierarchical latent variable model with a global context to capture the long-term dependencies of high-resolution images.
We show that our global context model improves the compression ratio compared to engineered codecs and deep learning models.
arXiv Detail & Related papers (2022-10-07T03:15:02Z)
- Effective Invertible Arbitrary Image Rescaling [77.46732646918936]
Invertible Neural Networks (INN) are able to increase upscaling accuracy significantly by optimizing the downscaling and upscaling cycle jointly.
In this work, a simple and effective invertible arbitrary rescaling network (IARN) is proposed to achieve arbitrary image rescaling by training only one model.
It is shown to achieve state-of-the-art (SOTA) performance in bidirectional arbitrary rescaling without compromising perceptual quality in LR outputs.
arXiv Detail & Related papers (2022-09-26T22:22:30Z)
- Joint Global and Local Hierarchical Priors for Learned Image Compression [30.44884350320053]
Recently, learned image compression methods have shown superior performance compared to the traditional hand-crafted image codecs.
We propose a novel entropy model called Information Transformer (Informer) that exploits both local and global information in a content-dependent manner.
Our experiments demonstrate that Informer improves rate-distortion performance over the state-of-the-art methods on the Kodak and Tecnick datasets.
arXiv Detail & Related papers (2021-12-08T06:17:37Z)
- Learning Context-Based Non-local Entropy Modeling for Image Compression [140.64888994506313]
In this paper, we propose a non-local operation for context modeling by employing the global similarity within the context.
The entropy model is further adopted as the rate loss in a joint rate-distortion optimization.
Considering that the width of the transforms is essential in training low distortion models, we finally produce a U-Net block in the transforms to increase the width with manageable memory consumption and time complexity.
arXiv Detail & Related papers (2020-05-10T13:28:18Z)