Corner-to-Center Long-range Context Model for Efficient Learned Image
Compression
- URL: http://arxiv.org/abs/2311.18103v1
- Date: Wed, 29 Nov 2023 21:40:28 GMT
- Title: Corner-to-Center Long-range Context Model for Efficient Learned Image
Compression
- Authors: Yang Sui, Ding Ding, Xiang Pan, Xiaozhong Xu, Shan Liu, Bo Yuan,
Zhenzhong Chen
- Abstract summary: In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations.
We propose the Corner-to-Center transformer-based Context Model (C$^3$M), designed to enhance context and latent predictions.
In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder.
- Score: 70.0411436929495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the framework of learned image compression, the context model plays a
pivotal role in capturing the dependencies among latent representations. To
reduce the decoding time resulting from the serial autoregressive context
model, the parallel context model has been proposed as an alternative that
necessitates only two passes during the decoding phase, thus facilitating
efficient image compression in real-world scenarios. However, performance
degradation occurs due to its incomplete causal context. To tackle this issue,
we conduct an in-depth analysis of the performance degradation observed in
existing parallel context models, focusing on two aspects: the Quantity and
Quality of information utilized for context prediction and decoding. Based on
such analysis, we propose the \textbf{Corner-to-Center transformer-based
Context Model (C$^3$M)} designed to enhance context and latent predictions and
improve rate-distortion performance. Specifically, we leverage the
logarithmic-based prediction order to predict more context features from corner
to center progressively. In addition, to enlarge the receptive field in the
analysis and synthesis transformation, we use the Long-range Crossing Attention
Module (LCAM) in the encoder/decoder to capture the long-range semantic
information by assigning the different window shapes in different channels.
Extensive experimental evaluations show that the proposed method is effective
and outperforms the state-of-the-art parallel methods. Finally, according to
the subjective analysis, we suggest that improving the detailed representation
in transformer-based image compression is a promising direction to be explored.
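To make the two decoding schedules discussed above concrete, the sketch below (a minimal NumPy illustration, not the authors' released code) builds the two-pass checkerboard order used by existing parallel context models alongside a corner-to-center order that groups latent positions into stages by their distance from the center. The Chebyshev distance and the exact logarithmic stage rule are assumptions made for illustration.

```python
import numpy as np


def checkerboard_order(h: int, w: int) -> np.ndarray:
    """Two-pass parallel order: anchors (stage 0), then non-anchors (stage 1)."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return (ys + xs) % 2


def corner_to_center_order(h: int, w: int) -> np.ndarray:
    """Corner-to-center order: positions far from the center are decoded in
    early stages, so later stages condition on a progressively richer causal
    context. The logarithmic grouping of distance rings is an assumption."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    dist = np.maximum(np.abs(ys - cy), np.abs(xs - cx))  # Chebyshev distance
    # Corners (largest distance) fall into stage 0; rings toward the center
    # are merged into exponentially wider stages.
    return np.floor(np.log2(dist.max() - dist + 1.0)).astype(int)


if __name__ == "__main__":
    print(checkerboard_order(4, 4))      # alternating 0/1 pattern, two passes
    print(corner_to_center_order(8, 8))  # stage 0 at the corners, last at center
```

With a schedule like this, each stage can still be decoded in parallel, but the number of available causal neighbors grows from stage to stage instead of being fixed at the checkerboard model's two passes.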
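The LCAM description of "assigning the different window shapes in different channels" can likewise be sketched as channel-grouped window attention. The group split, the specific window shapes, and the plain dot-product attention below are hypothetical choices made for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn


def window_attention(x: torch.Tensor, wh: int, ww: int) -> torch.Tensor:
    """Self-attention within non-overlapping (wh x ww) windows.
    x: (B, C, H, W); assumes H % wh == 0 and W % ww == 0."""
    b, c, h, w = x.shape
    # Partition the feature map into windows of shape (wh, ww).
    x = x.view(b, c, h // wh, wh, w // ww, ww)
    x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, wh * ww, c)
    attn = torch.softmax(x @ x.transpose(1, 2) / c ** 0.5, dim=-1)
    x = attn @ x
    # Undo the window partition back to (B, C, H, W).
    x = x.view(b, h // wh, w // ww, wh, ww, c)
    return x.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)


class MixedWindowAttention(nn.Module):
    """Splits channels into three groups, each attending within a different
    window shape: full-width rows, full-height columns, and local squares."""

    def __init__(self, channels: int):
        super().__init__()
        assert channels % 3 == 0
        self.proj = nn.Conv2d(channels, channels, 1)  # fuses the groups back

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        g1, g2, g3 = x.chunk(3, dim=1)
        out = torch.cat([
            window_attention(g1, 1, w),  # long horizontal strips
            window_attention(g2, h, 1),  # long vertical strips
            window_attention(g3, 4, 4),  # 4x4 local windows
        ], dim=1)
        return self.proj(out)


# Usage: y = MixedWindowAttention(96)(torch.randn(1, 96, 16, 16))
```

Giving some channel groups strip-shaped windows is one simple way to let part of the representation attend far along rows or columns while the rest stays local, which matches the stated goal of enlarging the receptive field of the analysis and synthesis transforms.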
Related papers
- Joint Hierarchical Priors and Adaptive Spatial Resolution for Efficient Neural Image Compression [11.25130799452367]
We propose an absolute image compression transformer (ICT) for neural image compression (NIC)
ICT captures both global and local contexts from the latent representations and better parameterizes the distribution of the quantized latents.
Our framework significantly improves the trade-off between coding efficiency and decoder complexity over the versatile video coding (VVC) reference encoder (VTM-18.0) and the neural SwinT-ChARM.
arXiv Detail & Related papers (2023-07-05T13:17:14Z)
- Efficient Contextformer: Spatio-Channel Window Attention for Fast Context Modeling in Learned Image Compression [1.9249287163937978]
We introduce the Efficient Contextformer (eContextformer) - a transformer-based autoregressive context model for learned image compression.
It fuses patch-wise, checkered, and channel-wise grouping techniques for parallel context modeling.
It achieves 145x lower model complexity and 210x faster decoding speed, and higher average bit savings on the Kodak, CLIC, and Tecnick datasets.
arXiv Detail & Related papers (2023-06-25T16:29:51Z)
- Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance [51.188396199083336]
We present an approach that guides the reverse process of diffusion sampling by applying asymmetric gradient guidance.
Our model's adaptability allows it to be implemented with both image-fusion and latent-diffusion models.
Experiments show that our method outperforms various state-of-the-art models in image translation tasks.
arXiv Detail & Related papers (2023-06-07T12:56:56Z)
- Lossy Image Compression with Conditional Diffusion Models [25.158390422252097]
This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models.
In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model.
Our approach yields stronger reported FID scores than the GAN-based model, while remaining competitive with VAE-based models on several distortion metrics.
arXiv Detail & Related papers (2022-09-14T21:53:27Z)
- DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation [56.514462874501675]
We propose a dynamic sparse attention based Transformer model to achieve fine-level matching with favorable efficiency.
The heart of our approach is a novel dynamic-attention unit, dedicated to handling the variation in the optimal number of tokens each position should attend to.
Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
arXiv Detail & Related papers (2022-07-13T11:12:03Z)
- Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z)
- Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression [5.152019611975467]
We propose a transformer-based context model, a.k.a. Contextformer.
We replace the context model of a modern compression framework with the Contextformer and test it on the widely used Kodak image dataset.
Our experimental results show that the proposed model provides up to 10% rate savings compared to the standard Versatile Video Coding (VVC) Test Model (VTM) 9.1.
arXiv Detail & Related papers (2022-03-04T17:29:32Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- Causal Contextual Prediction for Learned Image Compression [36.08393281509613]
We propose the concept of separate entropy coding to leverage a serial decoding process for causal contextual entropy prediction in the latent space.
A causal context model is proposed that separates the latents across channels and makes use of cross-channel relationships to generate highly informative contexts.
We also propose a causal global prediction model, which is able to find global reference points for accurate predictions of unknown points.
arXiv Detail & Related papers (2020-11-19T08:15:10Z)
- Learning Context-Based Non-local Entropy Modeling for Image Compression [140.64888994506313]
In this paper, we propose a non-local operation for context modeling by employing the global similarity within the context.
The entropy model is further adopted as the rate loss in a joint rate-distortion optimization.
Considering that the width of the transforms is essential in training low-distortion models, we introduce a U-Net block in the transforms to increase the width with manageable memory consumption and time complexity.
arXiv Detail & Related papers (2020-05-10T13:28:18Z)