Efficient Contextformer: Spatio-Channel Window Attention for Fast
Context Modeling in Learned Image Compression
- URL: http://arxiv.org/abs/2306.14287v2
- Date: Tue, 27 Feb 2024 14:01:23 GMT
- Title: Efficient Contextformer: Spatio-Channel Window Attention for Fast
Context Modeling in Learned Image Compression
- Authors: A. Burakhan Koyuncu, Panqi Jia, Atanas Boev, Elena Alshina, Eckehard
Steinbach
- Abstract summary: We introduce the Efficient Contextformer (eContextformer) - a transformer-based autoregressive context model for learned image compression.
It fuses patch-wise, checkered, and channel-wise grouping techniques for parallel context modeling.
It achieves ~145x lower model complexity, ~210x faster decoding speed, and higher average bit savings on the Kodak, CLIC2020, and Tecnick datasets.
- Score: 1.9249287163937978
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Entropy estimation is essential for the performance of learned image
compression. It has been demonstrated that a transformer-based entropy model is
critical for achieving a high compression ratio, albeit at the expense of
significant computational effort. In this work, we introduce the
Efficient Contextformer (eContextformer) - a computationally efficient
transformer-based autoregressive context model for learned image compression.
The eContextformer efficiently fuses the patch-wise, checkered, and
channel-wise grouping techniques for parallel context modeling, and introduces
a shifted window spatio-channel attention mechanism. We explore better training
strategies and architectural designs and introduce additional complexity
optimizations. During decoding, the proposed optimization techniques
dynamically scale the attention span and cache the previous attention
computations, drastically reducing the model and runtime complexity. Compared
to the non-parallel approach, our proposal has ~145x lower model complexity and
~210x faster decoding speed, and achieves higher average bit savings on Kodak,
CLIC2020, and Tecnick datasets. Additionally, the low complexity of our context
model enables online rate-distortion algorithms, which further improve the
compression performance. We achieve up to 17% bitrate savings over the intra
coding of Versatile Video Coding (VVC) Test Model (VTM) 16.2 and surpass
various learning-based compression models.
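To make the parallel coding order concrete, below is a minimal, illustrative sketch in PyTorch of checkered spatio-channel grouping combined with window attention. This is not the authors' implementation: the names checkered_masks, spatio_channel_groups, and WindowAttention are hypothetical, and the sketch assumes a latent feature map of shape (B, H, W, dim) with H and W divisible by the window size.

```python
# Illustrative sketch only, not the eContextformer implementation.
# checkered_masks, spatio_channel_groups, and WindowAttention are hypothetical.
import torch
import torch.nn as nn

def checkered_masks(h, w):
    """Boolean masks for the two checkerboard phases (anchors coded first)."""
    ii, jj = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    anchor = (ii + jj) % 2 == 0
    return anchor, ~anchor

def spatio_channel_groups(c, h, w, num_segments=4):
    """Coding order: for each channel segment, first the checkerboard anchors,
    then the non-anchors. Every group is coded in parallel, conditioned on
    all previously decoded groups."""
    anchor, nonanchor = checkered_masks(h, w)
    seg = c // num_segments
    groups = []
    for s in range(num_segments):
        channels = slice(s * seg, (s + 1) * seg)
        groups.append((channels, anchor))      # phase 1 of segment s
        groups.append((channels, nonanchor))   # phase 2 of segment s
    return groups

class WindowAttention(nn.Module):
    """Self-attention restricted to non-overlapping spatial windows, which
    keeps the attention cost linear in the number of latent positions."""
    def __init__(self, dim, window=4, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, H, W, dim)
        b, h, w, d = x.shape
        ws = self.window                       # assumes h % ws == 0 and w % ws == 0
        x = x.view(b, h // ws, ws, w // ws, ws, d).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(-1, ws * ws, d)          # one attention row per window
        out, _ = self.attn(x, x, x)
        out = out.view(b, h // ws, w // ws, ws, ws, d).permute(0, 1, 3, 2, 4, 5)
        return out.reshape(b, h, w, d)

# Example: 8 coding groups for a 192-channel, 16x16 latent, plus one
# window-attention pass over a (1, 16, 16, 64) feature map.
groups = spatio_channel_groups(192, 16, 16)
x = torch.randn(1, 16, 16, 64)
print(len(groups), WindowAttention(64)(x).shape)  # 8 torch.Size([1, 16, 16, 64])
```

In this reading, each group would be entropy-coded in parallel conditioned on all previously decoded groups; the paper's runtime optimizations additionally scale the attention span dynamically and cache previous attention computations, so earlier groups are not re-attended from scratch during decoding.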
Related papers
- Corner-to-Center Long-range Context Model for Efficient Learned Image
Compression [70.0411436929495]
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations.
We propose the Corner-to-Center transformer-based Context Model (C$^3$M) designed to enhance context and latent predictions.
In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder.
arXiv Detail & Related papers (2023-11-29T21:40:28Z) - Progressive Learning with Visual Prompt Tuning for Variable-Rate Image
Compression [60.689646881479064]
We propose a progressive learning paradigm for transformer-based variable-rate image compression.
Inspired by visual prompt tuning, we use LPM to extract prompts for input images and hidden features at the encoder side and decoder side, respectively.
Our model outperforms all current variable-rate image compression methods in terms of rate-distortion performance and approaches state-of-the-art fixed-rate image compression methods trained from scratch.
arXiv Detail & Related papers (2023-11-23T08:29:32Z) - ELIC: Efficient Learned Image Compression with Unevenly Grouped
Space-Channel Contextual Adaptive Coding [9.908820641439368]
We propose an efficient model, ELIC, to achieve state-of-the-art speed and compression ability.
With superior performance, the proposed model also supports extremely fast preview decoding and progressive decoding.
arXiv Detail & Related papers (2022-03-21T11:19:50Z) - Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z) - Contextformer: A Transformer with Spatio-Channel Attention for Context
Modeling in Learned Image Compression [5.152019611975467]
We propose a transformer-based context model, a.k.a. the Contextformer.
We replace the context model of a modern compression framework with the Contextformer and test it on the widely used Kodak image dataset.
Our experimental results show that the proposed model provides up to 10% rate savings compared to the standard Versatile Video Coding (VVC) Test Model (VTM) 9.1.
arXiv Detail & Related papers (2022-03-04T17:29:32Z) - Entroformer: A Transformer-based Entropy Model for Learned Image
Compression [17.51693464943102]
We propose a novel transformer-based entropy model, termed Entroformer, to capture long-range dependencies in probability distribution estimation.
The experiments show that the Entroformer achieves state-of-the-art performance on image compression while being time-efficient.
arXiv Detail & Related papers (2022-02-11T08:03:31Z) - Learning True Rate-Distortion-Optimization for End-To-End Image
Compression [59.816251613869376]
Rate-distortion optimization (RDO) is a crucial part of traditional image and video compression.
In this paper, we enhance training by introducing low-complexity estimations of the RDO result.
We achieve average rate savings of 19.6% in MS-SSIM over the previous RDONet model, which equals rate savings of 27.3% over a comparable conventional deep image coder.
arXiv Detail & Related papers (2022-01-05T13:02:00Z) - Conditional Entropy Coding for Efficient Video Compression [82.35389813794372]
We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames.
We first show that a simple architecture modeling the entropy between the image latent codes is as competitive as other neural video compression works and video codecs.
We then propose a novel internal learning extension on top of this architecture that brings an additional 10% savings without trading off decoding speed.
arXiv Detail & Related papers (2020-08-20T20:01:59Z) - Channel-wise Autoregressive Entropy Models for Learned Image Compression [8.486483425885291]
In learning-based approaches to image compression, codecs are developed by optimizing a computational model to minimize a rate-distortion objective.
We introduce two enhancements, channel-conditioning and latent residual prediction, that lead to network architectures with better rate-distortion performance (a minimal sketch of channel-conditioning follows this list).
At low bit rates, where the improvements are most effective, our model saves up to 18% over the baseline and outperforms hand-engineered codecs like BPG by up to 25%.
arXiv Detail & Related papers (2020-07-17T03:33:53Z) - Learning End-to-End Lossy Image Compression: A Benchmark [90.35363142246806]
We first conduct a comprehensive literature survey of learned image compression methods.
We describe milestones in cutting-edge learned image-compression methods, review a broad range of existing works, and provide insights into their historical development routes.
By introducing a coarse-to-fine hyperprior model for entropy estimation and signal reconstruction, we achieve improved rate-distortion performance.
arXiv Detail & Related papers (2020-02-10T13:13:43Z) - A Unified End-to-End Framework for Efficient Deep Image Compression [35.156677716140635]
We propose a unified framework called Efficient Deep Image Compression (EDIC) based on three new technologies.
Specifically, we design an auto-encoder style network for learning based image compression.
Our EDIC method can also be readily incorporated with the Deep Video Compression (DVC) framework to further improve the video compression performance.
arXiv Detail & Related papers (2020-02-09T14:21:08Z)
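The channel-conditioning idea from the Channel-wise Autoregressive Entropy Models entry above can be sketched as follows; latent residual prediction is omitted for brevity. This is an illustrative sketch, not the paper's implementation: ChannelConditionalEntropyModel and its layer layout are hypothetical assumptions. Channel segments are coded sequentially, and the Gaussian parameters of segment k are predicted from the hyperprior features plus the already-decoded segments 0..k-1, so each segment remains spatially parallel.

```python
# Illustrative sketch only; ChannelConditionalEntropyModel is hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelConditionalEntropyModel(nn.Module):
    def __init__(self, channels=192, segments=4, hyper_dim=128):
        super().__init__()
        self.seg = channels // segments
        # one parameter predictor per segment; its input grows with the context
        self.predictors = nn.ModuleList(
            nn.Conv2d(hyper_dim + k * self.seg, 2 * self.seg, kernel_size=1)
            for k in range(segments)
        )

    def forward(self, y, hyper):
        """y: (B, C, H, W) latent; hyper: (B, hyper_dim, H, W) hyperprior
        features. Returns (mean, scale) of a conditional Gaussian per channel."""
        means, scales, decoded = [], [], []
        for k, predict in enumerate(self.predictors):
            context = torch.cat([hyper, *decoded], dim=1)
            m, s = predict(context).chunk(2, dim=1)
            means.append(m)
            scales.append(F.softplus(s))        # keep scales positive
            # At decode time segment k would come from the entropy decoder;
            # during training the ground-truth segment is used instead.
            decoded.append(y[:, k * self.seg:(k + 1) * self.seg])
        return torch.cat(means, dim=1), torch.cat(scales, dim=1)

# Example: predict entropy parameters for a 192-channel, 16x16 latent.
model = ChannelConditionalEntropyModel()
mean, scale = model(torch.randn(1, 192, 16, 16), torch.randn(1, 128, 16, 16))
print(mean.shape, scale.shape)  # torch.Size([1, 192, 16, 16]) for both
```

ELIC's unevenly grouped variant follows the same pattern but with non-uniform segment sizes, spending finer-grained conditioning on the earlier, more informative channels.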