Contextformer: A Transformer with Spatio-Channel Attention for Context
Modeling in Learned Image Compression
- URL: http://arxiv.org/abs/2203.02452v1
- Date: Fri, 4 Mar 2022 17:29:32 GMT
- Title: Contextformer: A Transformer with Spatio-Channel Attention for Context
Modeling in Learned Image Compression
- Authors: A. Burakhan Koyuncu, Han Gao, Eckehard Steinbach
- Abstract summary: We propose a transformer-based context model ak.a. Contextformer.
We replace the context model of a modern compression framework with the Contextformer and test it on the widely used Kodak image dataset.
Our experimental results show that the proposed model provides up to 10% rate savings compared to the standard Versatile Video Coding (VVC) Test Model (VVC) 9.1.
- Score: 5.152019611975467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Entropy modeling is a key component for high-performance image compression
algorithms. Recent developments in autoregressive context modeling helped
learning-based methods to surpass their classical counterparts. However, the
performance of those models can be further improved due to the underexploited
spatio-channel dependencies in latent space, and the suboptimal implementation
of context adaptivity. Inspired by the adaptive characteristics of the
transformers, we propose a transformer-based context model, a.k.a.
Contextformer, which generalizes the de facto standard attention mechanism to
spatio-channel attention. We replace the context model of a modern compression
framework with the Contextformer and test it on the widely used Kodak image
dataset. Our experimental results show that the proposed model provides up to
10% rate savings compared to the standard Versatile Video Coding (VVC) Test
Model (VTM) 9.1, and outperforms various learning-based models.
Related papers
- Corner-to-Center Long-range Context Model for Efficient Learned Image
Compression [70.0411436929495]
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations.
We propose the textbfCorner-to-Center transformer-based Context Model (C$3$M) designed to enhance context and latent predictions.
In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder.
arXiv Detail & Related papers (2023-11-29T21:40:28Z) - Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image
Compression [63.56922682378755]
We focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding.
The proposed adaptive aggregation generates kernel offsets to capture valid information in the content-conditioned range to help transform.
Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to the state-of-the-art learning-based methods.
arXiv Detail & Related papers (2023-08-17T01:34:51Z) - Efficient Contextformer: Spatio-Channel Window Attention for Fast
Context Modeling in Learned Image Compression [1.9249287163937978]
We introduce the Efficient Contextformer (eContextformer) - a transformer-based autoregressive context model for learned image.
It fuses patch-wise, checkered, and channel-wise grouping techniques for parallel context modeling.
It achieves 145x lower model complexity and 210Cx faster decoding speed, and higher average bit savings on Kodak, CLI, and Tecnick datasets.
arXiv Detail & Related papers (2023-06-25T16:29:51Z) - Semantic Image Synthesis with Semantically Coupled VQ-Model [42.19799555533789]
We conditionally synthesize the latent space from a vector quantized model (VQ-model) pre-trained to autoencode images.
We show that our model improves semantic image synthesis using autoregressive models on popular semantic image datasets ADE20k, Cityscapes and COCO-Stuff.
arXiv Detail & Related papers (2022-09-06T14:37:01Z) - Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z) - Insights from Generative Modeling for Neural Video Compression [31.59496634465347]
We present newly proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling.
We propose several architectures that yield state-of-the-art video compression performance on high-resolution video.
We provide further evidence that the generative modeling viewpoint can advance the neural video coding field.
arXiv Detail & Related papers (2021-07-28T02:19:39Z) - Channel-wise Autoregressive Entropy Models for Learned Image Compression [8.486483425885291]
In learning-based approaches to image compression, codecs are developed by optimizing a computational model to minimize a rate-distortion objective.
We introduce two enhancements, channel-conditioning and latent residual prediction, that lead to network architectures with better rate-distortion performance.
At low bit rates, where the improvements are most effective, our model saves up to 18% over the baseline and outperforms hand-engineered codecs like BPG by up to 25%.
arXiv Detail & Related papers (2020-07-17T03:33:53Z) - Learning Context-Based Non-local Entropy Modeling for Image Compression [140.64888994506313]
In this paper, we propose a non-local operation for context modeling by employing the global similarity within the context.
The entropy model is further adopted as the rate loss in a joint rate-distortion optimization.
Considering that the width of the transforms is essential in training low distortion models, we finally produce a U-Net block in the transforms to increase the width with manageable memory consumption and time complexity.
arXiv Detail & Related papers (2020-05-10T13:28:18Z) - Normalizing Flows with Multi-Scale Autoregressive Priors [131.895570212956]
We introduce channel-wise dependencies in their latent space through multi-scale autoregressive priors (mAR)
Our mAR prior for models with split coupling flow layers (mAR-SCF) can better capture dependencies in complex multimodal data.
We show that mAR-SCF allows for improved image generation quality, with gains in FID and Inception scores compared to state-of-the-art flow-based models.
arXiv Detail & Related papers (2020-04-08T09:07:11Z) - Learning End-to-End Lossy Image Compression: A Benchmark [90.35363142246806]
We first conduct a comprehensive literature survey of learned image compression methods.
We describe milestones in cutting-edge learned image-compression methods, review a broad range of existing works, and provide insights into their historical development routes.
By introducing a coarse-to-fine hyperprior model for entropy estimation and signal reconstruction, we achieve improved rate-distortion performance.
arXiv Detail & Related papers (2020-02-10T13:13:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.