The Devil Is in the Details: Window-based Attention for Image
Compression
- URL: http://arxiv.org/abs/2203.08450v1
- Date: Wed, 16 Mar 2022 07:55:49 GMT
- Title: The Devil Is in the Details: Window-based Attention for Image
Compression
- Authors: Renjie Zou, Chunfeng Song, Zhaoxiang Zhang
- Abstract summary: Most existing learned image compression models are based on Convolutional Neural Networks (CNNs).
In this paper, we study the effects of multiple kinds of attention mechanisms for local feature learning, then introduce a straightforward yet effective window-based local attention block.
The proposed window-based attention is flexible and can work as a plug-and-play component to enhance both CNN and Transformer models.
- Score: 58.1577742463617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learned image compression methods have exhibited rate-distortion
performance superior to classical image compression standards. Most existing
learned image compression models are based on Convolutional Neural Networks
(CNNs). Despite great contributions, a main drawback of CNN-based models is
that their structure is not designed for capturing local redundancy,
especially non-repetitive textures, which severely affects reconstruction
quality. Therefore, how to make full use of both global structure and local
texture becomes the core problem for learning-based image compression.
Inspired by recent progress on the Vision Transformer (ViT) and Swin
Transformer, we find that combining a local-aware attention mechanism with
global-related feature learning meets this need in image compression. In this
paper, we first extensively study the effects of multiple kinds of attention
mechanisms for local feature learning, then introduce a straightforward yet
effective window-based local attention block. The proposed window-based
attention is flexible and can work as a plug-and-play component to enhance
CNN and Transformer models. Moreover, we propose a novel Symmetrical
TransFormer (STF) framework with absolute transformer blocks in the
down-sampling encoder and up-sampling decoder. Extensive experimental
evaluations have shown that the proposed method is effective and outperforms
the state-of-the-art methods. The code is publicly available at
https://github.com/Googolxx/STF.
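The window-based local attention discussed in the abstract can be illustrated with a minimal sketch: self-attention is computed only within non-overlapping spatial windows, so each token attends to its local neighborhood rather than the whole feature map. This is a single-head NumPy toy illustration of the general technique, not the authors' STF implementation; the function names and projection matrices are hypothetical.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    # -> (num_windows, ws*ws, C): each window becomes a short token sequence
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def window_reverse(windows, ws, H, W):
    """Inverse of window_partition: reassemble windows into an (H, W, C) map."""
    C = windows.shape[-1]
    x = windows.reshape(H // ws, W // ws, ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(x, ws, Wq, Wk, Wv):
    """Single-head self-attention restricted to each local window."""
    H, W, C = x.shape
    win = window_partition(x, ws)              # (nW, ws*ws, C)
    q, k, v = win @ Wq, win @ Wk, win @ Wv     # per-window projections
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(C))
    out = attn @ v                             # tokens attend only inside their window
    return window_reverse(out, ws, H, W)
```

Because attention is confined to windows, perturbing a pixel in one window leaves the outputs of every other window unchanged, which is exactly the locality property the paper exploits for capturing local texture.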
Related papers
- Enhancing Learned Image Compression via Cross Window-based Attention [4.673285689826945]
We propose a CNN-based solution integrated with a feature encoding module.
Cross-scale window-based attention is inspired by the attention mechanism in transformers and effectively enlarges the receptive field.
We evaluate our method on the Kodak and CLIC datasets and demonstrate that our approach is effective and on par with state-of-the-art methods.
arXiv Detail & Related papers (2024-10-28T15:44:35Z)
- Exploiting Inter-Image Similarity Prior for Low-Bitrate Remote Sensing Image Compression [10.427300958330816]
We propose a codebook-based RS image compression (Code-RSIC) method with a generated discrete codebook.
Code-RSIC significantly outperforms state-of-the-art traditional and learning-based image compression algorithms in terms of perceptual quality.
arXiv Detail & Related papers (2024-07-17T03:33:16Z)
- Transferable Learned Image Compression-Resistant Adversarial Perturbations [66.46470251521947]
Adversarial attacks can readily disrupt image classification systems, revealing the vulnerability of DNN-based recognition tasks.
We introduce a new pipeline that targets image classification models that utilize learned image compressors as pre-processing modules.
arXiv Detail & Related papers (2024-01-06T03:03:28Z)
- Image Compression using only Attention based Neural Networks [13.126014437648612]
We introduce the concept of learned image queries to aggregate patch information via cross-attention, followed by quantization and coding techniques.
Our work demonstrates competitive performance achieved by convolution-free architectures across the popular Kodak, DIV2K, and CLIC datasets.
arXiv Detail & Related papers (2023-10-17T13:38:38Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Accurate Image Restoration with Attention Retractable Transformer [50.05204240159985]
We propose Attention Retractable Transformer (ART) for image restoration.
ART presents both dense and sparse attention modules in the network.
We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks.
arXiv Detail & Related papers (2022-10-04T07:35:01Z)
- Towards End-to-End Image Compression and Analysis with Transformers [99.50111380056043]
We propose an end-to-end image compression and analysis model with Transformers, targeting the cloud-based image classification application.
We redesign the Vision Transformer (ViT) model to perform image classification from the compressed features and facilitate image compression with long-term information from the Transformer.
Experimental results demonstrate the effectiveness of the proposed model in both the image compression and the classification tasks.
arXiv Detail & Related papers (2021-12-17T03:28:14Z)
- Implicit Neural Representations for Image Compression [103.78615661013623]
Implicit Neural Representations (INRs) have gained attention as a novel and effective representation for various data types.
We propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding.
We find that our approach to source compression with INRs vastly outperforms similar prior work.
arXiv Detail & Related papers (2021-12-08T13:02:53Z)
- Joint Global and Local Hierarchical Priors for Learned Image Compression [30.44884350320053]
Recently, learned image compression methods have shown superior performance compared to the traditional hand-crafted image codecs.
We propose a novel entropy model called Information Transformer (Informer) that exploits both local and global information in a content-dependent manner.
Our experiments demonstrate that Informer improves rate-distortion performance over the state-of-the-art methods on the Kodak and Tecnick datasets.
arXiv Detail & Related papers (2021-12-08T06:17:37Z)
- Enhanced Invertible Encoding for Learned Image Compression [40.21904131503064]
In this paper, we propose an enhanced Invertible Encoding Network with invertible neural networks (INNs) to largely mitigate the information loss problem for better compression.
Experimental results on the Kodak, CLIC, and Tecnick datasets show that our method outperforms the existing learned image compression methods.
arXiv Detail & Related papers (2021-08-08T17:32:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.