Bi-Level Spatial and Channel-aware Transformer for Learned Image Compression
- URL: http://arxiv.org/abs/2408.03842v1
- Date: Wed, 7 Aug 2024 15:35:25 GMT
- Title: Bi-Level Spatial and Channel-aware Transformer for Learned Image Compression
- Authors: Hamidreza Soltani, Erfan Ghasemi,
- Abstract summary: We propose a novel Transformer-based image compression method that enhances the transformation stage by considering frequency components within the feature map.
Our method integrates a novel Hybrid Spatial-Channel Attention Transformer Block (HSCATB), where a spatial-based branch independently handles high and low frequencies.
We also introduce a Mixed Local-Global Feed Forward Network (MLGFFN) within the Transformer block to enhance the extraction of diverse and rich information.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in learned image compression (LIC) methods have demonstrated superior performance over traditional hand-crafted codecs. These learning-based methods often employ convolutional neural networks (CNNs) or Transformer-based architectures. However, these nonlinear approaches frequently overlook the frequency characteristics of images, which limits their compression efficiency. To address this issue, we propose a novel Transformer-based image compression method that enhances the transformation stage by considering frequency components within the feature map. Our method integrates a novel Hybrid Spatial-Channel Attention Transformer Block (HSCATB), where a spatial-based branch independently handles high and low frequencies at the attention layer, and a Channel-aware Self-Attention (CaSA) module captures information across channels, significantly improving compression performance. Additionally, we introduce a Mixed Local-Global Feed Forward Network (MLGFFN) within the Transformer block to enhance the extraction of diverse and rich information, which is crucial for effective compression. These innovations collectively improve the transformation's ability to project data into a more decorrelated latent space, thereby boosting overall compression efficiency. Experimental results demonstrate that our framework surpasses state-of-the-art LIC methods in rate-distortion performance.
Related papers
- Enhancing Learned Image Compression via Cross Window-based Attention [4.673285689826945]
We propose a CNN-based solution integrated with a feature encoding module.
Cross-scale window-based attention is inspired by the attention mechanism in transformers and effectively enlarges the receptive field.
We evaluate our method on the Kodak and CLIC datasets and demonstrate that our approach is effective and on par with state-of-the-art methods.
arXiv Detail & Related papers (2024-10-28T15:44:35Z) - Progressive Learning with Visual Prompt Tuning for Variable-Rate Image
Compression [60.689646881479064]
We propose a progressive learning paradigm for transformer-based variable-rate image compression.
Inspired by visual prompt tuning, we use LPM to extract prompts for input images and hidden features at the encoder side and decoder side, respectively.
Our model outperforms all current variable image methods in terms of rate-distortion performance and approaches the state-of-the-art fixed image compression methods trained from scratch.
arXiv Detail & Related papers (2023-11-23T08:29:32Z) - Frequency-Aware Transformer for Learned Image Compression [64.28698450919647]
We propose a frequency-aware transformer (FAT) block that for the first time achieves multiscale directional ananlysis for Learned Image Compression (LIC)
The FAT block comprises frequency-decomposition window attention (FDWA) modules to capture multiscale and directional frequency components of natural images.
We also introduce frequency-modulation feed-forward network (FMFFN) to adaptively modulate different frequency components, improving rate-distortion performance.
arXiv Detail & Related papers (2023-10-25T05:59:25Z) - AICT: An Adaptive Image Compression Transformer [18.05997169440533]
We propose a more straightforward yet effective Tranformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT)
The proposed ICT can capture both global and local contexts from the latent representations.
We leverage a learnable scaling module with a sandwich ConvNeXt-based pre/post-processor to accurately extract more compact latent representation.
arXiv Detail & Related papers (2023-07-12T11:32:02Z) - Learned Image Compression with Mixed Transformer-CNN Architectures [21.53261818914534]
We propose an efficient parallel Transformer-CNN Mixture (TCM) block with a controllable complexity.
Inspired by the recent progress of entropy estimation models and attention modules, we propose a channel-wise entropy model with parameter-efficient swin-transformer-based attention.
Experimental results demonstrate our proposed method achieves state-of-the-art rate-distortion performances.
arXiv Detail & Related papers (2023-03-27T08:19:01Z) - Learned Video Compression via Heterogeneous Deformable Compensation
Network [78.72508633457392]
We propose a learned video compression framework via heterogeneous deformable compensation strategy (HDCVC) to tackle the problems of unstable compression performance.
More specifically, the proposed algorithm extracts features from the two adjacent frames to estimate content-Neighborhood heterogeneous deformable (HetDeform) kernel offsets.
Experimental results indicate that HDCVC achieves superior performance than the recent state-of-the-art learned video compression approaches.
arXiv Detail & Related papers (2022-07-11T02:31:31Z) - The Devil Is in the Details: Window-based Attention for Image
Compression [58.1577742463617]
Most existing learned image compression models are based on Convolutional Neural Networks (CNNs)
In this paper, we study the effects of multiple kinds of attention mechanisms for local features learning, then introduce a more straightforward yet effective window-based local attention block.
The proposed window-based attention is very flexible which could work as a plug-and-play component to enhance CNN and Transformer models.
arXiv Detail & Related papers (2022-03-16T07:55:49Z) - Neural JPEG: End-to-End Image Compression Leveraging a Standard JPEG
Encoder-Decoder [73.48927855855219]
We propose a system that learns to improve the encoding performance by enhancing its internal neural representations on both the encoder and decoder ends.
Experiments demonstrate that our approach successfully improves the rate-distortion performance over JPEG across various quality metrics.
arXiv Detail & Related papers (2022-01-27T20:20:03Z) - Towards End-to-End Image Compression and Analysis with Transformers [99.50111380056043]
We propose an end-to-end image compression and analysis model with Transformers, targeting to the cloud-based image classification application.
We aim to redesign the Vision Transformer (ViT) model to perform image classification from the compressed features and facilitate image compression with the long-term information from the Transformer.
Experimental results demonstrate the effectiveness of the proposed model in both the image compression and the classification tasks.
arXiv Detail & Related papers (2021-12-17T03:28:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.