Learned Image Compression with Mixed Transformer-CNN Architectures
- URL: http://arxiv.org/abs/2303.14978v1
- Date: Mon, 27 Mar 2023 08:19:01 GMT
- Title: Learned Image Compression with Mixed Transformer-CNN Architectures
- Authors: Jinming Liu, Heming Sun, Jiro Katto
- Abstract summary: We propose an efficient parallel Transformer-CNN Mixture (TCM) block with a controllable complexity.
Inspired by the recent progress of entropy estimation models and attention modules, we propose a channel-wise entropy model with parameter-efficient swin-transformer-based attention.
Experimental results demonstrate our proposed method achieves state-of-the-art rate-distortion performances.
- Score: 21.53261818914534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learned image compression (LIC) methods have exhibited promising progress and
superior rate-distortion performance compared with classical image compression
standards. Most existing LIC methods are Convolutional Neural Networks-based
(CNN-based) or Transformer-based, which have different advantages. Exploiting
both advantages is a point worth exploring, which has two challenges: 1) how to
effectively fuse the two methods? 2) how to achieve higher performance with a
suitable complexity? In this paper, we propose an efficient parallel
Transformer-CNN Mixture (TCM) block with a controllable complexity to
incorporate the local modeling ability of CNN and the non-local modeling
ability of transformers to improve the overall architecture of image
compression models. Besides, inspired by the recent progress of entropy
estimation models and attention modules, we propose a channel-wise entropy
model with parameter-efficient swin-transformer-based attention (SWAtten)
modules by using channel squeezing. Experimental results demonstrate our
proposed method achieves state-of-the-art rate-distortion performances on three
different resolution datasets (i.e., Kodak, Tecnick, CLIC Professional
Validation) compared to existing LIC methods. The code is at
https://github.com/jmliu206/LIC_TCM.
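To make the parallel TCM idea concrete, here is a minimal NumPy sketch of one such block: the input channels are split, one half goes through a convolutional (local) branch and the other through a plain self-attention (non-local) branch, and the results are concatenated with a residual connection. This is an illustrative assumption of the structure described in the abstract, not the authors' implementation; the actual TCM block (see the linked repository) uses swin-transformer attention and additional projections.

```python
import numpy as np

def conv3x3(x, w):
    # Naive 3x3 convolution with 'same' padding; x: (C_in, H, W), w: (C_out, C_in, 3, 3)
    C_out, C_in, _, _ = w.shape
    _, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((C_out, H, W))
    for co in range(C_out):
        for ci in range(C_in):
            for dy in range(3):
                for dx in range(3):
                    out[co] += w[co, ci, dy, dx] * xp[ci, dy:dy + H, dx:dx + W]
    return out

def self_attention(x):
    # Plain global softmax attention over spatial tokens (identity Q/K/V projections)
    C, H, W = x.shape
    t = x.reshape(C, H * W).T                      # (N, C) tokens
    scores = t @ t.T / np.sqrt(C)                  # (N, N) similarity
    a = np.exp(scores - scores.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)              # row-wise softmax
    return (a @ t).T.reshape(C, H, W)

def tcm_block(x, w_conv):
    # Split channels: half to the CNN (local) branch, half to attention (non-local)
    C = x.shape[0]
    local = conv3x3(x[: C // 2], w_conv)
    non_local = self_attention(x[C // 2:])
    return np.concatenate([local, non_local], axis=0) + x  # residual connection

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))                 # (C, H, W) feature map
w = rng.standard_normal((4, 4, 3, 3)) * 0.1        # conv weights for the local branch
y = tcm_block(x, w)
print(y.shape)  # (8, 4, 4)
```

The channel split is what keeps the complexity controllable: attention cost grows with the number of tokens, so routing only a fraction of the channels through it bounds the non-local branch's cost.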
Related papers
- Bi-Level Spatial and Channel-aware Transformer for Learned Image Compression [0.0]
We propose a novel Transformer-based image compression method that enhances the transformation stage by considering frequency components within the feature map.
Our method integrates a novel Hybrid Spatial-Channel Attention Transformer Block (HSCATB), where a spatial-based branch independently handles high and low frequencies.
We also introduce a Mixed Local-Global Feed Forward Network (MLGFFN) within the Transformer block to enhance the extraction of diverse and rich information.
arXiv Detail & Related papers (2024-08-07T15:35:25Z) - Efficient Contextformer: Spatio-Channel Window Attention for Fast
Context Modeling in Learned Image Compression [1.9249287163937978]
We introduce the Efficient Contextformer (eContextformer) - a transformer-based autoregressive context model for learned image compression.

It fuses patch-wise, checkered, and channel-wise grouping techniques for parallel context modeling.
It achieves 145x lower model complexity, 210x faster decoding speed, and higher average bit savings on the Kodak, CLIC, and Tecnick datasets.
arXiv Detail & Related papers (2023-06-25T16:29:51Z) - Exploring Effective Mask Sampling Modeling for Neural Image Compression [171.35596121939238]
Most existing neural image compression methods rely on side information from hyperprior or context models to eliminate spatial redundancy.
Inspired by the mask sampling modeling in recent self-supervised learning methods for natural language processing and high-level vision, we propose a novel pretraining strategy for neural image compression.
Our method achieves competitive performance with lower computational complexity compared to state-of-the-art image compression methods.
arXiv Detail & Related papers (2023-06-09T06:50:20Z) - Spatially-Adaptive Feature Modulation for Efficient Image
Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism upon a vision transformer (ViT)-like block.
The proposed method is $3\times$ smaller than state-of-the-art efficient SR methods.
arXiv Detail & Related papers (2023-02-27T14:19:31Z) - High-Fidelity Variable-Rate Image Compression via Invertible Activation
Transformation [24.379052026260034]
We propose the Invertible Activation Transformation (IAT) module to tackle the issue of high-fidelity fine variable-rate image compression.
IAT and QLevel together give the image compression model the ability of fine variable-rate control while better maintaining the image fidelity.
Our method outperforms the state-of-the-art variable-rate image compression method by a large margin, especially after multiple re-encodings.
arXiv Detail & Related papers (2022-09-12T07:14:07Z) - The Devil Is in the Details: Window-based Attention for Image
Compression [58.1577742463617]
Most existing learned image compression models are based on Convolutional Neural Networks (CNNs)
In this paper, we study the effects of multiple kinds of attention mechanisms for local features learning, then introduce a more straightforward yet effective window-based local attention block.
The proposed window-based attention is very flexible and can work as a plug-and-play component to enhance CNN and Transformer models.
arXiv Detail & Related papers (2022-03-16T07:55:49Z) - Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z) - CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z) - Learned Image Compression with Gaussian-Laplacian-Logistic Mixture Model
and Concatenated Residual Modules [22.818632387206257]
Two key components of learned image compression are the entropy model of the latent representations and the encoding/decoding network architectures.
We propose a more flexible discretized Gaussian-Laplacian-Logistic mixture model (GLLMM) for the latent representations.
In the encoding/decoding network design part, we propose a concatenated residual block (CRB) structure, where multiple residual blocks are serially connected with additional shortcut connections.
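A short NumPy sketch of the concatenated residual block idea described above: residual blocks chained in series, with an extra shortcut from the CRB input to its output. This is a hedged illustration of the stated structure (using 1x1 channel-mixing layers in place of real convolutions), not the paper's actual network.

```python
import numpy as np

def residual_block(x, w1, w2):
    # One residual block: two 1x1 channel-mixing layers with ReLU, plus identity shortcut
    h = np.maximum(0, np.einsum('oc,chw->ohw', w1, x))
    return x + np.einsum('oc,chw->ohw', w2, h)

def crb(x, block_weights):
    # Concatenated residual blocks: serially chained residual blocks,
    # with an additional shortcut from the CRB input to the CRB output
    y = x
    for w1, w2 in block_weights:
        y = residual_block(y, w1, w2)
    return y + x

rng = np.random.default_rng(1)
C = 6
x = rng.standard_normal((C, 5, 5))
block_weights = [(rng.standard_normal((C, C)) * 0.1,
                  rng.standard_normal((C, C)) * 0.1) for _ in range(3)]
y = crb(x, block_weights)
print(y.shape)  # (6, 5, 5)
```

The nested shortcuts mean every block's input has a direct gradient path to the output, which is the usual motivation for stacking residual connections this way.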
arXiv Detail & Related papers (2021-07-14T02:54:22Z) - Learning End-to-End Lossy Image Compression: A Benchmark [90.35363142246806]
We first conduct a comprehensive literature survey of learned image compression methods.
We describe milestones in cutting-edge learned image-compression methods, review a broad range of existing works, and provide insights into their historical development routes.
By introducing a coarse-to-fine hyperprior model for entropy estimation and signal reconstruction, we achieve improved rate-distortion performance.
arXiv Detail & Related papers (2020-02-10T13:13:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.