Transformer-based Variable-rate Image Compression with
Region-of-interest Control
- URL: http://arxiv.org/abs/2305.10807v3
- Date: Tue, 1 Aug 2023 10:12:42 GMT
- Title: Transformer-based Variable-rate Image Compression with
Region-of-interest Control
- Authors: Chia-Hao Kao, Ying-Chieh Weng, Yi-Hsin Chen, Wei-Chen Chiu, Wen-Hsiao
Peng
- Abstract summary: This paper proposes a transformer-based learned image compression system.
It is capable of achieving variable-rate compression with a single model while supporting the region-of-interest functionality.
- Score: 24.794581811606445
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper proposes a transformer-based learned image compression system. It
is capable of achieving variable-rate compression with a single model while
supporting the region-of-interest (ROI) functionality. Inspired by prompt
tuning, we introduce prompt generation networks to condition the
transformer-based compression autoencoder. Our prompt generation networks
generate content-adaptive tokens according to the input image, an ROI mask, and
a rate parameter. The separation of the ROI mask and the rate parameter allows
an intuitive way to achieve variable-rate and ROI coding simultaneously.
Extensive experiments validate the effectiveness of our proposed method and
confirm its superiority over the other competing methods.
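As an illustration of the conditioning idea described in the abstract, the following minimal PyTorch sketch shows how an input image, an ROI mask, and a scalar rate parameter could be fused into a small set of prompt tokens for a transformer-based autoencoder. The module name (`PromptGenerator`), the dimensions, and the fusion scheme are assumptions for illustration only, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PromptGenerator(nn.Module):
    """Illustrative prompt-generation network (hypothetical; sizes and fusion
    are assumptions). It maps the input image, an ROI mask, and a rate
    parameter to prompt tokens that can be prepended to the tokens of a
    transformer-based compression autoencoder."""

    def __init__(self, embed_dim=128, num_prompts=4):
        super().__init__()
        # Image and ROI mask enter together as a 4-channel tensor (RGB + mask).
        self.feature = nn.Sequential(
            nn.Conv2d(4, embed_dim, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling -> one vector per image
        )
        # The rate parameter is embedded separately, keeping rate and ROI decoupled.
        self.rate_embed = nn.Linear(1, embed_dim)
        self.to_prompts = nn.Linear(embed_dim, num_prompts * embed_dim)
        self.num_prompts = num_prompts
        self.embed_dim = embed_dim

    def forward(self, image, roi_mask, rate):
        # image: (B, 3, H, W), roi_mask: (B, 1, H, W), rate: (B, 1) in [0, 1]
        x = torch.cat([image, roi_mask], dim=1)
        content = self.feature(x).flatten(1)        # (B, embed_dim)
        cond = content + self.rate_embed(rate)      # fuse content and rate conditions
        prompts = self.to_prompts(cond)             # (B, num_prompts * embed_dim)
        return prompts.view(-1, self.num_prompts, self.embed_dim)

if __name__ == "__main__":
    gen = PromptGenerator()
    img = torch.randn(2, 3, 256, 256)
    mask = torch.rand(2, 1, 256, 256)
    rate = torch.tensor([[0.2], [0.8]])
    print(gen(img, mask, rate).shape)  # torch.Size([2, 4, 128])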
Related papers
- Bi-Level Spatial and Channel-aware Transformer for Learned Image Compression [0.0]
We propose a novel Transformer-based image compression method that enhances the transformation stage by considering frequency components within the feature map.
Our method integrates a novel Hybrid Spatial-Channel Attention Transformer Block (HSCATB), where a spatial-based branch independently handles high and low frequencies.
We also introduce a Mixed Local-Global Feed Forward Network (MLGFFN) within the Transformer block to enhance the extraction of diverse and rich information.
arXiv Detail & Related papers (2024-08-07T15:35:25Z) - Enhancing Perception Quality in Remote Sensing Image Compression via Invertible Neural Network [10.427300958330816]
Decoding remote sensing images to achieve high perceptual quality, particularly at low bitrates, remains a significant challenge.
We propose the invertible neural network-based remote sensing image compression (INN-RSIC) method.
Our INN-RSIC significantly outperforms the existing state-of-the-art traditional and deep learning-based image compression methods in terms of perception quality.
arXiv Detail & Related papers (2024-05-17T03:52:37Z) - A Lightweight Sparse Focus Transformer for Remote Sensing Image Change Captioning [11.93705794906543]
This paper proposes a Sparse Focus Transformer (SFT) for the remote sensing image change captioning (RSICC) task.
The proposed SFT network reduces the number of parameters and the computational complexity by incorporating a sparse attention mechanism.
arXiv Detail & Related papers (2024-05-10T16:56:53Z) - Progressive Learning with Visual Prompt Tuning for Variable-Rate Image
Compression [60.689646881479064]
We propose a progressive learning paradigm for transformer-based variable-rate image compression.
Inspired by visual prompt tuning, we use LPM to extract prompts for input images and hidden features at the encoder side and decoder side, respectively.
Our model outperforms all current variable-rate image compression methods in terms of rate-distortion performance and approaches the state-of-the-art fixed-rate image compression methods trained from scratch.
arXiv Detail & Related papers (2023-11-23T08:29:32Z) - Patch Is Not All You Need [57.290256181083016]
We propose a novel Pattern Transformer to adaptively convert images to pattern sequences for Transformer input.
We employ a Convolutional Neural Network to extract various patterns from the input image.
We have accomplished state-of-the-art performance on CIFAR-10 and CIFAR-100, and have achieved competitive results on ImageNet.
arXiv Detail & Related papers (2023-08-21T13:54:00Z) - AICT: An Adaptive Image Compression Transformer [18.05997169440533]
We propose a more straightforward yet effective Transformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT).
The proposed ICT can capture both global and local contexts from the latent representations.
We leverage a learnable scaling module with a sandwich ConvNeXt-based pre/post-processor to accurately extract a more compact latent representation.
arXiv Detail & Related papers (2023-07-12T11:32:02Z) - The Devil Is in the Details: Window-based Attention for Image
Compression [58.1577742463617]
Most existing learned image compression models are based on Convolutional Neural Networks (CNNs).
In this paper, we study the effects of multiple kinds of attention mechanisms for local features learning, then introduce a more straightforward yet effective window-based local attention block.
The proposed window-based attention is very flexible and can work as a plug-and-play component to enhance CNN and Transformer models.
arXiv Detail & Related papers (2022-03-16T07:55:49Z) - CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that combines the detailed spatial information captured by CNNs with the global context provided by the Transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z) - Towards End-to-End Image Compression and Analysis with Transformers [99.50111380056043]
We propose an end-to-end image compression and analysis model with Transformers, targeting the cloud-based image classification application.
We aim to redesign the Vision Transformer (ViT) model to perform image classification from the compressed features and facilitate image compression with the long-term information from the Transformer.
Experimental results demonstrate the effectiveness of the proposed model in both the image compression and the classification tasks.
arXiv Detail & Related papers (2021-12-17T03:28:14Z) - Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.