Towards End-to-End Image Compression and Analysis with Transformers
- URL: http://arxiv.org/abs/2112.09300v1
- Date: Fri, 17 Dec 2021 03:28:14 GMT
- Title: Towards End-to-End Image Compression and Analysis with Transformers
- Authors: Yuanchao Bai, Xu Yang, Xianming Liu, Junjun Jiang, Yaowei Wang,
Xiangyang Ji, Wen Gao
- Abstract summary: We propose an end-to-end image compression and analysis model with Transformers, targeting to the cloud-based image classification application.
We aim to redesign the Vision Transformer (ViT) model to perform image classification from the compressed features and facilitate image compression with the long-term information from the Transformer.
Experimental results demonstrate the effectiveness of the proposed model in both the image compression and the classification tasks.
- Score: 99.50111380056043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an end-to-end image compression and analysis model with
Transformers, targeting to the cloud-based image classification application.
Instead of placing an existing Transformer-based image classification model
directly after an image codec, we aim to redesign the Vision Transformer (ViT)
model to perform image classification from the compressed features and
facilitate image compression with the long-term information from the
Transformer. Specifically, we first replace the patchify stem (i.e., image
splitting and embedding) of the ViT model with a lightweight image encoder
modelled by a convolutional neural network. The compressed features generated
by the image encoder are injected convolutional inductive bias and are fed to
the Transformer for image classification bypassing image reconstruction.
Meanwhile, we propose a feature aggregation module to fuse the compressed
features with the selected intermediate features of the Transformer, and feed
the aggregated features to a deconvolutional neural network for image
reconstruction. The aggregated features can obtain the long-term information
from the self-attention mechanism of the Transformer and improve the
compression performance. The rate-distortion-accuracy optimization problem is
finally solved by a two-step training strategy. Experimental results
demonstrate the effectiveness of the proposed model in both the image
compression and the classification tasks.
Related papers
- Progressive Learning with Visual Prompt Tuning for Variable-Rate Image
Compression [60.689646881479064]
We propose a progressive learning paradigm for transformer-based variable-rate image compression.
Inspired by visual prompt tuning, we use LPM to extract prompts for input images and hidden features at the encoder side and decoder side, respectively.
Our model outperforms all current variable image methods in terms of rate-distortion performance and approaches the state-of-the-art fixed image compression methods trained from scratch.
arXiv Detail & Related papers (2023-11-23T08:29:32Z) - AICT: An Adaptive Image Compression Transformer [18.05997169440533]
We propose a more straightforward yet effective Tranformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT)
The proposed ICT can capture both global and local contexts from the latent representations.
We leverage a learnable scaling module with a sandwich ConvNeXt-based pre/post-processor to accurately extract more compact latent representation.
arXiv Detail & Related papers (2023-07-12T11:32:02Z) - Image Deblurring by Exploring In-depth Properties of Transformer [86.7039249037193]
We leverage deep features extracted from a pretrained vision transformer (ViT) to encourage recovered images to be sharp without sacrificing the performance measured by the quantitative metrics.
By comparing the transformer features between recovered image and target one, the pretrained transformer provides high-resolution blur-sensitive semantic information.
One regards the features as vectors and computes the discrepancy between representations extracted from recovered image and target one in Euclidean space.
arXiv Detail & Related papers (2023-03-24T14:14:25Z) - CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z) - Transformer-based Image Compression [18.976159633970177]
Transformer-based Image Compression (TIC) approach is developed which reuses the canonical variational autoencoder (VAE) architecture with paired main and hyper encoder-decoders.
TIC rivals with state-of-the-art approaches including deep convolutional neural networks (CNNs) based learnt image coding (LIC) methods and handcrafted rules-based intra profile of recently-approved Versatile Video Coding (VVC) standard.
arXiv Detail & Related papers (2021-11-12T13:13:20Z) - PPT Fusion: Pyramid Patch Transformerfor a Case Study in Image Fusion [37.993611194758195]
We propose a Patch PyramidTransformer(PPT) to address the issues of extracting semantic information from an image.
The experimental results demonstrate its superior performance against the state-of-the-art fusion approaches.
arXiv Detail & Related papers (2021-07-29T13:57:45Z) - Transformer-Based Deep Image Matching for Generalizable Person
Re-identification [114.56752624945142]
We investigate the possibility of applying Transformers for image matching and metric learning given pairs of images.
We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention.
We propose a new simplified decoder, which drops the full attention implementation with the softmax weighting, keeping only the query-key similarity.
arXiv Detail & Related papers (2021-05-30T05:38:33Z) - Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images.
We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv Detail & Related papers (2020-06-22T04:04:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.