Transformer Compressed Sensing via Global Image Tokens
- URL: http://arxiv.org/abs/2203.12861v2
- Date: Sun, 27 Mar 2022 06:02:35 GMT
- Title: Transformer Compressed Sensing via Global Image Tokens
- Authors: Marlon Bran Lorenzana, Craig Engstrom, Feng Liu and Shekhar S. Chandra
- Abstract summary: We propose a novel image decomposition that naturally embeds images into low-resolution inputs.
We replace CNN components in a well-known CS-MRI neural network with TNN blocks and demonstrate the improvements afforded by KD.
- Score: 4.722333456749269
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks (CNN) have demonstrated outstanding Compressed
Sensing (CS) performance compared to traditional, hand-crafted methods.
However, they are broadly limited in terms of generalisability and inductive
bias, and have difficulty modelling long-distance relationships. Transformer neural
networks (TNN) overcome such issues by implementing an attention mechanism
designed to capture dependencies between inputs. However, high-resolution tasks
typically require vision Transformers (ViT) to decompose an image into
patch-based tokens, limiting inputs to inherently local contexts. We propose a
novel image decomposition that naturally embeds images into low-resolution
inputs. These Kaleidoscope tokens (KD) provide a mechanism for global
attention, at the same computational cost as a patch-based approach. To
showcase this development, we replace CNN components in a well-known CS-MRI
neural network with TNN blocks and demonstrate the improvements afforded by KD.
We also propose an ensemble of image tokens, which enhances overall image
quality and reduces model size. Supplementary material is available:
https://github.com/uqmarlonbran/TCS.git
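The core idea of the Kaleidoscope decomposition can be illustrated with a small NumPy sketch. This is our own hedged illustration of strided global sampling, not the authors' released code (see the repository above); the function names and the single-channel setup are assumptions. Each KD token subsamples the whole image at a fixed stride, so every token keeps a global field of view, in contrast to a local patch token of the same size:

```python
import numpy as np

def kaleidoscope_tokens(img: np.ndarray, k: int) -> np.ndarray:
    """Decompose an (H, W) image into k*k low-resolution global tokens.

    Token (a, b) collects the pixels at positions (i*k + a, j*k + b),
    i.e. a strided subsampling of the *whole* image, so each token
    retains a global (kaleidoscope-like) view rather than a local patch.
    """
    H, W = img.shape
    assert H % k == 0 and W % k == 0
    t = img.reshape(H // k, k, W // k, k)      # axes: (i, a, j, b)
    t = t.transpose(1, 3, 0, 2)                # axes: (a, b, i, j)
    return t.reshape(k * k, H // k, W // k)    # k*k tokens, each (H/k, W/k)

def patch_tokens(img: np.ndarray, k: int) -> np.ndarray:
    """For contrast: k*k contiguous local patches of the same size."""
    H, W = img.shape
    t = img.reshape(k, H // k, k, W // k)      # axes: (pi, i, pj, j)
    t = t.transpose(0, 2, 1, 3)                # axes: (pi, pj, i, j)
    return t.reshape(k * k, H // k, W // k)
```

Both decompositions yield the same number and size of tokens, so the transformer's computational cost is unchanged; only the receptive field of each token differs.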
Related papers
- Enhancing Learned Image Compression via Cross Window-based Attention [4.673285689826945]
We propose a CNN-based solution integrated with a feature encoding module.
Cross-scale window-based attention is inspired by the attention mechanism in transformers and effectively enlarges the receptive field.
We evaluate our method on the Kodak and CLIC datasets and demonstrate that our approach is effective and on par with state-of-the-art methods.
arXiv Detail & Related papers (2024-10-28T15:44:35Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z) - T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called transformers, has shown significant performance in natural language processing.
In this paper, we design a novel attention mechanism whose cost is linearly related to the resolution, derived via a Taylor expansion; based on this attention, a network called $T$-former is designed for image inpainting.
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
arXiv Detail & Related papers (2023-05-12T04:10:42Z) - Accurate Image Restoration with Attention Retractable Transformer [50.05204240159985]
We propose Attention Retractable Transformer (ART) for image restoration.
ART presents both dense and sparse attention modules in the network.
We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks.
arXiv Detail & Related papers (2022-10-04T07:35:01Z) - The Devil Is in the Details: Window-based Attention for Image Compression [58.1577742463617]
Most existing learned image compression models are based on Convolutional Neural Networks (CNNs).
In this paper, we study the effects of multiple kinds of attention mechanisms for local features learning, then introduce a more straightforward yet effective window-based local attention block.
The proposed window-based attention is very flexible which could work as a plug-and-play component to enhance CNN and Transformer models.
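A minimal sketch of window-based local self-attention, the general mechanism this entry builds on (our illustration; the window size, single head, and omitted linear projections are simplifying assumptions, not the paper's exact block):

```python
import numpy as np

def window_attention(x: np.ndarray, win: int) -> np.ndarray:
    """Self-attention restricted to non-overlapping win x win windows.

    x: (H, W, d) feature map with H, W divisible by win.
    Tokens attend only to others inside their own window, so cost is
    linear in the number of windows rather than quadratic in H*W.
    """
    H, W, d = x.shape
    xw = x.reshape(H // win, win, W // win, win, d)
    xw = xw.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, d)  # (nw, t, d)
    scores = xw @ xw.transpose(0, 2, 1) / np.sqrt(d)            # (nw, t, t)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                          # row softmax
    out = w @ xw                                                # (nw, t, d)
    out = out.reshape(H // win, W // win, win, win, d)
    return out.transpose(0, 2, 1, 3, 4).reshape(H, W, d)
```

Because each window is processed independently, a pixel's output depends only on its own window, which is what makes the block cheap and easy to drop into CNN or Transformer backbones.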
arXiv Detail & Related papers (2022-03-16T07:55:49Z) - Joint Global and Local Hierarchical Priors for Learned Image Compression [30.44884350320053]
Recently, learned image compression methods have shown superior performance compared to the traditional hand-crafted image codecs.
We propose a novel entropy model called Information Transformer (Informer) that exploits both local and global information in a content-dependent manner.
Our experiments demonstrate that Informer improves rate-distortion performance over the state-of-the-art methods on the Kodak and Tecnick datasets.
arXiv Detail & Related papers (2021-12-08T06:17:37Z) - Restormer: Efficient Transformer for High-Resolution Image Restoration [118.9617735769827]
Convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data.
Transformers have shown significant performance gains on natural language and high-level vision tasks.
Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks.
arXiv Detail & Related papers (2021-11-18T18:59:10Z) - Less is More: Pay Less Attention in Vision Transformers [61.05787583247392]
The Less attention vIsion Transformer (LIT) builds upon the fact that convolutions, fully-connected layers, and self-attentions have almost equivalent mathematical expressions for processing image patch sequences.
The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation.
arXiv Detail & Related papers (2021-05-29T05:26:07Z) - KVT: k-NN Attention for Boosting Vision Transformers [44.189475770152185]
We propose a sparse attention scheme, dubbed k-NN attention, for boosting vision transformers.
The proposed k-NN attention naturally inherits the local bias of CNNs without introducing convolutional operations.
We verify, both theoretically and empirically, that $k$-NN attention is powerful in distilling noise from input tokens and in speeding up training.
arXiv Detail & Related papers (2021-05-28T06:49:10Z)
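The k-NN attention idea in the last entry can be sketched as follows (a hedged single-head illustration of ours, not the paper's implementation; projections, heads, and scaling details are simplified). Each query keeps only its `topk` highest-scoring keys before the softmax, dropping low-similarity tokens:

```python
import numpy as np

def knn_attention(q: np.ndarray, k_mat: np.ndarray, v: np.ndarray,
                  topk: int) -> np.ndarray:
    """Single-head k-NN attention: q, k_mat, v are (n, d) token arrays.

    Each query attends only to its `topk` most similar keys; all other
    scores are masked to -inf before the softmax, so irrelevant (noisy)
    tokens receive exactly zero weight.
    """
    d = q.shape[-1]
    scores = q @ k_mat.T / np.sqrt(d)                     # (n, n)
    thresh = np.sort(scores, axis=-1)[:, -topk][:, None]  # topk-th score/row
    masked = np.where(scores >= thresh, scores, -np.inf)
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

With `topk` equal to the number of tokens, this reduces to ordinary softmax attention; smaller `topk` imposes a sparse, CNN-like locality bias without any convolution.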
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.