Attention-based Image Upsampling
- URL: http://arxiv.org/abs/2012.09904v1
- Date: Thu, 17 Dec 2020 19:58:10 GMT
- Title: Attention-based Image Upsampling
- Authors: Souvik Kundu, Hesham Mostafa, Sharath Nittur Sridhar, Sairam Sundaresan
- Abstract summary: We show how attention mechanisms can be used to replace another canonical operation: strided transposed convolution.
We show that attention-based upsampling consistently outperforms traditional upsampling methods.
- Score: 14.676228848773157
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Convolutional layers are an integral part of many deep neural network
solutions in computer vision. Recent work shows that replacing the standard
convolution operation with mechanisms based on self-attention leads to improved
performance on image classification and object detection tasks. In this work,
we show how attention mechanisms can be used to replace another canonical
operation: strided transposed convolution. We term our novel attention-based
operation attention-based upsampling since it increases/upsamples the spatial
dimensions of the feature maps. Through experiments on single image
super-resolution and joint-image upsampling tasks, we show that attention-based
upsampling consistently outperforms traditional upsampling methods based on
strided transposed convolution or based on adaptive filters while using fewer
parameters. We show that the inherent flexibility of the attention mechanism,
which allows it to use separate sources for calculating the attention
coefficients and the attention targets, makes attention-based upsampling a
natural choice when fusing information from multiple image modalities.
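The idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `attention_upsample`, the projection matrices `wq`, `wk`, `wv`, the window size `k`, and the choice of taking queries from a high-resolution `guide` while taking keys and values from the low-resolution features are all assumptions made for illustration. Each output pixel attends over a small window of low-resolution positions and returns a softmax-weighted sum of their values, so the attention coefficients and the attention targets can come from different sources.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_upsample(low, guide, wq, wk, wv, scale=2, k=3):
    """Illustrative local-attention upsampling (hypothetical helper).

    low:   (H, W, C) low-resolution features (source of keys and values)
    guide: (H*scale, W*scale, Cg) high-resolution features (source of queries)
    wq:    (Cg, D) query projection
    wk:    (C, D) key projection
    wv:    (C, Cout) value projection
    """
    h, w, _ = low.shape
    pad = k // 2
    # Replicate border pixels so every window has k*k entries.
    padded = np.pad(low, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    keys = padded @ wk      # (H + 2*pad, W + 2*pad, D)
    values = padded @ wv    # (H + 2*pad, W + 2*pad, Cout)

    out = np.empty((h * scale, w * scale, wv.shape[1]))
    for i in range(h * scale):
        for j in range(w * scale):
            # Low-resolution pixel under this output pixel; in padded
            # coordinates the k x k window centred on it starts at (ci, cj).
            ci, cj = i // scale, j // scale
            k_win = keys[ci:ci + k, cj:cj + k].reshape(k * k, -1)
            v_win = values[ci:ci + k, cj:cj + k].reshape(k * k, -1)
            q = guide[i, j] @ wq
            attn = softmax(k_win @ q / np.sqrt(q.shape[0]))
            out[i, j] = attn @ v_win
    return out
```

For single-image use, `guide` could be a nearest-neighbour upsampled copy of `low`, for example `np.repeat(np.repeat(low, 2, axis=0), 2, axis=1)`. For joint upsampling it could be features from a second, higher-resolution modality. Both are assumptions about how such a layer might be wired, not a description of the paper's architecture.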
Related papers
- Wavelet-based Bi-dimensional Aggregation Network for SAR Image Change Detection [53.842568573251214]
Experimental results on three SAR datasets demonstrate that our WBANet significantly outperforms contemporary state-of-the-art methods.
Our WBANet achieves a percentage of correct classification (PCC) of 98.33%, 96.65%, and 96.62% on the respective datasets.
arXiv Detail & Related papers (2024-07-18T04:36:10Z)
- AMSA-UNet: An Asymmetric Multiple Scales U-net Based on Self-attention for Deblurring [7.00986132499006]
An asymmetric multiple scales U-net based on self-attention (AMSA-UNet) is proposed to improve accuracy and computational complexity.
By introducing a multiple-scales U shape architecture, the network can focus on blurry regions at the global level and better recover image details at the local level.
arXiv Detail & Related papers (2024-06-13T11:39:02Z)
- RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance [22.326405355520176]
RefDrop allows users to control the influence of reference context in a direct and precise manner.
Our method also enables more interesting applications, such as the consistent generation of multiple subjects.
arXiv Detail & Related papers (2024-05-27T21:23:20Z)
- Pixel-Inconsistency Modeling for Image Manipulation Localization [59.968362815126326]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z)
- ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution [76.7408734079706]
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.
We propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure.
arXiv Detail & Related papers (2023-07-26T07:45:14Z)
- Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z)
- ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions [28.956280590967808]
Our architecture is based on a transformer with a novel attention mechanism.
Our key idea is to sparsify the transformer's attention matrix at high resolutions, guided by dense attention extracted at lower image resolutions.
We present qualitative and quantitative results, along with user studies, demonstrating the effectiveness of our method.
arXiv Detail & Related papers (2022-05-24T17:39:53Z)
- Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z)
- Augmented Equivariant Attention Networks for Microscopy Image Reconstruction [44.965820245167635]
It is time-consuming and expensive to take high-quality or high-resolution electron microscopy (EM) and fluorescence microscopy (FM) images.
Deep learning enables us to perform image-to-image transformation tasks for various types of microscopy image reconstruction.
We propose the augmented equivariant attention networks (AEANets) with better capability to capture inter-image dependencies.
arXiv Detail & Related papers (2020-11-06T23:37:49Z)
- Image super-resolution reconstruction based on attention mechanism and feature fusion [3.42658286826597]
A network structure based on attention mechanism and multi-scale feature fusion is proposed.
Experimental results show that the proposed method can achieve better performance over other representative super-resolution reconstruction algorithms.
arXiv Detail & Related papers (2020-04-08T11:20:10Z)
- ADRN: Attention-based Deep Residual Network for Hyperspectral Image Denoising [52.01041506447195]
We propose an attention-based deep residual network to learn a mapping from noisy HSI to the clean one.
Experimental results demonstrate that our proposed ADRN scheme outperforms the state-of-the-art methods both in quantitative and visual evaluations.
arXiv Detail & Related papers (2020-03-04T08:36:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.