Attention-based Image Upsampling
- URL: http://arxiv.org/abs/2012.09904v1
- Date: Thu, 17 Dec 2020 19:58:10 GMT
- Title: Attention-based Image Upsampling
- Authors: Souvik Kundu, Hesham Mostafa, Sharath Nittur Sridhar, Sairam
Sundaresan
- Abstract summary: We show how attention mechanisms can replace another canonical operation: strided transposed convolution. Attention-based upsampling consistently outperforms traditional upsampling methods.
- Score: 14.676228848773157
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Convolutional layers are an integral part of many deep neural network
solutions in computer vision. Recent work shows that replacing the standard
convolution operation with mechanisms based on self-attention leads to improved
performance on image classification and object detection tasks. In this work,
we show how attention mechanisms can be used to replace another canonical
operation: strided transposed convolution. We term our novel attention-based
operation attention-based upsampling since it increases/upsamples the spatial
dimensions of the feature maps. Through experiments on single image
super-resolution and joint-image upsampling tasks, we show that attention-based
upsampling consistently outperforms traditional upsampling methods based on
strided transposed convolution or based on adaptive filters while using fewer
parameters. We show that the inherent flexibility of the attention mechanism,
which allows it to use separate sources for calculating the attention
coefficients and the attention targets, makes attention-based upsampling a
natural choice when fusing information from multiple image modalities.
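The core idea in the abstract can be sketched as follows: for each pixel of the upsampled output, attention weights computed over a local input window replace the fixed kernel of a strided transposed convolution. This is an illustrative simplification (one query per source pixel, no learned projections or per-sub-pixel query embeddings), not the authors' exact formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_upsample(feat, scale=2, k=3):
    """Upsample an (H, W, C) feature map by `scale` using local attention.

    Each output pixel attends over a k x k window of the input centred on
    its source location; the softmax attention weights play the role of
    the fixed kernel in a strided transposed convolution. In the paper's
    setting, queries could instead come from a separate modality (e.g. a
    guidance image), which is the flexibility the abstract highlights.
    Illustrative sketch only.
    """
    H, W, C = feat.shape
    pad = k // 2
    padded = np.pad(feat, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros((H * scale, W * scale, C))
    for oy in range(H * scale):
        for ox in range(W * scale):
            iy, ix = oy // scale, ox // scale          # source pixel
            window = padded[iy:iy + k, ix:ix + k, :].reshape(-1, C)  # keys/values
            query = feat[iy, ix, :]                    # query (could be a 2nd modality)
            scores = window @ query / np.sqrt(C)       # scaled dot-product scores
            out[oy, ox] = softmax(scores) @ window     # attention-weighted value sum
    return out
```

A learned version would add projection matrices for queries, keys, and values, and distinct query embeddings for each of the `scale * scale` sub-pixel positions, so that different output pixels sharing a source location can attend differently.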
Related papers
- Wavelet-based Bi-dimensional Aggregation Network for SAR Image Change Detection [53.842568573251214]
Experimental results on three SAR datasets demonstrate that our WBANet significantly outperforms contemporary state-of-the-art methods.
Our WBANet achieves 98.33%, 96.65%, and 96.62% of percentage of correct classification (PCC) on the respective datasets.
arXiv Detail & Related papers (2024-07-18T04:36:10Z)
- AMSA-UNet: An Asymmetric Multiple Scales U-net Based on Self-attention for Deblurring [7.00986132499006]
An asymmetric multiple-scales U-net based on self-attention (AMSA-UNet) is proposed to improve accuracy while reducing computational complexity.
By introducing a multiple-scales U-shape architecture, the network can focus on blurry regions at the global level and better recover image details at the local level.
arXiv Detail & Related papers (2024-06-13T11:39:02Z)
- RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance [22.326405355520176]
RefDrop allows users to control the influence of reference context in a direct and precise manner.
Our method also enables more interesting applications, such as the consistent generation of multiple subjects.
arXiv Detail & Related papers (2024-05-27T21:23:20Z)
- Pixel-Inconsistency Modeling for Image Manipulation Localization [63.54342601757723]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z)
- ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution [76.7408734079706]
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.
We propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure.
arXiv Detail & Related papers (2023-07-26T07:45:14Z)
- Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z)
- ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions [28.956280590967808]
Our architecture is based on a transformer with a novel attention mechanism.
Our key idea is to sparsify the transformer's attention matrix at high resolutions, guided by dense attention extracted at lower image resolutions.
We present qualitative and quantitative results, along with user studies, demonstrating the effectiveness of our method.
arXiv Detail & Related papers (2022-05-24T17:39:53Z)
- Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z)
- Augmented Equivariant Attention Networks for Microscopy Image Reconstruction [44.965820245167635]
It is time-consuming and expensive to take high-quality or high-resolution electron microscopy (EM) and fluorescence microscopy (FM) images.
Deep learning enables us to perform image-to-image transformation tasks for various types of microscopy image reconstruction.
We propose the augmented equivariant attention networks (AEANets) with better capability to capture inter-image dependencies.
arXiv Detail & Related papers (2020-11-06T23:37:49Z)
- Image super-resolution reconstruction based on attention mechanism and feature fusion [3.42658286826597]
A network structure based on an attention mechanism and multi-scale feature fusion is proposed.
Experimental results show that the proposed method achieves better performance than other representative super-resolution reconstruction algorithms.
arXiv Detail & Related papers (2020-04-08T11:20:10Z)
- ADRN: Attention-based Deep Residual Network for Hyperspectral Image Denoising [52.01041506447195]
We propose an attention-based deep residual network to learn a mapping from noisy HSI to the clean one.
Experimental results demonstrate that our proposed ADRN scheme outperforms state-of-the-art methods in both quantitative and visual evaluations.
arXiv Detail & Related papers (2020-03-04T08:36:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.