Dual-former: Hybrid Self-attention Transformer for Efficient Image
Restoration
- URL: http://arxiv.org/abs/2210.01069v1
- Date: Mon, 3 Oct 2022 16:39:21 GMT
- Title: Dual-former: Hybrid Self-attention Transformer for Efficient Image
Restoration
- Authors: Sixiang Chen, Tian Ye, Yun Liu, Erkang Chen
- Abstract summary: We present Dual-former, which combines the powerful global modeling ability of self-attention modules and the local modeling ability of convolutions in an overall architecture.
Experiments demonstrate that Dual-former achieves a 1.91 dB gain over the state-of-the-art MAXIM method on the Indoor dataset for single image dehazing.
For single image deraining, it exceeds the SOTA method by 0.1 dB PSNR averaged over five datasets while using only 21.5% of the GFLOPs.
- Score: 6.611849560359801
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, image restoration transformers have achieved performance
comparable to previous state-of-the-art CNNs. However, how to efficiently leverage such
architectures remains an open problem. In this work, we present Dual-former
whose critical insight is to combine the powerful global modeling ability of
self-attention modules and the local modeling ability of convolutions in an
overall architecture. With convolution-based Local Feature Extraction modules
equipped in the encoder and the decoder, we only adopt a novel Hybrid
Transformer Block in the latent layer to model the long-distance dependence in
spatial dimensions and handle the uneven distribution between channels. Such a
design eliminates the substantial computational complexity in previous image
restoration transformers and achieves superior performance on multiple image
restoration tasks. Experiments demonstrate that Dual-former achieves a 1.91 dB
gain over the state-of-the-art MAXIM method on the Indoor dataset for single
image dehazing while consuming only 4.2% of the GFLOPs of MAXIM. For single image
deraining, it exceeds the SOTA method by 0.1 dB PSNR averaged over five
datasets while using only 21.5% of the GFLOPs. Dual-former also substantially surpasses
the latest desnowing method on various datasets, with fewer parameters.
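To make the described layout concrete, here is a minimal PyTorch-style sketch, assuming a simple U-shaped design: convolution-based Local Feature Extraction (LFE) blocks in the encoder and decoder, and a hybrid block only at the low-resolution latent level that pairs spatial self-attention (long-range dependence) with channel attention (handling uneven inter-channel distribution). All module structures, names, and dimensions below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: module names and details are assumptions,
# not the authors' implementation of Dual-former.
import torch
import torch.nn as nn

class LFE(nn.Module):
    """Convolution-based Local Feature Extraction block (assumed form)."""
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # residual local refinement

class HybridBlock(nn.Module):
    """Latent-level block: spatial self-attention for long-range dependence
    plus channel attention for inter-channel reweighting (a stand-in for the
    paper's Hybrid Transformer Block)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Squeeze-and-excitation style channel attention (an assumption).
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim // 4, 1), nn.GELU(),
            nn.Conv2d(dim // 4, dim, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)    # (B, HW, C)
        t = self.norm(tokens)
        tokens = tokens + self.attn(t, t, t)[0]  # global spatial mixing
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        return x * self.channel(x)               # channel reweighting

class DualFormerSketch(nn.Module):
    """U-shaped layout: conv LFE in encoder/decoder, hybrid attention only
    in the low-resolution latent layer."""
    def __init__(self, in_ch=3, dim=32):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, dim, 3, padding=1)
        self.enc = LFE(dim)
        self.down = nn.Conv2d(dim, dim * 2, 3, stride=2, padding=1)
        self.latent = HybridBlock(dim * 2)
        self.up = nn.ConvTranspose2d(dim * 2, dim, 2, stride=2)
        self.dec = LFE(dim)
        self.out = nn.Conv2d(dim, in_ch, 3, padding=1)

    def forward(self, x):
        f = self.enc(self.stem(x))
        z = self.latent(self.down(f))
        return x + self.out(self.dec(self.up(z) + f))  # skip + residual

x = torch.randn(1, 3, 64, 64)
print(DualFormerSketch()(x).shape)  # torch.Size([1, 3, 64, 64])
```

Restricting self-attention to the latent level is what keeps the cost low: attention scales quadratically with the number of tokens, so running it only on the downsampled feature map avoids the dominant expense of full-resolution attention, while the convolutional LFE blocks handle local detail at full resolution.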
Related papers
- Lightweight single-image super-resolution network based on dual paths [0.552480439325792]
Deep-learning-based single image super-resolution (SISR) currently has two main families of models: one based on convolutional neural networks and the other based on Transformers.
This paper proposes a new lightweight multi-scale feature fusion network built from two complementary paths, one convolutional and one Transformer-based.
arXiv Detail & Related papers (2024-09-10T15:31:37Z)
- Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models [26.926712014346432]
This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization.
Our method's efficacy is demonstrated on the class-conditional ImageNet generation benchmark, setting new state-of-the-art FID scores of 1.70 on ImageNet 256 x 256 and 2.89 on ImageNet 512 x 512.
arXiv Detail & Related papers (2024-06-13T17:59:58Z)
- Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression technique, offers the potential to effectively accelerate advanced diffusion models (DMs).
However, existing binarization methods result in significant performance degradation.
We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
arXiv Detail & Related papers (2024-06-09T10:30:25Z)
- Dual-Path Multi-Scale Transformer for High-Quality Image Deraining [1.7104836047593197]
We propose a dual-path multi-scale Transformer (DPMformer) for high-quality image reconstruction.
This method consists of a backbone path and two branch paths from two different multi-scale approaches.
Our method achieves promising performance compared to other state-of-the-art methods.
arXiv Detail & Related papers (2024-05-28T12:31:23Z)
- A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization [54.113083217869516]
In this work, we first identify the computationally redundant parts of the network.
We then prune the redundant blocks of the model while maintaining network performance.
Third, we propose a global-regional interactive (GRI) attention to speed up the computationally intensive attention part.
arXiv Detail & Related papers (2023-12-24T15:37:47Z)
- HAT: Hybrid Attention Transformer for Image Restoration [61.74223315807691]
Transformer-based methods have shown impressive performance in image restoration tasks, such as image super-resolution and denoising.
We propose a new Hybrid Attention Transformer (HAT) to activate more input pixels for better restoration.
Our HAT achieves state-of-the-art performance both quantitatively and qualitatively.
arXiv Detail & Related papers (2023-09-11T05:17:55Z)
- Magic ELF: Image Deraining Meets Association Learning and Transformer [63.761812092934576]
This paper aims to unify CNN and Transformer to take advantage of their learning merits for image deraining.
A novel multi-input attention module (MAM) is proposed to associate rain removal and background recovery.
Our proposed method (dubbed ELF) outperforms the state-of-the-art approach (MPRNet) by 0.25 dB on average.
arXiv Detail & Related papers (2022-07-21T12:50:54Z)
- HUMUS-Net: Hybrid unrolled multi-scale network architecture for accelerated MRI reconstruction [38.0542877099235]
HUMUS-Net is a hybrid architecture that combines the beneficial implicit bias and efficiency of convolutions with the power of Transformer blocks in an unrolled and multi-scale network.
Our network establishes new state of the art on the largest publicly available MRI dataset, the fastMRI dataset.
arXiv Detail & Related papers (2022-03-15T19:26:29Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by Transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- Improved Transformer for High-Resolution GANs [69.42469272015481]
We introduce two key ingredients to the Transformer to address this challenge.
We show in the experiments that the proposed HiT achieves state-of-the-art FID scores of 31.87 and 2.95 on unconditional ImageNet $128 \times 128$ and FFHQ $256 \times 256$, respectively.
arXiv Detail & Related papers (2021-06-14T17:39:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.