Hybrid Local-Global Transformer for Image Dehazing
- URL: http://arxiv.org/abs/2109.07100v1
- Date: Wed, 15 Sep 2021 06:13:22 GMT
- Title: Hybrid Local-Global Transformer for Image Dehazing
- Authors: Dong Zhao, Jia Li, Hongyu Li, and Long Xu
- Abstract summary: Vision Transformer (ViT) has shown impressive performance on high-level and low-level vision tasks.
We propose a new ViT architecture, named Hybrid Local-Global Vision Transformer (HyLoG-ViT), for single image dehazing.
- Score: 18.468149424220424
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, the Vision Transformer (ViT) has shown impressive performance on
high-level and low-level vision tasks. In this paper, we propose a new ViT
architecture, named Hybrid Local-Global Vision Transformer (HyLoG-ViT), for
single image dehazing. The HyLoG-ViT block consists of two paths, the local ViT
path and the global ViT path, which are used to capture local and global
dependencies. The hybrid features from the two paths are fused via convolution layers. As a
result, the HyLoG-ViT reduces the computational complexity of a purely global ViT and introduces
locality into the network. The HyLoG-ViT blocks are then incorporated within
our dehazing networks, which jointly learn the intrinsic image decomposition
and image dehazing. Specifically, the network consists of one shared encoder
and three decoders for reflectance prediction, shading prediction, and
haze-free image generation. The tasks of reflectance and shading prediction can
produce meaningful intermediate features that can serve as complementary
features for haze-free image generation. To effectively aggregate the
complementary features, we propose a complementary features selection module
(CFSM) to select the useful ones for image dehazing. Extensive experiments on
homogeneous, non-homogeneous, and nighttime dehazing tasks reveal that our
proposed Transformer-based dehazing network can achieve comparable or even
better performance than CNN-based dehazing models.
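To make the hybrid block concrete, the sketch below shows one plausible PyTorch reading of the structure described in the abstract: a local ViT path attending within small windows, a global ViT path attending over all tokens, and a convolutional fusion of the two. The window size, channel width, and fusion layer are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal sketch of a hybrid local-global ViT block, assuming a PyTorch setup.
# The window size, embedding dim, and fusion layer are illustrative choices,
# not the configuration reported in the paper.
import torch
import torch.nn as nn


class HybridLocalGlobalBlock(nn.Module):
    def __init__(self, dim=64, heads=4, window=8):
        super().__init__()
        self.window = window
        # Local path: self-attention restricted to non-overlapping windows.
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Global path: self-attention over all spatial tokens.
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_l = nn.LayerNorm(dim)
        self.norm_g = nn.LayerNorm(dim)
        # Fuse the two paths with a convolution, reintroducing locality.
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=3, padding=1)

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        ws = self.window

        # Local path: attention inside ws x ws windows.
        xl = x.reshape(b, c, h // ws, ws, w // ws, ws)
        xl = xl.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, c)
        xl = self.norm_l(xl)
        xl, _ = self.local_attn(xl, xl, xl)
        xl = xl.reshape(b, h // ws, w // ws, ws, ws, c)
        xl = xl.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)

        # Global path: attention over the full token map.
        xg = x.flatten(2).transpose(1, 2)        # (B, H*W, C)
        xg = self.norm_g(xg)
        xg, _ = self.global_attn(xg, xg, xg)
        xg = xg.transpose(1, 2).reshape(b, c, h, w)

        # Convolutional fusion of the hybrid features, plus a residual.
        return x + self.fuse(torch.cat([xl, xg], dim=1))


# Example: a 64-channel feature map whose sides are multiples of the window.
feats = torch.randn(1, 64, 32, 32)
out = HybridLocalGlobalBlock()(feats)
print(out.shape)  # torch.Size([1, 64, 32, 32])
```

Restricting attention to small windows keeps the local path cheap while the global path preserves long-range dependencies, which is the usual motivation for mixing the two before a convolutional fusion.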
Related papers
- HAT: Hybrid Attention Transformer for Image Restoration [61.74223315807691]
Transformer-based methods have shown impressive performance in image restoration tasks, such as image super-resolution and denoising.
We propose a new Hybrid Attention Transformer (HAT) to activate more input pixels for better restoration.
Our HAT achieves state-of-the-art performance both quantitatively and qualitatively.
arXiv Detail & Related papers (2023-09-11T05:17:55Z)
- HEAL-SWIN: A Vision Transformer On The Sphere [4.379414115481346]
High-resolution wide-angle fisheye images are becoming more important for robotics applications such as autonomous driving.
We introduce the HEAL-SWIN transformer, which combines the highly uniform Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid used in astrophysics and cosmology with the SWIN transformer.
In HEAL-SWIN, the nested structure of the HEALPix grid is used to perform the patching and windowing operations of the SWIN transformer, enabling the network to process spherical representations with minimal computational overhead.
arXiv Detail & Related papers (2023-07-14T12:46:59Z)
- TcGAN: Semantic-Aware and Structure-Preserved GANs with Individual Vision Transformer for Fast Arbitrary One-Shot Image Generation [11.207512995742999]
One-shot image generation (OSG) with generative adversarial networks that learn from the internal patches of a given image has attracted worldwide attention.
We propose TcGAN, a novel structure-preserved method with an individual vision transformer, to overcome the shortcomings of existing one-shot image generation methods.
arXiv Detail & Related papers (2023-02-16T03:05:59Z)
- DLGSANet: Lightweight Dynamic Local and Global Self-Attention Networks for Image Super-Resolution [83.47467223117361]
We propose an effective lightweight dynamic local and global self-attention network (DLGSANet) to solve image super-resolution.
Motivated by the network designs of Transformers, we develop a simple yet effective multi-head dynamic local self-attention (MHDLSA) module to extract local features efficiently.
To avoid computing dense similarities over all tokens in global self-attention, we develop a sparse global self-attention (SparseGSA) module to select the most useful similarity values (see the sketch after this list).
arXiv Detail & Related papers (2023-01-05T12:06:47Z)
- Efficient Image Super-Resolution with Feature Interaction Weighted Hybrid Network [101.53907377000445]
Lightweight image super-resolution aims to reconstruct high-resolution images from low-resolution inputs at low computational cost.
Existing methods result in the loss of middle-layer features due to activation functions.
We propose a Feature Interaction Weighted Hybrid Network (FIWHN) to minimize the impact of intermediate feature loss on reconstruction quality.
arXiv Detail & Related papers (2022-12-29T05:57:29Z)
- Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks [126.33843752332139]
We introduce Group-wise Transformation towards a universal yet lightweight Transformer for vision-and-language tasks, termed LW-Transformer.
We apply LW-Transformer to a set of Transformer-based networks, and quantitatively measure them on three vision-and-language tasks and six benchmark datasets.
Experimental results show that while saving a large number of parameters and computations, LW-Transformer achieves very competitive performance against the original Transformer networks for vision-and-language tasks.
arXiv Detail & Related papers (2022-04-16T11:30:26Z)
- Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.
arXiv Detail & Related papers (2022-03-20T02:59:51Z)
- ViTGAN: Training GANs with Vision Transformers [46.769407314698434]
Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases.
We introduce several novel regularization techniques for training GANs with ViTs.
Our approach, named ViTGAN, achieves comparable performance to the leading CNN-based GAN models on three datasets.
arXiv Detail & Related papers (2021-07-09T17:59:30Z)
- Intriguing Properties of Vision Transformers [114.28522466830374]
Vision transformers (ViT) have demonstrated impressive performance across various machine vision problems.
We systematically study the properties of ViTs via an extensive set of experiments and comparisons with a high-performing convolutional neural network (CNN).
We show that the effective features of ViTs are due to flexible and dynamic receptive fields made possible by the self-attention mechanism.
arXiv Detail & Related papers (2021-05-21T17:59:18Z)
- LT-GAN: Self-Supervised GAN with Latent Transformation Detection [10.405721171353195]
We propose a self-supervised approach (LT-GAN) to improve the generation quality and diversity of images.
We experimentally demonstrate that our proposed LT-GAN can be effectively combined with other state-of-the-art training techniques for added benefits.
arXiv Detail & Related papers (2020-10-19T22:09:45Z)
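Among the entries above, the DLGSANet summary mentions selecting only the most useful similarity values in global self-attention. A common way to realize this is top-k attention; the sketch below is a minimal, hypothetical illustration of that idea, not the authors' SparseGSA implementation. The top_k value and single-head layout are assumptions made for brevity.

```python
# Minimal sketch of "sparse" global self-attention via top-k similarity
# selection, as one plausible reading of the SparseGSA idea mentioned above.
# The top_k value and single-head layout are illustrative assumptions.
import torch
import torch.nn.functional as F


def topk_sparse_attention(q, k, v, top_k=16):
    """q, k, v: (B, N, C). Keep only the top_k largest similarities per query."""
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale    # (B, N, N)

    # Mask out everything except the top_k most similar keys per query,
    # so softmax distributes weight only over the selected positions.
    topk_vals, _ = scores.topk(top_k, dim=-1)
    threshold = topk_vals[..., -1:]                          # k-th largest score
    scores = scores.masked_fill(scores < threshold, float("-inf"))

    attn = F.softmax(scores, dim=-1)
    return torch.matmul(attn, v)                             # (B, N, C)


# Example with 256 tokens of dimension 32 (self-attention: q = k = v).
q = torch.randn(2, 256, 32)
out = topk_sparse_attention(q, q, q, top_k=16)
print(out.shape)  # torch.Size([2, 256, 32])
```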
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.