Illumination Adaptive Transformer
- URL: http://arxiv.org/abs/2205.14871v1
- Date: Mon, 30 May 2022 06:21:52 GMT
- Title: Illumination Adaptive Transformer
- Authors: Ziteng Cui, Kunchang Li, Lin Gu, Shenghan Su, Peng Gao, Zhengkai
Jiang, Yu Qiao, Tatsuya Harada
- Abstract summary: We propose a lightweight, fast Illumination Adaptive Transformer (IAT).
IAT decomposes the light transformation pipeline into local and global ISP components.
We have extensively evaluated IAT on multiple real-world datasets.
- Score: 66.50045722358503
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Challenging illumination conditions (low light, underexposure, and
overexposure) in the real world not only produce an unpleasant visual appearance
but also degrade computer vision tasks. Existing light-adaptive methods often
deal with each condition individually; moreover, most of them operate on RAW
images or over-simplify the camera image signal processing (ISP) pipeline. By
decomposing the light transformation pipeline into local and global ISP
components, we propose a lightweight, fast Illumination Adaptive Transformer
(IAT) that comprises two transformer-style branches: a local estimation branch
and a global ISP branch. While the local branch estimates the pixel-wise local
components relevant to illumination, the global branch defines learnable
queries that attend over the whole image to decode the ISP parameters. IAT can
also support both object detection and semantic segmentation under various
light conditions. We have extensively evaluated IAT on multiple real-world
datasets across 2 low-level and 3 high-level tasks. With only 90k parameters
and a processing time of 0.004s per image (excluding the high-level module),
IAT consistently outperforms SOTA methods. Code is available at
https://github.com/cuiziteng/Illumination-Adaptive-Transformer.
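The two-branch design described in the abstract can be illustrated with a short PyTorch sketch. Everything below is a hedged approximation rather than the authors' released code: the layer sizes, the sigmoid/tanh gating, the query count, and the gamma/color-matrix decoding are all assumptions about how such a module could look.

import torch
import torch.nn as nn

class IATSketch(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        # Local estimation branch: predicts pixel-wise multiply/add maps.
        self.local = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, 6, 3, padding=1),   # 3 "mul" + 3 "add" channels
        )
        # Global ISP branch: learnable queries attend over patch tokens.
        self.embed = nn.Conv2d(3, dim, 8, stride=8)        # patchify
        self.queries = nn.Parameter(torch.randn(10, dim))  # 1 gamma + 9 matrix entries
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, 1)

    def forward(self, img):                                  # img: (B, 3, H, W) in [0, 1]
        mul, add = self.local(img).chunk(2, dim=1)
        x = img * torch.sigmoid(mul) + torch.tanh(add)       # local pixel-wise correction
        tokens = self.embed(img).flatten(2).transpose(1, 2)  # (B, N, dim)
        q = self.queries.expand(img.size(0), -1, -1)         # one query set per image
        out, _ = self.attn(q, tokens, tokens)                # (B, 10, dim)
        p = self.head(out).squeeze(-1)                       # (B, 10) ISP parameters
        gamma = 1.0 + p[:, :1].view(-1, 1, 1, 1)             # global gamma
        color = p[:, 1:].view(-1, 3, 3) + torch.eye(3)       # global 3x3 color matrix
        x = torch.einsum('bchw,bcd->bdhw', x, color)         # apply color matrix
        return x.clamp(1e-4, 1.0).pow(gamma)                 # apply gamma

print(IATSketch()(torch.rand(2, 3, 64, 64)).shape)           # torch.Size([2, 3, 64, 64])

The structural point is that the local branch stays fully convolutional and pixel-aligned, while the global branch compresses the whole image into a handful of query-decoded scalars, which keeps the parameter count small.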
Related papers
- Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets [11.95214938154427]
Vision Transformer (ViT) captures global information by dividing images into patches.
ViT lacks an inductive bias when trained on image or video datasets.
We present a lightweight Depth-Wise Convolution module as a shortcut in ViT models (a sketch follows this item).
arXiv Detail & Related papers (2024-07-28T04:23:40Z)
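The shortcut idea summarized above can be sketched in a few lines of PyTorch. This is an illustrative assumption, not the paper's exact module: patch tokens are reshaped back to their 2D grid, filtered with a per-channel (depth-wise) 3x3 convolution, and added alongside the self-attention output to inject a local inductive bias.

import torch
import torch.nn as nn

class DWShortcutBlock(nn.Module):
    def __init__(self, dim=192, heads=3, grid=14):
        super().__init__()
        self.grid = grid
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # groups=dim -> one independent 3x3 filter per channel (depth-wise).
        self.dw = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)

    def forward(self, x):                  # x: (B, N, dim) with N = grid * grid
        b, n, d = x.shape
        y = self.norm(x)
        attn_out, _ = self.attn(y, y, y)   # global token mixing
        # Shortcut path: tokens -> 2D grid -> depth-wise conv -> tokens.
        g = y.transpose(1, 2).reshape(b, d, self.grid, self.grid)
        local = self.dw(g).flatten(2).transpose(1, 2)
        return x + attn_out + local        # residual + global + local

print(DWShortcutBlock()(torch.rand(2, 14 * 14, 192)).shape)  # torch.Size([2, 196, 192])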
- A Non-Uniform Low-Light Image Enhancement Method with Multi-Scale Attention Transformer and Luminance Consistency Loss [11.585269110131659]
Low-light image enhancement aims to improve the perception of images collected in dim environments.
Existing methods cannot adaptively extract differentiated luminance information, which easily causes over-exposure and under-exposure.
We propose a multi-scale attention Transformer named MSATr, which extracts local and global features for luminance balancing to improve visual quality.
arXiv Detail & Related papers (2023-12-27T10:07:11Z)
- TransY-Net: Learning Fully Transformer Networks for Change Detection of Remote Sensing Images [64.63004710817239]
We propose a novel Transformer-based learning framework named TransY-Net for remote sensing image CD.
It improves the feature extraction from a global view and combines multi-level visual features in a pyramid manner.
Our proposed method achieves a new state-of-the-art performance on four optical and two SAR image CD benchmarks.
arXiv Detail & Related papers (2023-10-22T07:42:19Z)
- Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation [59.91357714415056]
We propose two Transformer variants: Context-Sharing Transformer (CST) and Semantic Gathering-Scattering Transformer (SGST).
CST learns the globally-shared contextual information within image frames with lightweight computation; SGST models the semantic correlation separately for the foreground and background.
Compared with the baseline that uses vanilla Transformers for multi-stage fusion, ours increases the speed by 13 times and achieves new state-of-the-art ZVOS performance.
arXiv Detail & Related papers (2023-08-13T06:12:00Z)
- Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and Transformer-Based Method [51.30748775681917]
We consider the task of low-light image enhancement (LLIE) and introduce a large-scale database consisting of images at 4K and 8K resolution.
We conduct systematic benchmarking studies and provide a comparison of current LLIE algorithms.
As a second contribution, we introduce LLFormer, a transformer-based low-light enhancement method.
arXiv Detail & Related papers (2022-12-22T09:05:07Z)
- LRT: An Efficient Low-Light Restoration Transformer for Dark Light Field Images [9.926231893220063]
Recent learning-based methods for low-light enhancement have their own disadvantages.
We propose an efficient Low-light Restoration Transformer (LRT) for LF images.
We show that our method can achieve superior performance on the restoration of extremely low-light and noisy LF images.
arXiv Detail & Related papers (2022-09-06T03:23:58Z)
- Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection [84.52197307286681]
We propose a novel multitask auto encoding transformation (MAET) model to enhance object detection in a dark environment.
In a self-supervision manner, the MAET learns the intrinsic visual structure by encoding and decoding the realistic illumination-degrading transformation.
We achieve state-of-the-art performance on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-05-06T16:27:14Z)
- Detail-Preserving Transformer for Light Field Image Super-Resolution [15.53525700552796]
We put forth a novel formulation built upon Transformers by treating light field super-resolution as a sequence-to-sequence reconstruction task.
We propose a detail-preserving Transformer (termed DPT) that leverages gradient maps of the light field to guide the sequence learning.
DPT consists of two branches, with each associated with a Transformer for learning from an original or gradient image sequence.
arXiv Detail & Related papers (2022-01-02T12:33:23Z)
- Vision Transformer with Progressive Sampling [73.60630716500154]
We propose an iterative and progressive sampling strategy to locate discriminative regions.
When trained from scratch on ImageNet, PS-ViT performs 3.8% higher than the vanilla ViT in terms of top-1 accuracy.
arXiv Detail & Related papers (2021-08-03T18:04:31Z)
- Conditional Sequential Modulation for Efficient Global Image Retouching [45.99310982782054]
Photo retouching aims at enhancing the aesthetic visual quality of images that suffer from photographic defects such as over-/under-exposure, poor contrast, and inharmonious saturation.
In this paper, we investigate some commonly-used retouching operations and mathematically find that these pixel-independent operations can be approximated or formulated by multi-layer perceptrons (MLPs); a sketch of this observation follows this item.
We propose an extremely lightweight framework, the Conditional Sequential Retouching Network (CSRNet), for efficient global image retouching.
arXiv Detail & Related papers (2020-09-22T08:32:04Z)
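The MLP observation above is easy to make concrete: an operation that maps each pixel's RGB value independently of its neighbours is exactly a stack of 1x1 convolutions. The sketch below is a hedged illustration (the layer widths are assumptions, and the real CSRNet additionally uses a condition network for global modulation) showing such a per-pixel MLP next to a gamma curve it could be trained to fit.

import torch
import torch.nn as nn

# A per-pixel MLP: 1x1 convolutions never mix spatial locations, only
# channels, so the same RGB -> RGB mapping is applied at every pixel.
per_pixel_mlp = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=1), nn.ReLU(),
    nn.Conv2d(32, 3, kernel_size=1),
)

img = torch.rand(1, 3, 256, 256)            # input photo in [0, 1]
target = img.clamp(min=1e-4) ** (1 / 2.2)   # a gamma curve such an MLP can fit
print(per_pixel_mlp(img).shape)             # torch.Size([1, 3, 256, 256])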