LIPT: Latency-aware Image Processing Transformer
- URL: http://arxiv.org/abs/2404.06075v2
- Date: Mon, 29 Apr 2024 03:21:49 GMT
- Title: LIPT: Latency-aware Image Processing Transformer
- Authors: Junbo Qiao, Wei Li, Haizhen Xie, Hanting Chen, Yunshuai Zhou, Zhijun Tu, Jie Hu, Shaohui Lin
- Abstract summary: We present a latency-aware image processing transformer, termed LIPT.
We devise the low-latency proportion LIPT block that substitutes memory-intensive operators with a combination of self-attention and convolutions to achieve practical speedup.
- Score: 17.802838753201385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers are leading a trend in the field of image processing. Despite the great success that existing lightweight image processing transformers have achieved, they are tailored to FLOPs or parameter reduction rather than practical inference acceleration. In this paper, we present a latency-aware image processing transformer, termed LIPT. We devise the low-latency proportion LIPT block that substitutes memory-intensive operators with a combination of self-attention and convolutions to achieve practical speedup. Specifically, we propose a novel non-volatile sparse masking self-attention (NVSM-SA) that utilizes a pre-computed sparse mask to capture contextual information from a larger window with no extra computation overhead. Besides, a high-frequency reparameterization module (HRM) is proposed to make the LIPT block reparameterization-friendly, which improves the model's detail reconstruction capability. Extensive experiments on multiple image processing tasks (e.g., image super-resolution (SR), JPEG artifact reduction, and image denoising) demonstrate the superiority of LIPT on both latency and PSNR. LIPT achieves real-time GPU inference with state-of-the-art performance on multiple image SR benchmarks.
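The abstract names the mechanism but includes no code. As a rough, non-authoritative sketch of the sparse-masking idea behind NVSM-SA: pre-compute a fixed boolean mask once (hence "non-volatile") so that each query window attends only to a sparse subset of a larger context window. The names (`make_sparse_mask`, `sparse_masked_attention`), window sizes, and dilation pattern below are assumptions for illustration, not the paper's actual design.

```python
import torch
import torch.nn.functional as F

def make_sparse_mask(large: int = 12, keep_every: int = 2) -> torch.Tensor:
    """Pre-compute a boolean mask over a large x large window that keeps a
    sparse, dilated subset of positions. Built once ("non-volatile"), so it
    adds no per-forward cost. The dilation pattern is an assumption for
    illustration, not the paper's actual mask."""
    m = torch.zeros(large, large, dtype=torch.bool)
    m[::keep_every, ::keep_every] = True
    return m.flatten()  # (large*large,)

def sparse_masked_attention(q, k, v, sparse_mask):
    """q: (B, heads, Nq, d); k, v: (B, heads, Nkv, d) tokens from the larger
    window; sparse_mask: (Nkv,) bool. Masked-out keys get -inf logits, so
    each query only attends to the sparse subset of the enlarged window."""
    d = q.size(-1)
    attn = (q @ k.transpose(-2, -1)) / d ** 0.5        # (B, h, Nq, Nkv)
    attn = attn.masked_fill(~sparse_mask, float("-inf"))
    return F.softmax(attn, dim=-1) @ v

# Toy usage: an 8x8 query window attending sparsely into a 12x12 context.
B, h, d = 1, 4, 16
q = torch.randn(B, h, 8 * 8, d)
k = torch.randn(B, h, 12 * 12, d)
v = torch.randn(B, h, 12 * 12, d)
out = sparse_masked_attention(q, k, v, make_sparse_mask())
print(out.shape)  # torch.Size([1, 4, 64, 16])
```

Because the mask never changes, an implementation could instead gather only the kept keys once, which is where a practical latency gain (rather than a FLOPs number) would come from. HRM, for its part, would fold training-time branches into a single convolution at inference, following the usual structural reparameterization pattern.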
Related papers
- Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration.
We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
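The summary gives no architectural detail, so as background only, here is a minimal discrete state space recurrence of the kind visual state space models build on. The diagonal parameterization and every name below are illustrative assumptions, not EVSSM's design.

```python
import torch

def diagonal_ssm_scan(x, a, b, c):
    """Minimal discrete state space recurrence over a token sequence:
        h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t
    x: (B, L, D); a, b, c: (D,) per-channel diagonal parameters.
    Real visual SSMs (selective scans, 2-D scanning orders) are far more
    elaborate; this only shows the core linear recurrence."""
    B, L, D = x.shape
    h = x.new_zeros(B, D)
    ys = []
    for t in range(L):
        h = a * h + b * x[:, t]
        ys.append(c * h)
    return torch.stack(ys, dim=1)  # (B, L, D)

x = torch.randn(2, 16, 8)
a = torch.rand(8) * 0.9          # |a| < 1 keeps the recurrence stable
y = diagonal_ssm_scan(x, a, torch.ones(8), torch.ones(8))
print(y.shape)  # torch.Size([2, 16, 8])
```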
arXiv Detail & Related papers (2024-05-23T09:13:36Z) - Look-Around Before You Leap: High-Frequency Injected Transformer for Image Restoration [46.96362010335177]
In this paper, we propose HIT, a simple yet effective High-frequency Injected Transformer for image restoration.
Specifically, we design a window-wise injection module (WIM), which incorporates abundant high-frequency details into the feature map, to provide reliable references for restoring high-quality images.
In addition, we introduce a spatial enhancement unit (SEU) to preserve essential spatial relationships that may be lost due to the computations carried out across channel dimensions in the BIM.
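As a loose illustration of the high-frequency-injection idea (not the paper's WIM, which is learned), one can extract high frequencies as the input minus a blur and fuse them back into the feature map. The fixed box filter, the 1x1 fusion, and the class name below are assumptions.

```python
import torch
import torch.nn.functional as F

def high_frequency(x: torch.Tensor) -> torch.Tensor:
    """Crude high-pass: input minus a 3x3 box blur, applied per channel."""
    C = x.size(1)
    kernel = torch.full((C, 1, 3, 3), 1.0 / 9.0, device=x.device)
    blur = F.conv2d(x, kernel, padding=1, groups=C)  # depthwise box blur
    return x - blur

class ToyInjection(torch.nn.Module):
    """Inject high-frequency detail into a feature map via a 1x1 conv fusion
    (an assumed stand-in for a window-wise injection module)."""
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = torch.nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        hf = high_frequency(feat)
        return self.fuse(torch.cat([feat, hf], dim=1)) + feat

feat = torch.randn(1, 32, 24, 24)
print(ToyInjection(32)(feat).shape)  # torch.Size([1, 32, 24, 24])
```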
arXiv Detail & Related papers (2024-03-30T08:05:00Z) - A Cost-Efficient FPGA Implementation of Tiny Transformer Model using Neural ODE [0.8403582577557918]
Transformers have been applied to image recognition tasks and shown to outperform CNNs and RNNs, but they suffer from high training cost and computational complexity.
We propose a lightweight hybrid model which uses Neural ODE as a backbone instead of ResNet.
The proposed model is deployed on a modest-sized FPGA device for edge computing.
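The appeal of a Neural ODE backbone on a small FPGA is weight reuse: one residual block's parameters are applied repeatedly as integration steps instead of stacking distinct ResNet blocks. A minimal fixed-step Euler sketch, with all sizes assumed:

```python
import torch

class ODEBlock(torch.nn.Module):
    """Euler-integrated Neural ODE: x <- x + (1/steps) * f(x), reusing one
    set of weights for every step. A ResNet stack of depth `steps` would
    need `steps` copies of these weights; the ODE form needs one, which is
    the parameter saving that makes small-device deployment plausible."""
    def __init__(self, channels: int, steps: int = 4):
        super().__init__()
        self.f = torch.nn.Sequential(
            torch.nn.Conv2d(channels, channels, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.steps = steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        dt = 1.0 / self.steps
        for _ in range(self.steps):
            x = x + dt * self.f(x)   # explicit Euler step
        return x

x = torch.randn(1, 16, 32, 32)
print(ODEBlock(16)(x).shape)  # torch.Size([1, 16, 32, 32])
```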
arXiv Detail & Related papers (2024-01-05T09:32:39Z) - HAT: Hybrid Attention Transformer for Image Restoration [61.74223315807691]
Transformer-based methods have shown impressive performance in image restoration tasks, such as image super-resolution and denoising.
We propose a new Hybrid Attention Transformer (HAT) to activate more input pixels for better restoration.
Our HAT achieves state-of-the-art performance both quantitatively and qualitatively.
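HAT's "activate more pixels" idea combines window self-attention with a convolutional channel-attention branch. Below is a hedged sketch of only the channel-attention side; the reduction ratio, fusion weight, and names are assumptions, not HAT's modules.

```python
import torch

class ChannelAttention(torch.nn.Module):
    """Squeeze-and-excitation style channel attention: global statistics
    gate every pixel, complementing the local windows of self-attention."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.AdaptiveAvgPool2d(1),
            torch.nn.Conv2d(channels, channels // reduction, 1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(channels // reduction, channels, 1),
            torch.nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.net(x)

# Toy hybrid step: a window-attention output (assumed computed elsewhere)
# plus a weighted channel-attention branch over the same features.
feat = torch.randn(1, 32, 24, 24)
attn_out = feat                      # placeholder for window self-attention
hybrid = attn_out + 0.1 * ChannelAttention(32)(feat)
print(hybrid.shape)  # torch.Size([1, 32, 24, 24])
```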
arXiv Detail & Related papers (2023-09-11T05:17:55Z) - ITSRN++: Stronger and Better Implicit Transformer Network for Continuous Screen Content Image Super-Resolution [32.441761727608856]
The proposed method achieves state-of-the-art performance for SCI SR (outperforming SwinIR by 0.74 dB for x3 SR) and also works well for natural image SR.
We construct a large scale SCI2K dataset to facilitate the research on SCI SR.
arXiv Detail & Related papers (2022-10-17T07:47:34Z) - Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging [142.11622043078867]
We propose a principled Degradation-Aware Unfolding Framework (DAUF) that estimates parameters from the compressed image and physical mask, and then uses these parameters to control each iteration.
By plugging the Half-Shuffle Transformer (HST) into DAUF, we establish the first Transformer-based deep unfolding method, the Degradation-Aware Unfolding Half-Shuffle Transformer (DAUHST), for HSI reconstruction.
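As a generic picture of what a deep unfolding method computes (not DAUHST's exact iteration), each stage alternates a data-consistency gradient step with a learned prior. DAUF's estimated parameters would play the role of the step sizes below, and HST the role of the denoiser; everything named here is an assumption for illustration.

```python
import torch

def unfold_reconstruct(y, A, At, denoisers, step_sizes):
    """Generic deep unfolding for y = A(x) + noise:
        x <- denoiser_k( x - alpha_k * At(A(x) - y) )
    A/At: forward operator and its adjoint (e.g., a masked projection in
    compressive imaging); denoisers: one learned module per iteration;
    step_sizes: per-iteration step sizes (fixed scalars here, whereas DAUF
    estimates such parameters from the mask and measurement)."""
    x = At(y)                          # crude initialization
    for denoiser, alpha in zip(denoisers, step_sizes):
        x = x - alpha * At(A(x) - y)   # data-consistency gradient step
        x = denoiser(x)                # learned prior (HST in the paper)
    return x

# Toy usage with an identity operator and identity "denoisers".
y = torch.randn(1, 8, 16, 16)
eye = lambda t: t
x_hat = unfold_reconstruct(y, eye, eye, [torch.nn.Identity()] * 3, [0.5] * 3)
print(x_hat.shape)  # torch.Size([1, 8, 16, 16])
```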
arXiv Detail & Related papers (2022-05-20T11:37:44Z) - Hybrid Pixel-Unshuffled Network for Lightweight Image Super-Resolution [64.54162195322246]
Convolutional neural networks (CNNs) have achieved great success on image super-resolution (SR).
Most deep CNN-based SR models, however, require massive computation to obtain high performance.
We propose a novel Hybrid Pixel-Unshuffled Network (HPUN) by introducing an efficient and effective downsampling module into the SR task.
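Pixel-unshuffle downsampling rearranges each s x s spatial patch into channels, so no information is discarded before a cheap 1x1 mix. A minimal sketch of such a module follows; HPUN's actual design (e.g., its grouped/depthwise parts) may differ, and the class name is assumed.

```python
import torch
import torch.nn.functional as F

class PixelUnshuffleDown(torch.nn.Module):
    """Downsample by moving 2x2 spatial patches into channels (lossless,
    unlike strided conv or pooling), then mix with a 1x1 convolution."""
    def __init__(self, channels: int, scale: int = 2):
        super().__init__()
        self.scale = scale
        self.proj = torch.nn.Conv2d(channels * scale ** 2, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.pixel_unshuffle(x, self.scale)  # (B, C*s*s, H/s, W/s)
        return self.proj(x)

x = torch.randn(1, 16, 32, 32)
print(PixelUnshuffleDown(16)(x).shape)  # torch.Size([1, 16, 16, 16])
```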
arXiv Detail & Related papers (2022-03-16T20:10:41Z) - Efficient Transformer for Single Image Super-Resolution [13.234199307504602]
We propose a novel Efficient Super-Resolution Transformer (ESRT) for fast and accurate image super-resolution.
ESRT is a hybrid Transformer where a CNN-based SR network is first designed in the front to extract deep features.
The proposed Efficient Transformer (ET) occupies only 4191M of GPU memory while delivering better performance.
arXiv Detail & Related papers (2021-08-25T07:05:30Z) - High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network [23.981019687483506]
We focus on speeding up high-resolution photorealistic image-to-image translation (I2IT) based on closed-form Laplacian pyramid decomposition and reconstruction.
We propose a Laplacian Pyramid Translation Network (LPTN) to simultaneously perform these two tasks.
Our model avoids most of the heavy computation consumed by processing high-resolution feature maps and faithfully preserves the image details.
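The closed-form Laplacian pyramid the paper builds on is easy to state in code: each level stores the detail lost by down/up-sampling, and reconstruction inverts the decomposition exactly. A minimal sketch assuming bilinear resampling (the paper's kernels may differ):

```python
import torch
import torch.nn.functional as F

def build_laplacian_pyramid(img: torch.Tensor, levels: int = 3):
    """Each level keeps the high-frequency band cur - up(down(cur));
    the final entry is the low-resolution residual."""
    pyramid, cur = [], img
    for _ in range(levels):
        down = F.interpolate(cur, scale_factor=0.5, mode="bilinear",
                             align_corners=False)
        up = F.interpolate(down, size=cur.shape[-2:], mode="bilinear",
                           align_corners=False)
        pyramid.append(cur - up)   # high-frequency band
        cur = down
    pyramid.append(cur)            # low-frequency residual
    return pyramid

def reconstruct(pyramid):
    """Exact inverse of the decomposition above."""
    cur = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        cur = band + F.interpolate(cur, size=band.shape[-2:],
                                   mode="bilinear", align_corners=False)
    return cur

img = torch.randn(1, 3, 64, 64)
pyr = build_laplacian_pyramid(img)
print(torch.allclose(reconstruct(pyr), img, atol=1e-5))  # True
```

The efficiency argument follows from the decomposition: the heavy learned translation can run on the tiny low-frequency residual while the full-resolution bands are adjusted cheaply, which is why most high-resolution computation is avoided.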
arXiv Detail & Related papers (2021-05-19T15:05:22Z) - Scalable Visual Transformers with Hierarchical Pooling [61.05787583247392]
We propose a Hierarchical Visual Transformer (HVT) which progressively pools visual tokens to shrink the sequence length.
This brings a great benefit: depth, width, resolution, and patch size can be scaled without introducing extra computational complexity.
Our HVT outperforms the competitive baselines on ImageNet and CIFAR-100 datasets.
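As a minimal illustration of hierarchical token pooling (stage placement and pooling type are HVT design details not reproduced here), a token sequence can be shortened between transformer stages with 1-D pooling, cutting attention's quadratic cost at each stage:

```python
import torch

def pool_tokens(tokens: torch.Tensor, stride: int = 2) -> torch.Tensor:
    """Shrink a (B, N, D) token sequence with 1-D max pooling over the
    sequence axis, halving the length (and roughly quartering attention's
    quadratic cost) per application."""
    x = tokens.transpose(1, 2)                       # (B, D, N)
    x = torch.nn.functional.max_pool1d(x, kernel_size=stride, stride=stride)
    return x.transpose(1, 2)                         # (B, N//stride, D)

tokens = torch.randn(2, 196, 64)   # e.g., 14x14 patches, dim 64
print(pool_tokens(tokens).shape)   # torch.Size([2, 98, 64])
```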
arXiv Detail & Related papers (2021-03-19T03:55:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.