Reduce Information Loss in Transformers for Pluralistic Image Inpainting
- URL: http://arxiv.org/abs/2205.05076v1
- Date: Tue, 10 May 2022 17:59:58 GMT
- Title: Reduce Information Loss in Transformers for Pluralistic Image Inpainting
- Authors: Qiankun Liu and Zhentao Tan and Dongdong Chen and Qi Chu and Xiyang
Dai and Yinpeng Chen and Mengchen Liu and Lu Yuan and Nenghai Yu
- Abstract summary: We propose a new transformer-based framework, "PUT", to keep as much input information as possible.
PUT greatly outperforms state-of-the-art methods on image fidelity, especially for large masked regions and complex large-scale datasets.
- Score: 112.50657646357494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have achieved great success in pluralistic image inpainting
recently. However, we find that existing transformer-based solutions regard each
pixel as a token and thus suffer from an information loss issue in two respects: 1)
They downsample the input image to a much lower resolution for efficiency,
incurring information loss and extra misalignment at the boundaries of masked
regions. 2) They quantize $256^3$ RGB pixels to a small
number (such as 512) of quantized pixels. The indices of quantized pixels are
used as tokens for the inputs and prediction targets of transformer. Although
an extra CNN network is used to upsample and refine the low-resolution results,
it is difficult to recover the lost information. To keep as much input information
as possible, we propose a new transformer-based framework, "PUT".
Specifically, to avoid input downsampling while maintaining the computation
efficiency, we design a patch-based auto-encoder P-VQVAE, where the encoder
converts the masked image into non-overlapped patch tokens and the decoder
recovers the masked regions from inpainted tokens while keeping the unmasked
regions unchanged. To eliminate the information loss caused by quantization, an
Un-Quantized Transformer (UQ-Transformer) is applied, which directly takes the
features from P-VQVAE encoder as input without quantization and regards the
quantized tokens only as prediction targets. Extensive experiments show that
PUT greatly outperforms state-of-the-art methods on image fidelity, especially
for large masked regions and complex large-scale datasets.
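To make the pipeline in the abstract concrete, below is a minimal sketch of the three components it describes: a patch-based encoder that tokenizes the masked image without downsampling, an un-quantized transformer that consumes the raw patch features and predicts codebook indices only as targets, and a decoder that fills the masked patches while copying the unmasked pixels through. This is not the authors' implementation; every module name, the 16-pixel patch size, the 512-entry codebook, and the single-pass argmax prediction are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the PUT pipeline described in the abstract.
# All names, shapes, and hyper-parameters are illustrative assumptions.
import torch
import torch.nn as nn

PATCH, DIM, CODES = 16, 256, 512  # assumed patch size, feature dim, codebook size

class PatchEncoder(nn.Module):
    """Converts the masked image into non-overlapping patch features (no input downsampling)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Conv2d(3, DIM, kernel_size=PATCH, stride=PATCH)  # one feature per patch
    def forward(self, masked_img):
        return self.proj(masked_img)  # (B, DIM, H/PATCH, W/PATCH)

class UQTransformer(nn.Module):
    """Takes un-quantized patch features as input; quantized token indices appear only as targets."""
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=4)
        self.to_logits = nn.Linear(DIM, CODES)
    def forward(self, feats):
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)    # (B, N, DIM), N = h*w patches
        return self.to_logits(self.blocks(tokens))   # (B, N, CODES) logits over codebook entries

class PatchDecoder(nn.Module):
    """Rebuilds masked patches from codebook entries; unmasked pixels are copied from the input."""
    def __init__(self):
        super().__init__()
        self.codebook = nn.Embedding(CODES, DIM)
        self.deproj = nn.ConvTranspose2d(DIM, 3, kernel_size=PATCH, stride=PATCH)
    def forward(self, indices, hw, original, mask):
        b = indices.shape[0]
        feats = self.codebook(indices).transpose(1, 2).reshape(b, DIM, *hw)
        recon = self.deproj(feats)
        return mask * recon + (1 - mask) * original  # keep unmasked regions unchanged

# Usage sketch: inpaint one 256x256 image with a random mask.
# (The actual method samples the logits for pluralistic results; argmax is a simplification.)
img = torch.rand(1, 3, 256, 256)
mask = (torch.rand(1, 1, 256, 256) > 0.5).float()   # 1 = missing pixel
enc, trans, dec = PatchEncoder(), UQTransformer(), PatchDecoder()
feats = enc(img * (1 - mask))
logits = trans(feats)
pred_idx = logits.argmax(-1)                          # (B, N) predicted quantized tokens
out = dec(pred_idx, (16, 16), img, mask)
print(out.shape)  # torch.Size([1, 3, 256, 256])
```

The point the sketch tries to illustrate is that the transformer never sees quantized inputs: quantization enters only through the decoder's codebook lookup and the prediction targets.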
Related papers
- Transformer based Pluralistic Image Completion with Reduced Information Loss [72.92754600354199]
Transformer based methods have achieved great success in image inpainting recently.
They regard each pixel as a token, thus suffering from an information loss issue.
We propose a new transformer based framework called "PUT".
arXiv Detail & Related papers (2024-03-31T01:20:16Z)
- Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers [27.029600581635957]
We describe a method for identifying and re-processing only those tokens that have changed significantly over time.
We evaluate our method on large-scale datasets for video object detection (ImageNet VID) and action recognition (EPIC-Kitchens 100).
arXiv Detail & Related papers (2023-08-25T17:10:12Z)
- Cascaded Cross-Attention Networks for Data-Efficient Whole-Slide Image Classification Using Transformers [0.11219061154635457]
Whole-Slide Imaging allows for the capture and digitization of high-resolution images of histological specimens.
The transformer architecture has been proposed as a possible candidate for effectively leveraging the high-resolution information.
We propose a novel cascaded cross-attention network (CCAN) based on the cross-attention mechanism that scales linearly with the number of extracted patches.
arXiv Detail & Related papers (2023-05-11T16:42:24Z)
- Accurate Image Restoration with Attention Retractable Transformer [50.05204240159985]
We propose Attention Retractable Transformer (ART) for image restoration.
ART presents both dense and sparse attention modules in the network.
We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks.
arXiv Detail & Related papers (2022-10-04T07:35:01Z)
- MAT: Mask-Aware Transformer for Large Hole Image Inpainting [79.67039090195527]
We present a novel model for large hole inpainting, which unifies the merits of transformers and convolutions.
Experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets.
arXiv Detail & Related papers (2022-03-29T06:36:17Z)
- Inpainting Transformer for Anomaly Detection [0.0]
Inpainting Transformer (InTra) is trained to inpaint covered patches in a large sequence of image patches.
InTra achieves better than state-of-the-art results on the MVTec AD dataset for detection and localization.
arXiv Detail & Related papers (2021-04-28T17:27:44Z)
- Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images.
We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv Detail & Related papers (2020-06-22T04:04:56Z)
- Visual Transformers: Token-based Image Representation and Processing for Computer Vision [67.55770209540306]
Visual Transformer (VT) operates in a semantic token space, judiciously attending to different image parts based on context.
Using an advanced training recipe, our VTs significantly outperform their convolutional counterparts.
For semantic segmentation on LIP and COCO-stuff, VT-based feature pyramid networks (FPN) achieve 0.35 points higher mIoU while reducing the FPN module's FLOPs by 6.5x.
arXiv Detail & Related papers (2020-06-05T20:49:49Z)