Reduce Information Loss in Transformers for Pluralistic Image Inpainting
- URL: http://arxiv.org/abs/2205.05076v1
- Date: Tue, 10 May 2022 17:59:58 GMT
- Title: Reduce Information Loss in Transformers for Pluralistic Image Inpainting
- Authors: Qiankun Liu and Zhentao Tan and Dongdong Chen and Qi Chu and Xiyang
Dai and Yinpeng Chen and Mengchen Liu and Lu Yuan and Nenghai Yu
- Abstract summary: We propose a new transformer-based framework, "PUT", to keep as much input information as possible.
PUT greatly outperforms state-of-the-art methods on image fidelity, especially for large masked regions and complex large-scale datasets.
- Score: 112.50657646357494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have achieved great success in pluralistic image inpainting
recently. However, we find that existing transformer-based solutions regard each
pixel as a token and thus suffer from an information loss issue in two respects: 1)
They downsample the input image to a much lower resolution for efficiency,
incurring information loss and extra misalignment at the boundaries of masked
regions. 2) They quantize $256^3$ RGB pixels to a small
number (such as 512) of quantized pixels. The indices of quantized pixels are
used as tokens for the inputs and prediction targets of transformer. Although
an extra CNN network is used to upsample and refine the low-resolution results,
it is difficult to recover the lost information. To keep as much input information
as possible, we propose a new transformer-based framework, "PUT".
Specifically, to avoid input downsampling while maintaining the computation
efficiency, we design a patch-based auto-encoder P-VQVAE, where the encoder
converts the masked image into non-overlapped patch tokens and the decoder
recovers the masked regions from inpainted tokens while keeping the unmasked
regions unchanged. To eliminate the information loss caused by quantization, an
Un-Quantized Transformer (UQ-Transformer) is applied, which directly takes the
features from P-VQVAE encoder as input without quantization and regards the
quantized tokens only as prediction targets. Extensive experiments show that
PUT greatly outperforms state-of-the-art methods on image fidelity, especially
for large masked regions and complex large-scale datasets.
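To make the pipeline in the abstract concrete, below is a minimal sketch of the three components it describes: a patch-based encoder that tokenizes the masked image without downsampling, an un-quantized transformer that consumes the raw patch features and predicts codebook indices only as targets, and a decoder that fills the masked patches while copying the unmasked pixels through. This is not the authors' implementation; every module name, the 16-pixel patch size, the 512-entry codebook, and the single-pass argmax prediction are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the PUT pipeline described in the abstract.
# All names, shapes, and hyper-parameters are illustrative assumptions.
import torch
import torch.nn as nn

PATCH, DIM, CODES = 16, 256, 512  # assumed patch size, feature dim, codebook size

class PatchEncoder(nn.Module):
    """Converts the masked image into non-overlapping patch features (no input downsampling)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Conv2d(3, DIM, kernel_size=PATCH, stride=PATCH)  # one feature per patch
    def forward(self, masked_img):
        return self.proj(masked_img)  # (B, DIM, H/PATCH, W/PATCH)

class UQTransformer(nn.Module):
    """Takes un-quantized patch features as input; quantized token indices appear only as targets."""
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=4)
        self.to_logits = nn.Linear(DIM, CODES)
    def forward(self, feats):
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)    # (B, N, DIM), N = h*w patches
        return self.to_logits(self.blocks(tokens))   # (B, N, CODES) logits over codebook entries

class PatchDecoder(nn.Module):
    """Rebuilds masked patches from codebook entries; unmasked pixels are copied from the input."""
    def __init__(self):
        super().__init__()
        self.codebook = nn.Embedding(CODES, DIM)
        self.deproj = nn.ConvTranspose2d(DIM, 3, kernel_size=PATCH, stride=PATCH)
    def forward(self, indices, hw, original, mask):
        b = indices.shape[0]
        feats = self.codebook(indices).transpose(1, 2).reshape(b, DIM, *hw)
        recon = self.deproj(feats)
        return mask * recon + (1 - mask) * original  # keep unmasked regions unchanged

# Usage sketch: inpaint one 256x256 image with a random mask.
# (The actual method samples the logits for pluralistic results; argmax is a simplification.)
img = torch.rand(1, 3, 256, 256)
mask = (torch.rand(1, 1, 256, 256) > 0.5).float()   # 1 = missing pixel
enc, trans, dec = PatchEncoder(), UQTransformer(), PatchDecoder()
feats = enc(img * (1 - mask))
logits = trans(feats)
pred_idx = logits.argmax(-1)                          # (B, N) predicted quantized tokens
out = dec(pred_idx, (16, 16), img, mask)
print(out.shape)  # torch.Size([1, 3, 256, 256])
```

The point the sketch tries to illustrate is that the transformer never sees quantized inputs: quantization enters only through the decoder's codebook lookup and the prediction targets.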
Related papers
- Transformer based Pluralistic Image Completion with Reduced Information Loss [72.92754600354199]
Transformer based methods have achieved great success in image inpainting recently.
They regard each pixel as a token, thus suffering from an information loss issue.
We propose a new transformer based framework called "PUT".
arXiv Detail & Related papers (2024-03-31T01:20:16Z)
- Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers [27.029600581635957]
We describe a method for identifying and re-processing only those tokens that have changed significantly over time.
We evaluate our method on large-scale datasets for video object detection (ImageNet VID) and action recognition (EPIC-Kitchens 100).
arXiv Detail & Related papers (2023-08-25T17:10:12Z)
- Cascaded Cross-Attention Networks for Data-Efficient Whole-Slide Image Classification Using Transformers [0.11219061154635457]
Whole-Slide Imaging allows for the capture and digitization of high-resolution images of histological specimens.
The transformer architecture has been proposed as a possible candidate for effectively leveraging the high-resolution information.
We propose a novel cascaded cross-attention network (CCAN) based on the cross-attention mechanism that scales linearly with the number of extracted patches.
arXiv Detail & Related papers (2023-05-11T16:42:24Z)
- Accurate Image Restoration with Attention Retractable Transformer [50.05204240159985]
We propose Attention Retractable Transformer (ART) for image restoration.
ART presents both dense and sparse attention modules in the network.
We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks.
arXiv Detail & Related papers (2022-10-04T07:35:01Z)
- MAT: Mask-Aware Transformer for Large Hole Image Inpainting [79.67039090195527]
We present a novel model for large hole inpainting, which unifies the merits of transformers and convolutions.
Experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets.
arXiv Detail & Related papers (2022-03-29T06:36:17Z)
- Inpainting Transformer for Anomaly Detection [0.0]
Inpainting Transformer (InTra) is trained to inpaint covered patches in a large sequence of image patches.
InTra achieves better than state-of-the-art results on the MVTec AD dataset for detection and localization.
arXiv Detail & Related papers (2021-04-28T17:27:44Z)
- Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images.
We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv Detail & Related papers (2020-06-22T04:04:56Z)
- Visual Transformers: Token-based Image Representation and Processing for Computer Vision [67.55770209540306]
Visual Transformer (VT) operates in a semantic token space, judiciously attending to different image parts based on context.
Using an advanced training recipe, our VTs significantly outperform their convolutional counterparts.
For semantic segmentation on LIP and COCO-stuff, VT-based feature pyramid networks (FPN) achieve 0.35 points higher mIoU while reducing the FPN module's FLOPs by 6.5x.
arXiv Detail & Related papers (2020-06-05T20:49:49Z)