Learning to Scale Temperature in Masked Self-Attention for Image
Inpainting
- URL: http://arxiv.org/abs/2302.06130v1
- Date: Mon, 13 Feb 2023 06:37:17 GMT
- Title: Learning to Scale Temperature in Masked Self-Attention for Image
Inpainting
- Authors: Xiang Zhou, Yuan Zeng, Yi Gong
- Abstract summary: We present an image inpainting framework with a multi-head temperature masked self-attention mechanism.
In addition to improving image quality of inpainting results, we generalize the proposed model to user-guided image editing by introducing a new sketch generation method.
- Score: 11.52934596799707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in deep generative adversarial networks (GAN) and
self-attention mechanism have led to significant improvements in the
challenging task of inpainting large missing regions in an image. These methods
integrate self-attention mechanism in neural networks to utilize surrounding
neural elements based on their correlation and help the networks capture
long-range dependencies. Temperature is a parameter of the Softmax function
used in self-attention; it biases the distribution of attention scores
towards a handful of the most similar patches. Most existing
self-attention mechanisms in image inpainting are convolution-based and set the
temperature as a constant, performing patch matching in a limited feature
space. In this work, we analyze the artifacts and training problems in previous
self-attention mechanisms, and redesign the temperature learning network as
well as the self-attention mechanism to address them. We present an image
inpainting framework with a multi-head temperature masked self-attention
mechanism, which provides stable and efficient temperature learning and uses
multiple distant contextual information for high quality image inpainting. In
addition to improving image quality of inpainting results, we generalize the
proposed model to user-guided image editing by introducing a new sketch
generation method. Extensive experiments on various datasets such as Paris
StreetView, CelebA-HQ and Places2 clearly demonstrate that our method not only
generates more natural inpainting results than previous works, in terms of both
perceptual image quality and quantitative metrics, but also enables users to
generate more flexible results guided by their sketches.
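To make the temperature's role concrete, the following is a minimal sketch of a masked, temperature-scaled Softmax over one row of attention scores. It is illustrative only: the scores and mask are hypothetical values, and the paper learns the temperature per attention head with a dedicated network, whereas here it is a fixed scalar.

```python
import math

def masked_softmax_attention(scores, mask, temperature=1.0):
    """Temperature-scaled masked Softmax over one row of attention scores.

    Illustrative sketch: mask[i] == 1 marks a valid (known) patch;
    masked (missing) patches receive exactly zero attention weight.
    """
    # Dividing by the temperature: T < 1 sharpens the distribution toward
    # the most similar patches, T > 1 flattens it toward uniform.
    scaled = [s / temperature if m else float("-inf")
              for s, m in zip(scores, mask)]
    peak = max(s for s in scaled if s != float("-inf"))  # numerical stability
    exps = [math.exp(s - peak) if s != float("-inf") else 0.0 for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Example: four candidate patches, the last one missing (masked out).
# Lowering the temperature concentrates weight on the best-matching patch.
weights = masked_softmax_attention([2.0, 1.0, 0.5, 3.0], [1, 1, 1, 0],
                                   temperature=0.5)
```

With a smaller temperature the highest-scoring valid patch dominates the weights, which is the biasing effect toward a handful of similar patches that the abstract describes.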
Related papers
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z) - BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed
Dual-Branch Diffusion [61.90969199199739]
BrushNet is a novel plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained DM.
Experiments demonstrate BrushNet's superior performance over existing models across seven key metrics, including image quality, mask region preservation, and textual coherence.
arXiv Detail & Related papers (2024-03-11T17:59:31Z) - CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z) - A domain adaptive deep learning solution for scanpath prediction of
paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, a process that underlies several human cognitive functions.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z) - Learning Prior Feature and Attention Enhanced Image Inpainting [63.21231753407192]
This paper incorporates the pre-training based Masked AutoEncoder (MAE) into the inpainting model.
We propose to use attention priors from MAE to make the inpainting model learn more long-distance dependencies between masked and unmasked regions.
arXiv Detail & Related papers (2022-08-03T04:32:53Z) - Interactive Image Inpainting Using Semantic Guidance [36.34615403590834]
This paper develops a novel image inpainting approach that enables users to customize the inpainting result by their own preference or memory.
In the first stage, an autoencoder based on a novel external spatial attention mechanism is deployed to produce reconstructed features of the corrupted image.
In the second stage, a semantic decoder that takes the reconstructed features as prior is adopted to synthesize a fine inpainting result guided by user's customized semantic mask.
arXiv Detail & Related papers (2022-01-26T05:09:42Z) - Restore from Restored: Single-image Inpainting [9.699531255678856]
We present a novel and efficient self-supervised fine-tuning algorithm for inpainting networks.
We update the parameters of the pre-trained inpainting networks by utilizing existing self-similar patches.
We achieve state-of-the-art inpainting results on publicly available benchmark datasets.
arXiv Detail & Related papers (2021-10-25T11:38:51Z) - Region-of-interest guided Supervoxel Inpainting for Self-supervision [8.744460886823322]
Self-supervised learning has proven to be invaluable in making best use of all of the available data in biomedical image segmentation.
One particularly simple and effective mechanism to achieve self-supervision is inpainting, the task of predicting arbitrary missing areas based on the rest of an image.
We propose two novel structural changes to further enhance the performance of a deep neural network.
We empirically show that our proposed approach consistently outperforms both supervised CNNs, without any self-supervision, and conventional inpainting-based self-supervision methods on both large and small training set sizes.
arXiv Detail & Related papers (2020-06-26T19:28:20Z) - Text-to-Image Generation with Attention Based Recurrent Neural Networks [1.2599533416395765]
We develop a tractable and stable caption-based image generation model.
Experiments are performed on Microsoft datasets.
Results show that the proposed model performs better than contemporary approaches.
arXiv Detail & Related papers (2020-01-18T12:19:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.