Generative Memory-Guided Semantic Reasoning Model for Image Inpainting
- URL: http://arxiv.org/abs/2110.00261v1
- Date: Fri, 1 Oct 2021 08:37:34 GMT
- Title: Generative Memory-Guided Semantic Reasoning Model for Image Inpainting
- Authors: Xin Feng, Wenjie Pei, Fengjun Li, Fanglin Chen, David Zhang, and
Guangming Lu
- Abstract summary: We propose the Generative Memory-Guided Semantic Reasoning Model (GM-SRM) for image inpainting.
The proposed GM-SRM not only learns the intra-image priors from the known regions, but also distills the inter-image reasoning priors to infer the content of the corrupted regions.
Extensive experiments on Paris Street View, CelebA-HQ, and Places2 benchmarks demonstrate that our GM-SRM outperforms the state-of-the-art methods for image inpainting.
- Score: 34.092255842494396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing methods for image inpainting focus on learning the intra-image
priors from the known regions of the current input image to infer the content
of the corrupted regions in the same image. While such methods perform well on
images with small corrupted regions, it is challenging for these methods to
deal with images with large corrupted areas due to two potential limitations: 1)
such methods tend to overfit each single training pair of images relying solely
on the intra-image prior knowledge learned from the limited known area; 2) the
inter-image prior knowledge about the general distribution patterns of visual
semantics, which can be transferred across images sharing similar semantics, is
not exploited. In this paper, we propose the Generative Memory-Guided Semantic
Reasoning Model (GM-SRM), which not only learns the intra-image priors from the
known regions, but also distills the inter-image reasoning priors to infer the
content of the corrupted regions. In particular, the proposed GM-SRM first
pre-learns a generative memory from the whole training data to capture the
semantic distribution patterns in a global view. Then the learned memory is
leveraged to retrieve the matching inter-image priors for the current corrupted
image to perform semantic reasoning during image inpainting. While the
intra-image priors are used for guaranteeing the pixel-level content
consistency, the inter-image priors are favorable for performing high-level
semantic reasoning, which is particularly effective for inferring semantic
content for large corrupted areas. Extensive experiments on Paris Street View,
CelebA-HQ, and Places2 benchmarks demonstrate that our GM-SRM outperforms the
state-of-the-art methods for image inpainting in terms of both the visual
quality and quantitative metrics.
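The abstract describes retrieving matching inter-image priors from a pre-learned generative memory for the current corrupted image. The paper does not spell out the retrieval mechanism here, so the following is only a minimal sketch under the assumption of a common attention-style soft lookup over memory slots; the function name, array shapes, and softmax matching are illustrative stand-ins, not GM-SRM's actual implementation.

```python
import numpy as np

def retrieve_prior(query, memory, temperature=1.0):
    """Hypothetical attention-style lookup over a pre-learned memory bank.

    `memory` is an (n_slots, dim) array of learned prototype features
    (a stand-in for the paper's generative memory); `query` is a (dim,)
    feature vector of the corrupted input image. Returns a (dim,)
    inter-image prior as a similarity-weighted mix of memory slots.
    """
    scores = memory @ query / temperature   # similarity of query to each slot
    scores -= scores.max()                  # stabilize the softmax numerically
    weights = np.exp(scores)
    weights /= weights.sum()                # attention weights over slots
    return weights @ memory                 # soft retrieval: weighted combination
```

As the temperature shrinks, the soft lookup approaches a hard nearest-slot retrieval, which is one plausible way to read "retrieve the matching inter-image priors."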
Related papers
- Realistic Extreme Image Rescaling via Generative Latent Space Learning [51.85790402171696]
We propose a novel framework called Latent Space Based Image Rescaling (LSBIR) for extreme image rescaling tasks.
LSBIR effectively leverages powerful natural image priors learned by a pre-trained text-to-image diffusion model to generate realistic HR images.
In the first stage, a pseudo-invertible encoder-decoder models the bidirectional mapping between the latent features of the HR image and the target-sized LR image.
In the second stage, the reconstructed features from the first stage are refined by a pre-trained diffusion model to generate more faithful and visually pleasing details.
arXiv Detail & Related papers (2024-08-17T09:51:42Z)
- Enhancing Image Layout Control with Loss-Guided Diffusion Models [0.0]
Diffusion models produce high-quality images from pure noise using a simple text prompt.
A subset of these methods takes advantage of the models' attention mechanism and is training-free.
We provide an interpretation of these methods that highlights their complementary features, and demonstrate that superior performance is possible when both are used in concert.
arXiv Detail & Related papers (2024-05-23T02:08:44Z)
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
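The summary above only says that attention is used to correlate multiple images within a batch. As a rough, hedged illustration (not CricaVPR's actual architecture), plain softmax self-attention over a batch of per-image descriptors captures the idea:

```python
import numpy as np

def cross_image_correlate(feats):
    """Illustrative sketch: each image descriptor in a batch attends to
    every descriptor in the same batch via scaled-dot-product softmax
    self-attention. `feats` is (batch, dim); returns (batch, dim)
    batch-correlated descriptors."""
    d = feats.shape[1]
    scores = feats @ feats.T / np.sqrt(d)         # pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # rows sum to 1
    return attn @ feats                           # mix information across images
```

Each output descriptor is a convex combination of the batch's descriptors, which is how cross-image context can sharpen a single image's representation.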
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- Improving Generalization of Image Captioning with Unsupervised Prompt Learning [63.26197177542422]
Generalization of Image Captioning (GeneIC) learns a domain-specific prompt vector for the target domain without requiring annotated data.
GeneIC aligns visual and language modalities with a pre-trained Contrastive Language-Image Pre-Training (CLIP) model.
arXiv Detail & Related papers (2023-08-05T12:27:01Z)
- Boosting Image Outpainting with Semantic Layout Prediction [18.819765707811904]
We train a GAN to extend regions in semantic segmentation domain instead of image domain.
Another GAN model is trained to synthesize real images based on the extended semantic layouts.
Our approach can handle semantic clues more easily and hence works better in complex scenarios.
arXiv Detail & Related papers (2021-10-18T13:09:31Z)
- Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders.
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-30T10:57:13Z)
- Rethinking of the Image Salient Object Detection: Object-level Semantic Saliency Re-ranking First, Pixel-wise Saliency Refinement Latter [62.26677215668959]
We propose a lightweight, weakly supervised deep network to coarsely locate semantically salient regions.
We then fuse multiple off-the-shelf deep models on these semantically salient regions as the pixel-wise saliency refinement.
Our method is simple yet effective; it is the first attempt to cast salient object detection mainly as an object-level semantic re-ranking problem.
arXiv Detail & Related papers (2020-08-10T07:12:43Z)
- Arbitrary-sized Image Training and Residual Kernel Learning: Towards Image Fraud Identification [10.47223719403823]
We propose a framework for training images of original input scales without resizing.
Our arbitrary-sized image training method depends on the pseudo-batch gradient descent.
With the learnt residual kernels and PBGD, the proposed framework achieves state-of-the-art results in image fraud identification.
arXiv Detail & Related papers (2020-05-22T07:57:24Z)
- Enhanced Residual Networks for Context-based Image Outpainting [0.0]
Deep models struggle to understand context and to extrapolate from retained information.
Current models use generative adversarial networks to generate results which lack localized image feature consistency and appear fake.
We propose two methods to improve this issue: the use of a local and global discriminator, and the addition of residual blocks within the encoding section of the network.
arXiv Detail & Related papers (2020-05-14T05:14:26Z)
- Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation [181.08127307338654]
This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images.
The deep generative prior (DGP) provides compelling results to restore missing semantics, e.g., color, patch, resolution, of various degraded images.
arXiv Detail & Related papers (2020-03-30T17:45:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.