TransRef: Multi-Scale Reference Embedding Transformer for Reference-Guided Image Inpainting
- URL: http://arxiv.org/abs/2306.11528v3
- Date: Thu, 03 Oct 2024 14:02:10 GMT
- Title: TransRef: Multi-Scale Reference Embedding Transformer for Reference-Guided Image Inpainting
- Authors: Taorong Liu, Liang Liao, Delin Chen, Jing Xiao, Zheng Wang, Chia-Wen Lin, Shin'ichi Satoh
- Abstract summary: We propose a transformer-based encoder-decoder network, named TransRef, for reference-guided image inpainting.
For precise utilization of the reference features for guidance, a reference-patch alignment (Ref-PA) module is proposed to align the patch features of the reference and corrupted images.
We construct a publicly accessible benchmark dataset containing 50K pairs of input and reference images.
- Abstract: Image inpainting that must complete complicated semantic environments and diverse hole patterns in corrupted images is challenging even for state-of-the-art learning-based methods trained on large-scale data. A reference image capturing the same scene as a corrupted image offers informative guidance for completion, since it shares texture and structure priors similar to those of the holes. In this work, we propose a transformer-based encoder-decoder network, named TransRef, for reference-guided image inpainting. Specifically, guidance is conducted progressively through a reference embedding procedure, in which the reference features are successively aligned and fused with the features of the corrupted image. For precise utilization of the reference features, a reference-patch alignment (Ref-PA) module is proposed to align the patch features of the reference and corrupted images and harmonize their style differences, while a reference-patch transformer (Ref-PT) module is proposed to refine the embedded reference features. Moreover, to facilitate research on reference-guided image restoration, we construct a publicly accessible benchmark dataset containing 50K pairs of input and reference images. Both quantitative and qualitative evaluations demonstrate the efficacy of the reference information and the superiority of the proposed method over state-of-the-art methods in completing complex holes. Code and dataset can be accessed at https://github.com/Cameltr/TransRef.
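The abstract describes Ref-PA only at a high level. As a rough, hedged illustration of what patch-level reference alignment can look like (not the authors' Ref-PA module, which is learned and also harmonizes style), here is a minimal NumPy sketch that replaces each hole-overlapping patch of a target feature map with its most similar reference patch under cosine similarity. The function name, patch size, and similarity criterion are assumptions for illustration only.

```python
import numpy as np

def align_reference_patches(target, ref, mask, p=4):
    """Replace each p x p patch of `target` that overlaps the hole `mask`
    with the most similar p x p patch of `ref` (cosine similarity over
    flattened patch features). Shapes: target/ref (C, H, W), mask (H, W)."""
    C, H, W = target.shape
    gh, gw = H // p, W // p
    # Flatten both maps into (num_patches, C*p*p) matrices of patch vectors.
    flat = lambda f: (f.reshape(C, gh, p, gw, p)
                       .transpose(1, 3, 0, 2, 4)
                       .reshape(gh * gw, -1))
    t, r = flat(target), flat(ref)
    r_unit = r / (np.linalg.norm(r, axis=1, keepdims=True) + 1e-8)
    out = target.copy()
    for i in range(gh):
        for j in range(gw):
            if mask[i*p:(i+1)*p, j*p:(j+1)*p].any():   # patch touches the hole
                q = t[i*gw + j]
                q = q / (np.linalg.norm(q) + 1e-8)
                k = int(np.argmax(r_unit @ q))          # best-matching ref patch
                bi, bj = divmod(k, gw)
                out[:, i*p:(i+1)*p, j*p:(j+1)*p] = ref[:, bi*p:(bi+1)*p,
                                                          bj*p:(bj+1)*p]
    return out
```

In TransRef this kind of alignment is performed on learned multi-scale features and followed by the Ref-PT refinement stage; the hard nearest-patch copy above is only a didactic stand-in for that learned fusion.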
Related papers
- ENTED: Enhanced Neural Texture Extraction and Distribution for Reference-based Blind Face Restoration
We present ENTED, a new framework for blind face restoration that aims to restore high-quality and realistic portrait images.
We utilize a texture extraction and distribution framework to transfer high-quality texture features between the degraded input and reference image.
The StyleGAN-like architecture in our framework requires high-quality latent codes to generate realistic images.
arXiv Detail & Related papers (2024-01-13T04:54:59Z)
- Mask Reference Image Quality Assessment
Mask Reference IQA (MR-IQA) is a method that masks specific patches of a distorted image and supplements missing patches with the reference image patches.
Our method achieves state-of-the-art performance on the benchmark KADID-10k, LIVE and CSIQ datasets.
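The core preprocessing idea in the summary above, masking patches of the distorted image and filling them with the co-located reference patches, can be sketched as follows. The mask ratio, patch size, and function name are assumptions, not details from the paper.

```python
import numpy as np

def mask_with_reference(dist, ref, mask_ratio=0.3, p=8, seed=0):
    """Randomly select p x p patches of the distorted image `dist` and
    replace them with the co-located patches of the reference image `ref`.
    Returns the composite image and a boolean map of the filled pixels."""
    H, W = dist.shape[:2]
    rng = np.random.default_rng(seed)
    out = dist.copy()
    filled = np.zeros((H, W), dtype=bool)
    for i in range(0, H, p):
        for j in range(0, W, p):
            if rng.random() < mask_ratio:          # mask this patch
                out[i:i+p, j:j+p] = ref[i:i+p, j:j+p]
                filled[i:i+p, j:j+p] = True
    return out, filled
```

A quality model would then score the composite; the intuition is that the more the distorted content clashes with the inserted reference patches, the lower the predicted quality.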
arXiv Detail & Related papers (2023-02-27T13:52:38Z)
- Generalizable Person Re-Identification via Viewpoint Alignment and Fusion
This work proposes to use a 3D dense pose estimation model and a texture mapping module to map pedestrian images to canonical view images.
Due to the imperfection of the texture mapping module, the canonical view images may lose the discriminative detail clues from the original images.
We show that our method can lead to superior performance over the existing approaches in various evaluation settings.
arXiv Detail & Related papers (2022-12-05T16:24:09Z)
- Reference-Guided Texture and Structure Inference for Image Inpainting
We build a benchmark dataset containing 10K pairs of input and reference images for reference-guided inpainting.
We adopt an encoder-decoder structure to infer the texture and structure features of the input image.
A feature alignment module is further designed to refine these features of the input image with the guidance of a reference image.
arXiv Detail & Related papers (2022-07-29T06:26:03Z)
- DocEnTr: An End-to-End Document Image Enhancement Transformer
Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties.
We present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images.
arXiv Detail & Related papers (2022-01-25T11:45:35Z)
- TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations
We propose TransFill, a multi-homography transformed fusion method to fill the hole by referring to another source image that shares scene contents with the target image.
We learn to adjust the color and apply a pixel-level warping to each homography-warped source image to make it more consistent with the target.
Our method achieves state-of-the-art performance on pairs of images across a variety of wide baselines and color differences, and generalizes to user-provided image pairs.
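TransFill's full pipeline (multi-homography proposals, learned color adjustment, pixel-level warping, and fusion) is beyond an abstract-level sketch, but its basic building block, warping a source image into the target frame by a single homography, can be illustrated roughly as below. The function name and nearest-neighbour sampling are simplifying assumptions.

```python
import numpy as np

def warp_homography(src, H_mat, out_shape):
    """Inverse-warp a grayscale source image by a 3x3 homography `H_mat`
    using nearest-neighbour sampling; pixels mapping outside `src` stay 0."""
    Ho, Wo = out_shape
    ys, xs = np.mgrid[0:Ho, 0:Wo]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(Ho * Wo)])  # homogeneous
    sx, sy, sw = np.linalg.inv(H_mat) @ pts                     # back-project
    sx = np.rint(sx / sw).astype(int)
    sy = np.rint(sy / sw).astype(int)
    valid = (sx >= 0) & (sx < src.shape[1]) & (sy >= 0) & (sy < src.shape[0])
    out = np.zeros(Ho * Wo, dtype=src.dtype)
    out[valid] = src[sy[valid], sx[valid]]
    return out.reshape(out_shape)
```

For example, warping with the identity matrix reproduces the source, while a unit x-translation homography shifts the content one pixel to the right; TransFill estimates several such homographies and learns to fuse the warped proposals.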
arXiv Detail & Related papers (2021-03-29T22:45:07Z)
- RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval
We propose a differentiable retrieval module to synthesize images from scene description with retrieved patches as reference.
We conduct extensive quantitative and qualitative experiments to demonstrate that the proposed method can generate realistic and diverse images.
arXiv Detail & Related papers (2020-07-16T17:59:04Z)
- Coarse-to-Fine Gaze Redirection with Numerical and Pictorial Guidance
We propose a novel gaze redirection framework which exploits both a numerical and a pictorial direction guidance.
The proposed method outperforms the state-of-the-art approaches in terms of both image quality and redirection precision.
arXiv Detail & Related papers (2020-04-07T01:17:27Z)
- Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes
We propose a Semantic Guidance and Evaluation Network (SGE-Net) to update the structural priors and the inpainted image.
It utilizes semantic segmentation map as guidance in each scale of inpainting, under which location-dependent inferences are re-evaluated.
Experiments on real-world images of mixed scenes demonstrated the superiority of our proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-15T17:49:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.