Image Inpainting via Conditional Texture and Structure Dual Generation
- URL: http://arxiv.org/abs/2108.09760v2
- Date: Sat, 6 Apr 2024 15:11:41 GMT
- Title: Image Inpainting via Conditional Texture and Structure Dual Generation
- Authors: Xiefan Guo, Hongyu Yang, Di Huang
- Abstract summary: We propose a novel two-stream network for image inpainting, which models structure-constrained texture synthesis and texture-guided structure reconstruction in a coupled manner so that the two better leverage each other.
To enhance global consistency, a Bi-directional Gated Feature Fusion (Bi-GFF) module is designed to exchange and combine structure and texture information.
Experiments on the CelebA, Paris StreetView and Places2 datasets demonstrate the superiority of the proposed method.
- Score: 26.97159780261334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep generative approaches have recently made considerable progress in image inpainting by introducing structure priors. Due to the lack of proper interaction with image texture during structure reconstruction, however, current solutions are incompetent in handling the cases with large corruptions, and they generally suffer from distorted results. In this paper, we propose a novel two-stream network for image inpainting, which models the structure-constrained texture synthesis and texture-guided structure reconstruction in a coupled manner so that they better leverage each other for more plausible generation. Furthermore, to enhance the global consistency, a Bi-directional Gated Feature Fusion (Bi-GFF) module is designed to exchange and combine the structure and texture information and a Contextual Feature Aggregation (CFA) module is developed to refine the generated contents by region affinity learning and multi-scale feature aggregation. Qualitative and quantitative experiments on the CelebA, Paris StreetView and Places2 datasets demonstrate the superiority of the proposed method. Our code is available at https://github.com/Xiefan-Guo/CTSDG.
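The Bi-GFF and CFA modules are the abstract's main architectural additions. As a rough, hypothetical illustration of the Bi-GFF idea, here is a minimal PyTorch sketch of a bi-directional gated exchange between a texture feature map and a structure feature map; the layer shapes and the gating form are assumptions for illustration, not the authors' exact implementation (see the linked repository for that):

```python
import torch
import torch.nn as nn

class BiGatedFeatureFusion(nn.Module):
    """Illustrative bi-directional gated fusion of texture/structure features.

    Hypothetical sketch of the Bi-GFF idea: each stream computes a gate from
    the concatenated features and absorbs a gated residual from the other
    stream. Not the authors' exact architecture.
    """

    def __init__(self, channels: int):
        super().__init__()
        # One gate per direction, each computed from both streams.
        self.gate_t = nn.Sequential(nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())
        self.gate_s = nn.Sequential(nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())
        self.proj_t = nn.Conv2d(channels, channels, 3, padding=1)
        self.proj_s = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, f_texture, f_structure):
        both = torch.cat([f_texture, f_structure], dim=1)
        # Texture stream absorbs gated structure information, and vice versa.
        f_t = f_texture + self.gate_t(both) * self.proj_s(f_structure)
        f_s = f_structure + self.gate_s(both) * self.proj_t(f_texture)
        return f_t, f_s

if __name__ == "__main__":
    fuse = BiGatedFeatureFusion(64)
    t, s = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
    f_t, f_s = fuse(t, s)
    print(f_t.shape, f_s.shape)  # torch.Size([1, 64, 32, 32]) twice
```

The CFA module, by contrast, is described as contextual-attention-style refinement: region affinities are learned so that features from known regions can be aggregated into the generated content at multiple scales.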
Related papers
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on a Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover coarse textures and coherent visual structures (a toy sketch of distance-weighted attention follows the entry).
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
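For a sense of what distance-based weighting could mean in practice, here is a toy single-head self-attention whose logits are biased by the Euclidean distance between patch positions. This is a guess at the general flavour, not the DWT paper's actual block; the grid layout, the linear bias form, and the `alpha` parameter are all assumptions:

```python
import torch
import torch.nn as nn

class DistanceWeightedSelfAttention(nn.Module):
    """Toy self-attention whose logits are down-weighted by the Euclidean
    distance between patch positions; nearer patches get larger logits.
    An illustrative guess, not the DWT paper's exact design.
    """

    def __init__(self, dim: int, grid: int, alpha: float = 0.1):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.scale = dim ** -0.5
        # Pairwise distances between patch centres on a grid x grid layout.
        ys, xs = torch.meshgrid(torch.arange(grid), torch.arange(grid), indexing="ij")
        pos = torch.stack([ys.flatten(), xs.flatten()], dim=1).float()  # (N, 2)
        self.register_buffer("bias", -alpha * torch.cdist(pos, pos))    # (N, N)

    def forward(self, tokens):  # tokens: (B, N, dim) with N = grid * grid
        q, k, v = self.qkv(tokens).chunk(3, dim=-1)
        logits = q @ k.transpose(-2, -1) * self.scale + self.bias
        return logits.softmax(dim=-1) @ v

if __name__ == "__main__":
    attn = DistanceWeightedSelfAttention(dim=64, grid=8)
    out = attn(torch.randn(2, 64, 64))  # 8 * 8 = 64 tokens
    print(out.shape)                    # torch.Size([2, 64, 64])
```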
- Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion [50.59261592343479]
We present Kandinsky, a novel exploration of latent diffusion architecture.
As the image prior, a separate model is trained to map text embeddings to image embeddings of CLIP (a toy illustration of this mapping follows the entry).
We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting.
arXiv Detail & Related papers (2023-10-05T12:29:41Z)
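As a toy illustration of the prior's role, the sketch below trains a regressor from text embeddings to CLIP image embeddings with an MSE objective. Kandinsky's actual prior is a diffusion-based transformer, so treat this only as a sketch of the interface being learned; the embedding width and layer sizes are assumptions:

```python
import torch
import torch.nn as nn

EMB = 512  # assumed CLIP embedding width (e.g. ViT-B/32)

# Toy stand-in for the image prior: a plain MLP regressor. The real prior
# is a diffusion model; this only illustrates the text -> image mapping.
prior = nn.Sequential(nn.Linear(EMB, 1024), nn.GELU(), nn.Linear(1024, EMB))
opt = torch.optim.Adam(prior.parameters(), lr=1e-4)

# Placeholder batch: in practice, paired (text, image) embeddings come from
# a frozen CLIP model run over a captioned image dataset.
text_emb = torch.randn(32, EMB)
image_emb = torch.randn(32, EMB)

opt.zero_grad()
loss = nn.functional.mse_loss(prior(text_emb), image_emb)
loss.backward()
opt.step()
print(float(loss))
```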
- A Structure-Guided Diffusion Model for Large-Hole Image Completion [85.61681358977266]
We develop a structure-guided diffusion model to fill large holes in images (one common conditioning pattern is sketched after the entry).
Our method achieves superior or comparable visual quality compared to state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-18T18:59:01Z)
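The summary does not say how the structure guides the diffusion model; one common pattern is to channel-concatenate a structure map (e.g. an edge image) to the denoiser's noisy input. The sketch below shows that generic pattern with a toy network, and is an assumption rather than the paper's actual design (timestep conditioning and the real UNet are omitted for brevity):

```python
import torch
import torch.nn as nn

class StructureConditionedDenoiser(nn.Module):
    """Sketch of a common way to guide diffusion with structure: concatenate
    a structure map to the noisy image along the channel axis. A real
    denoiser would also take a timestep embedding and use a UNet backbone.
    """

    def __init__(self, img_channels=3, cond_channels=1, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_channels + cond_channels, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, img_channels, 3, padding=1),  # predicts the noise
        )

    def forward(self, noisy_image, structure_map):
        return self.net(torch.cat([noisy_image, structure_map], dim=1))

if __name__ == "__main__":
    model = StructureConditionedDenoiser()
    x_t = torch.randn(1, 3, 64, 64)   # noisy image at some timestep
    edges = torch.rand(1, 1, 64, 64)  # structure guidance (e.g. Canny edges)
    print(model(x_t, edges).shape)    # torch.Size([1, 3, 64, 64])
```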
- Keys to Better Image Inpainting: Structure and Texture Go Hand in Hand [28.32208483559088]
We claim that the performance of inpainting algorithms can be better judged by the generated structures and textures.
In this paper, we propose a novel inpainting network combining the advantages of both structure-oriented and texture-oriented designs.
Our model achieves remarkable visual quality, matching state-of-the-art performance in both structure generation and repeating texture synthesis.
arXiv Detail & Related papers (2022-08-05T20:42:13Z)
- Multi-feature Co-learning for Image Inpainting [2.4571440831539824]
In this paper, we design a deep multi-feature co-learning network for image inpainting.
To be specific, we first use two branches to learn structure features and texture features separately.
The proposed SDFF module integrates structure features into texture features and, in turn, uses texture features as an aid in generating structure features.
arXiv Detail & Related papers (2022-05-21T12:15:26Z)
- Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE [74.29384873537587]
We propose a two-stage model for diverse inpainting: the first stage generates multiple coarse results, each with a different structure, and the second stage refines each coarse result separately by augmenting texture (the sampling loop is sketched after the entry).
Experimental results on CelebA-HQ, Places2, and ImageNet datasets show that our method not only enhances the diversity of the inpainting solutions but also improves the visual quality of the generated multiple images.
arXiv Detail & Related papers (2021-03-18T05:10:49Z)
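The two-stage recipe above lends itself to a simple driver loop: inject randomness into the hole, generate a coarse structure, then refine its texture, repeated for several samples. The sketch below uses placeholder convolutions for both stages and Gaussian noise in place of sampling discrete structure codes from the paper's hierarchical VQ-VAE prior; everything here is illustrative:

```python
import torch
import torch.nn as nn

coarse_net = nn.Conv2d(4, 3, 3, padding=1)  # placeholder: masked image + mask -> coarse RGB
refine_net = nn.Conv2d(3, 3, 3, padding=1)  # placeholder: coarse RGB -> refined RGB

def diverse_inpaint(image, mask, num_samples=5):
    """Return num_samples completions. Diversity comes from noise injected
    into the hole before the coarse stage, standing in for sampling
    structure codes from a learned prior."""
    masked = image * (1 - mask)
    results = []
    for _ in range(num_samples):
        noise = torch.randn_like(image) * mask           # randomness only inside the hole
        x = torch.cat([masked + noise, mask[:, :1]], dim=1)
        coarse = coarse_net(x)                           # stage 1: coarse structure
        results.append(refine_net(coarse))               # stage 2: texture refinement
    return results

image = torch.rand(1, 3, 64, 64)
mask = torch.zeros(1, 3, 64, 64)
mask[..., 16:48, 16:48] = 1.0                            # square hole
outs = diverse_inpaint(image, mask)
print(len(outs), outs[0].shape)                          # 5 torch.Size([1, 3, 64, 64])
```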
- Conceptual Compression via Deep Structure and Texture Synthesis [42.68994438290913]
We propose a novel conceptual compression framework that encodes visual data into compact structure and texture representations, then decodes in a deep synthesis fashion.
In particular, we propose to compress images by a dual-layered model consisting of two complementary visual features.
At the encoder side, structural maps and texture representations are individually extracted and compressed, generating compact, interpretable, interoperable bitstreams.
During the decoding stage, a hierarchical fusion GAN (HF-GAN) is proposed to learn the synthesis paradigm in which textures are rendered into the decoded structural maps, leading to high-quality reconstruction.
arXiv Detail & Related papers (2020-11-10T08:48:32Z)
- Region-adaptive Texture Enhancement for Detailed Person Image Synthesis [86.69934638569815]
RATE-Net is a novel framework for synthesizing person images with sharp texture details.
The proposed framework leverages an additional texture enhancing module to extract appearance information from the source image.
Experiments conducted on the DeepFashion benchmark dataset demonstrate the superiority of our framework compared with existing networks.
arXiv Detail & Related papers (2020-05-26T02:33:21Z)
- Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes [54.836331922449666]
We propose a Semantic Guidance and Evaluation Network (SGE-Net) to update the structural priors and the inpainted image.
It utilizes a semantic segmentation map as guidance at each scale of inpainting, under which location-dependent inferences are re-evaluated (the generic modulation pattern is sketched after the entry).
Experiments on real-world images of mixed scenes demonstrate the superiority of our proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-15T17:49:20Z)
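As a generic illustration of segmentation-guided generation (SGE-Net's own guidance-and-evaluation loop is more elaborate), here is a SPADE-style modulation block in which the segmentation map predicts a per-location scale and shift for the inpainting features; the class count and layer widths are assumptions:

```python
import torch
import torch.nn as nn

class SemanticModulation(nn.Module):
    """SPADE-style, segmentation-guided feature modulation: the segmentation
    map predicts per-location scale and shift for the features. Illustrates
    the flavour of semantic guidance, not SGE-Net's exact mechanism.
    """

    def __init__(self, feat_channels: int, num_classes: int):
        super().__init__()
        self.shared = nn.Sequential(nn.Conv2d(num_classes, 64, 3, padding=1), nn.ReLU())
        self.to_scale = nn.Conv2d(64, feat_channels, 3, padding=1)
        self.to_shift = nn.Conv2d(64, feat_channels, 3, padding=1)

    def forward(self, features, seg_onehot):
        h = self.shared(seg_onehot)
        return features * (1 + self.to_scale(h)) + self.to_shift(h)

if __name__ == "__main__":
    mod = SemanticModulation(feat_channels=128, num_classes=20)
    feats = torch.randn(1, 128, 32, 32)
    seg = torch.randn(1, 20, 32, 32).softmax(dim=1)  # soft segmentation map
    print(mod(feats, seg).shape)                     # torch.Size([1, 128, 32, 32])
```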