Unsupervised Structure-Consistent Image-to-Image Translation
- URL: http://arxiv.org/abs/2208.11546v1
- Date: Wed, 24 Aug 2022 13:47:15 GMT
- Title: Unsupervised Structure-Consistent Image-to-Image Translation
- Authors: Shima Shahfar and Charalambos Poullis
- Abstract summary: The Swapping Autoencoder achieved state-of-the-art performance in deep image manipulation and image-to-image translation.
We improve this work by introducing a simple yet effective auxiliary module based on gradient reversal layers.
The auxiliary module's loss forces the generator to learn to reconstruct an image with an all-zero texture code.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The Swapping Autoencoder achieved state-of-the-art performance in deep image
manipulation and image-to-image translation. We improve this work by
introducing a simple yet effective auxiliary module based on gradient reversal
layers. The auxiliary module's loss forces the generator to learn to
reconstruct an image with an all-zero texture code, encouraging better
disentanglement between the structure and texture information. The proposed
attribute-based transfer method enables refined control in style transfer while
preserving structural information without using a semantic mask. To manipulate
an image, we encode both the geometry of the objects and the general style of
the input images into two latent codes with an additional constraint that
enforces structure consistency. Moreover, due to the auxiliary loss, training
time is significantly reduced. The superiority of the proposed model is
demonstrated in complex domains such as satellite images, where
state-of-the-art methods are known to fail. Lastly, we show that our model improves the quality metrics
for a wide range of datasets while achieving comparable results with
multi-modal image generation techniques.
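The gradient reversal layer at the heart of the auxiliary module is a standard construction: identity in the forward pass, negated (and optionally scaled) gradient in the backward pass. A minimal PyTorch sketch of that building block, not the authors' implementation, might look like:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales)
    the gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient so the upstream encoder is pushed in the
        # opposite direction of the auxiliary objective.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)
```

Feeding the auxiliary branch through `grad_reverse` means that minimizing the auxiliary loss trains the encoder adversarially against it, which is what encourages the texture code to carry no structural information.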
Related papers
- ENTED: Enhanced Neural Texture Extraction and Distribution for Reference-based Blind Face Restoration
We present ENTED, a new framework for blind face restoration that aims to restore high-quality and realistic portrait images.
We utilize a texture extraction and distribution framework to transfer high-quality texture features between the degraded input and reference image.
The StyleGAN-like architecture in our framework requires high-quality latent codes to generate realistic images.
arXiv Detail & Related papers (2024-01-13T04:54:59Z)
- Distance Weighted Trans Network for Image Completion
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Mixed Hierarchy Network for Image Restoration
We present a mixed hierarchy network that can balance quality and system complexity in image restoration.
Our model first learns the contextual information using encoder-decoder architectures, and then combines them with high-resolution branches that preserve spatial detail.
The resulting tightly interlinked hierarchical architecture, named MHNet, delivers strong performance gains on several image restoration tasks.
arXiv Detail & Related papers (2023-02-19T12:18:45Z)
- Progressive with Purpose: Guiding Progressive Inpainting DNNs through Context and Structure
We propose a novel inpainting network that maintains the structural and contextual integrity of a processed image.
Inspired by the Gaussian and Laplacian pyramids, the core of the proposed network is a feature extraction module named GLE.
Our benchmarking experiments demonstrate that the proposed method achieves clear improvement in performance over many state-of-the-art inpainting algorithms.
arXiv Detail & Related papers (2022-09-21T02:15:02Z)
- CM-GAN: Image Inpainting with Cascaded Modulation GAN and Object-Aware Training
We propose cascaded modulation GAN (CM-GAN) to generate plausible image structures when dealing with large holes in complex images.
In each decoder block, global modulation first performs coarse, semantically aware structure synthesis; spatial modulation is then applied to its output to further adjust the feature map in a spatially adaptive fashion.
In addition, we design an object-aware training scheme to prevent the network from hallucinating new objects inside holes, fulfilling the needs of object removal tasks in real-world scenarios.
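The two-stage modulation can be illustrated with a toy sketch (hypothetical shapes and naming, not CM-GAN's actual layers): a global code scales each channel uniformly, and a spatial map then refines the result per pixel:

```python
import torch

def cascaded_modulation(feat, global_code, spatial_map):
    """Toy two-stage modulation.

    feat:        (B, C, H, W) feature map
    global_code: (B, C)       one scale per channel
    spatial_map: (B, C, H, W) per-pixel refinement
    """
    # Stage 1: global modulation - each channel is scaled uniformly
    # across all spatial positions.
    globally_mod = feat * (1 + global_code[:, :, None, None])
    # Stage 2: spatial modulation - spatially adaptive adjustment
    # of the globally modulated features.
    return globally_mod * (1 + spatial_map)
```

With both codes at zero the function is the identity, so the modulations act as learned residual scalings on top of the base features.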
arXiv Detail & Related papers (2022-03-22T16:13:27Z)
- SDWNet: A Straight Dilated Network with Wavelet Transformation for Image Deblurring
Image deblurring is a computer vision problem that aims to recover a sharp image from a blurred image.
Our model uses dilated convolutions to obtain a large receptive field while maintaining high spatial resolution.
We propose a novel module using the wavelet transform, which effectively helps the network to recover clear high-frequency texture details.
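The receptive-field claim is easy to verify: a 3x3 kernel with dilation d spans d*(3-1)+1 pixels, and setting the padding equal to the dilation keeps the spatial resolution unchanged. An illustrative snippet, not SDWNet's actual architecture:

```python
import torch
import torch.nn as nn

d = 4  # dilation rate
# Effective kernel span: d * (3 - 1) + 1 = 9 pixels from a 3x3 kernel,
# with no extra parameters compared to an ordinary 3x3 convolution.
conv = nn.Conv2d(1, 1, kernel_size=3, dilation=d, padding=d)

x = torch.randn(1, 1, 32, 32)
y = conv(x)
# padding == dilation preserves the 32x32 spatial resolution.
```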
arXiv Detail & Related papers (2021-10-12T07:58:10Z)
- Semantic Layout Manipulation with High-Resolution Sparse Attention
We tackle the problem of semantic image layout manipulation, which aims to manipulate an input image by editing its semantic label map.
A core problem of this task is how to transfer visual details from the input images to the new semantic layout while making the resulting image visually realistic.
We propose a high-resolution sparse attention module that effectively transfers visual details to new layouts at a resolution up to 512x512.
arXiv Detail & Related papers (2020-12-14T06:50:43Z)
- TSIT: A Simple and Versatile Framework for Image-to-Image Translation
We introduce a simple and versatile framework for image-to-image translation.
We provide a carefully designed two-stream generative model with newly proposed feature transformations.
This allows multi-scale semantic structure information and style representation to be effectively captured and fused by the network.
A systematic study compares the proposed method with several state-of-the-art task-specific baselines, verifying its effectiveness in both perceptual quality and quantitative evaluations.
arXiv Detail & Related papers (2020-07-23T15:34:06Z)
- Region-adaptive Texture Enhancement for Detailed Person Image Synthesis
RATE-Net is a novel framework for synthesizing person images with sharp texture details.
The proposed framework leverages an additional texture enhancing module to extract appearance information from the source image.
Experiments conducted on DeepFashion benchmark dataset have demonstrated the superiority of our framework compared with existing networks.
arXiv Detail & Related papers (2020-05-26T02:33:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.