Masked Discriminators for Content-Consistent Unpaired Image-to-Image Translation
- URL: http://arxiv.org/abs/2309.13188v1
- Date: Fri, 22 Sep 2023 21:32:07 GMT
- Title: Masked Discriminators for Content-Consistent Unpaired Image-to-Image Translation
- Authors: Bonifaz Stuhr, Jürgen Brauer, Bernhard Schick, Jordi Gonzàlez
- Abstract summary: A common goal of unpaired image-to-image translation is to preserve content consistency between source images and translated images.
We show that masking the inputs of a global discriminator for both domains with a content-based mask is sufficient to reduce content inconsistencies significantly.
In our experiments, we show that our method achieves state-of-the-art performance in photorealistic sim-to-real translation and weather translation.
- Score: 1.3654846342364308
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A common goal of unpaired image-to-image translation is to preserve content
consistency between source images and translated images while mimicking the
style of the target domain. Due to biases between the datasets of both domains,
many methods suffer from inconsistencies caused by the translation process.
Most approaches introduced to mitigate these inconsistencies do not constrain
the discriminator, leading to an even more ill-posed training setup. Moreover,
none of these approaches is designed for larger crop sizes. In this work, we
show that masking the inputs of a global discriminator for both domains with a
content-based mask is sufficient to reduce content inconsistencies
significantly. However, this strategy leads to artifacts that can be traced
back to the masking process. To reduce these artifacts, we introduce a local
discriminator that operates on pairs of small crops selected with a similarity
sampling strategy. Furthermore, we apply this sampling strategy to sample
global input crops from the source and target dataset. In addition, we propose
feature-attentive denormalization to selectively incorporate content-based
statistics into the generator stream. In our experiments, we show that our
method achieves state-of-the-art performance in photorealistic sim-to-real
translation and weather translation and also performs well in day-to-night
translation. Additionally, we propose the cKVD metric, which builds on the sKVD
metric and enables the examination of translation quality at the class or
category level.
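The core masking idea is simple enough to sketch. Below is a minimal, hypothetical PyTorch illustration, not the authors' implementation: `GlobalDiscriminator` is a placeholder architecture, the hinge loss is just one common GAN objective, and the content-based mask is assumed to be a binary tensor (e.g. derived from semantic labels) applied identically to both domains.

```python
import torch
import torch.nn as nn

class GlobalDiscriminator(nn.Module):
    """Placeholder PatchGAN-style discriminator; the paper's architecture
    may differ."""
    def __init__(self, in_ch: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 1, 4, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def masked_discriminator_loss(disc: nn.Module, real: torch.Tensor,
                              translated: torch.Tensor,
                              mask: torch.Tensor) -> torch.Tensor:
    """Hinge loss on masked inputs: the same content-based binary mask is
    applied to the real target-domain image and the translated source image,
    so the discriminator never judges content present in only one domain."""
    loss_real = torch.relu(1.0 - disc(real * mask)).mean()
    loss_fake = torch.relu(1.0 + disc(translated.detach() * mask)).mean()
    return loss_real + loss_fake
```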
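The similarity sampling strategy for the local discriminator can likewise be approximated as picking, from several random crop candidates in each domain, the pair that is closest under some cheap feature distance. In the sketch below, pooled RGB statistics stand in for whatever similarity measure the paper actually uses; all names and parameters are illustrative.

```python
import torch
import torch.nn.functional as F

def random_crops(img: torch.Tensor, crop: int, n: int) -> torch.Tensor:
    """Draw n random square crops from a (C, H, W) image."""
    _, h, w = img.shape
    ys = torch.randint(0, h - crop + 1, (n,)).tolist()
    xs = torch.randint(0, w - crop + 1, (n,)).tolist()
    return torch.stack([img[:, y:y + crop, x:x + crop]
                        for y, x in zip(ys, xs)])

def sample_similar_pair(src: torch.Tensor, tgt: torch.Tensor,
                        crop: int = 64, n_cand: int = 16):
    """Return the source/target crop pair whose pooled RGB features are
    closest among n_cand random candidates per image."""
    a = random_crops(src, crop, n_cand)        # (n, C, crop, crop)
    b = random_crops(tgt, crop, n_cand)
    fa = F.avg_pool2d(a, 8).flatten(1)         # cheap proxy features
    fb = F.avg_pool2d(b, 8).flatten(1)
    dist = torch.cdist(fa, fb)                 # (n, n) pairwise distances
    i, j = divmod(int(dist.argmin()), n_cand)
    return a[i], b[j]
```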
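Feature-attentive denormalization is only named at this level of detail; a plausible reading is a SPADE-style layer whose scale and shift are gated by a learned attention map computed from content features. The layer below is a sketch under that assumption (module names and shapes are ours, not the paper's).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAttentiveDenorm(nn.Module):
    """SPADE-style denormalization gated by attention: content features
    produce a scale (gamma), a shift (beta), and a sigmoid gate deciding
    where those statistics enter the generator stream."""
    def __init__(self, gen_ch: int, feat_ch: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(gen_ch, affine=False)
        self.gamma = nn.Conv2d(feat_ch, gen_ch, 3, padding=1)
        self.beta = nn.Conv2d(feat_ch, gen_ch, 3, padding=1)
        self.gate = nn.Conv2d(feat_ch, gen_ch, 3, padding=1)

    def forward(self, x: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
        feat = F.interpolate(feat, size=x.shape[2:], mode="nearest")
        a = torch.sigmoid(self.gate(feat))     # per-pixel, per-channel gate
        return self.norm(x) * (1 + a * self.gamma(feat)) + a * self.beta(feat)
```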
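Finally, the proposed cKVD metric evaluates translation quality per semantic class. A rough sketch of the idea, assuming an sKVD-style kernel distance on extracted features; the polynomial-kernel MMD below is the KID/KVD-style estimator, used here only as a stand-in for the exact sKVD computation.

```python
import torch

def kernel_distance(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Unbiased MMD^2 with the cubic polynomial kernel used by KID-style
    metrics; a stand-in for the sKVD computation. Expects (m, d) and (n, d)
    feature matrices with m, n >= 2."""
    d = x.shape[1]
    kern = lambda a, b: (a @ b.t() / d + 1.0) ** 3
    kxx, kyy, kxy = kern(x, x), kern(y, y), kern(x, y)
    m, n = x.shape[0], y.shape[0]
    term_xx = (kxx.sum() - kxx.diagonal().sum()) / (m * (m - 1))
    term_yy = (kyy.sum() - kyy.diagonal().sum()) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()

def class_wise_kvd(feats_real, feats_fake, labels_real, labels_fake, classes):
    """Restrict features to one semantic class at a time and evaluate the
    kernel distance per class (the spirit of cKVD; details are in the paper)."""
    scores = {}
    for c in classes:
        rf, ff = feats_real[labels_real == c], feats_fake[labels_fake == c]
        if len(rf) > 1 and len(ff) > 1:
            scores[c] = kernel_distance(rf, ff)
    return scores
```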
Related papers
- Seed-to-Seed: Image Translation in Diffusion Seed Space [20.590890565046074]
We introduce Seed-to-Seed Translation (StS), a novel approach for image-to-image translation using diffusion models (DMs).
We leverage the semantic information encoded within the space of inverted seeds of a pretrained DM, dubbed the seed-space.
Our approach offers a fresh perspective on leveraging the semantic information encoded within the seed-space of pretrained DMs for effective image editing and manipulation.
arXiv Detail & Related papers (2024-09-01T08:07:59Z)
- Multi-cropping Contrastive Learning and Domain Consistency for Unsupervised Image-to-Image Translation [5.562419999563734]
We propose a novel unsupervised image-to-image translation framework based on multi-cropping contrastive learning and domain consistency, called MCDUT.
Our method achieves state-of-the-art results on many image-to-image translation tasks, with its advantages demonstrated through comparison experiments and ablation studies.
arXiv Detail & Related papers (2023-04-24T16:20:28Z)
- Smooth image-to-image translations with latent space interpolations [64.8170758294427]
Multi-domain image-to-image (I2I) translations can transform a source image according to the style of a target domain.
We show that our regularization techniques can improve state-of-the-art I2I translations by a large margin.
arXiv Detail & Related papers (2022-10-03T11:57:30Z)
- Global and Local Alignment Networks for Unpaired Image-to-Image Translation [170.08142745705575]
The goal of unpaired image-to-image translation is to produce an output image reflecting the target domain's style.
Because existing methods pay little attention to content changes, semantic information from source images degrades during translation.
We introduce a novel approach, Global and Local Alignment Networks (GLA-Net).
Our method effectively generates sharper and more realistic images than existing approaches.
arXiv Detail & Related papers (2021-11-19T18:01:54Z)
- Separating Content and Style for Unsupervised Image-to-Image Translation [20.44733685446886]
Unsupervised image-to-image translation aims to learn the mapping between two visual domains with unpaired samples.
We propose to separate the content code and style code simultaneously in a unified framework.
By exploiting the correlation between latent features and high-level domain-invariant tasks, the proposed framework demonstrates superior performance.
arXiv Detail & Related papers (2021-10-27T12:56:50Z)
- Semi-supervised Semantic Segmentation with Directional Context-aware Consistency [66.49995436833667]
We focus on the semi-supervised segmentation problem where only a small set of labeled data is provided with a much larger collection of totally unlabeled images.
A preferred high-level representation should capture the contextual information while not losing self-awareness.
We present the Directional Contrastive Loss (DC Loss) to accomplish the consistency in a pixel-to-pixel manner.
arXiv Detail & Related papers (2021-06-27T03:42:40Z)
- Smoothing the Disentangled Latent Style Space for Unsupervised Image-to-Image Translation [56.55178339375146]
Image-to-Image (I2I) multi-domain translation models are usually also evaluated using the quality of their semantic results.
We propose a new training protocol based on three specific losses which help a translation network to learn a smooth and disentangled latent style space.
arXiv Detail & Related papers (2021-06-16T17:58:21Z)
- Unpaired Image-to-Image Translation via Latent Energy Transport [61.62293304236371]
Image-to-image translation aims to preserve source contents while translating to discriminative target styles between two visual domains.
In this paper, we propose to deploy an energy-based model (EBM) in the latent space of a pretrained autoencoder for this task.
Our model is the first to be applicable to 1024×1024-resolution unpaired image translation.
arXiv Detail & Related papers (2020-12-01T17:18:58Z)
- GAIT: Gradient Adjusted Unsupervised Image-to-Image Translation [5.076419064097734]
An adversarial loss is utilized to match the distributions of the translated and target image sets.
This may create artifacts if the two domains have different marginal distributions, for example, in uniform areas.
We propose an unsupervised IIT method that preserves uniform regions after translation.
arXiv Detail & Related papers (2020-09-02T08:04:00Z)
- Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently used VGG feature-matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global content consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)