Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image
Translation
- URL: http://arxiv.org/abs/2203.12707v1
- Date: Wed, 23 Mar 2022 19:59:04 GMT
- Title: Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image
Translation
- Authors: Yanwu Xu, Shaoan Xie, Wenhao Wu, Kun Zhang, Mingming Gong and Kayhan
Batmanghelich
- Abstract summary: This paper proposes a universal regularization technique called maximum spatial perturbation consistency (MSPC).
MSPC enforces a spatial perturbation function (T) and the translation operator (G) to be commutative (i.e., TG = GT).
Our method outperforms the state-of-the-art methods on most I2I benchmarks.
- Score: 56.44946660061753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unpaired image-to-image translation (I2I) is an ill-posed problem, as an
infinite number of translation functions can map the source domain distribution
to the target distribution. Therefore, much effort has been put into designing
suitable constraints, e.g., cycle consistency (CycleGAN), geometry consistency
(GCGAN), and contrastive learning-based constraints (CUTGAN), that help better
pose the problem. However, these well-known constraints have limitations: (1)
they are either too restrictive or too weak for specific I2I tasks; (2) these
methods result in content distortion when there is a significant spatial
variation between the source and target domains. This paper proposes a
universal regularization technique called maximum spatial perturbation
consistency (MSPC), which enforces a spatial perturbation function (T) and the
translation operator (G) to be commutative (i.e., TG = GT). In addition, we
introduce two adversarial training components for learning the spatial
perturbation function. The first one lets T compete with G to achieve maximum
perturbation. The second one lets G and T compete with discriminators to align
the spatial variations caused by the change of object size, object distortion,
background interruptions, etc. Our method outperforms the state-of-the-art
methods on most I2I benchmarks. We also introduce a new benchmark, namely the
front face to profile face dataset, to emphasize the underlying challenges of
I2I for real-world applications. We finally perform ablation experiments to
study the sensitivity of our method to the severity of spatial perturbation and
its effectiveness for distribution alignment.
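To make the commutativity constraint concrete, below is a minimal sketch of an MSPC-style consistency term, assuming a PyTorch setup. It is an illustration, not the authors' released code: the AffinePerturbation module, the mspc_consistency_loss helper, and the max_shift parameter are hypothetical placeholders standing in for the learned perturbation T and the translation operator G described above.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch only: a learnable spatial perturbation T implemented as a small
# residual affine warp. This parameterization is an assumption for
# illustration, not the perturbation family used in the paper's code.
class AffinePerturbation(nn.Module):
    def __init__(self, max_shift: float = 0.1):
        super().__init__()
        self.theta_residual = nn.Parameter(torch.zeros(2, 3))  # residual on the identity affine
        self.max_shift = max_shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = torch.tensor([[1.0, 0.0, 0.0],
                                 [0.0, 1.0, 0.0]], device=x.device)
        theta = identity + self.max_shift * torch.tanh(self.theta_residual)
        theta = theta.unsqueeze(0).repeat(x.size(0), 1, 1)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

def mspc_consistency_loss(G: nn.Module, T: nn.Module, x: torch.Tensor) -> torch.Tensor:
    # Penalize non-commutativity: T(G(x)) should match G(T(x)).
    return F.l1_loss(T(G(x)), G(T(x)))

if __name__ == "__main__":
    G = nn.Conv2d(3, 3, kernel_size=3, padding=1)  # toy stand-in for the translator G
    T = AffinePerturbation()
    x = torch.randn(4, 3, 64, 64)                  # a batch of source-domain images
    loss = mspc_consistency_loss(G, T, x)
    loss.backward()
    print(float(loss))

In training, G would minimize this term while T maximizes it (opposite signs under separate optimizers), alongside the adversarial losses used for distribution alignment.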
Related papers
- Dynamic Position Transformation and Boundary Refinement Network for Left Atrial Segmentation [17.09918110723713]
Left atrial (LA) segmentation is a crucial technique for irregular heartbeat (i.e., atrial fibrillation) diagnosis.
Most current methods for LA segmentation strictly assume that the input data is acquired using object-oriented center cropping.
We propose a novel Dynamic Position transformation and Boundary refinement Network (DPBNet) to tackle these issues.
arXiv Detail & Related papers (2024-07-07T22:09:35Z) - Double Duality: Variational Primal-Dual Policy Optimization for
Constrained Reinforcement Learning [132.7040981721302]
We study the constrained convex Markov Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z) - Smooth image-to-image translations with latent space interpolations [64.8170758294427]
Multi-domain image-to-image (I2I) translations can transform a source image according to the style of a target domain.
We show that our regularization techniques can improve the state-of-the-art I2I translations by a large margin.
arXiv Detail & Related papers (2022-10-03T11:57:30Z) - ContraCLIP: Interpretable GAN generation driven by pairs of contrasting
sentences [45.06326873752593]
We find non-linear interpretable paths in the latent space of pre-trained GANs in a model-agnostic manner.
By defining an objective that discovers latent-space paths that generate changes along desired paths in the vision-language embedding space, we provide an intuitive way of controlling the underlying generative factors.
arXiv Detail & Related papers (2022-06-05T06:13:42Z) - ResiDualGAN: Resize-Residual DualGAN for Cross-Domain Remote Sensing
Images Semantic Segmentation [15.177834801688979]
The performance of a semantic segmentation model for remote sensing (RS) images pretrained on an annotated dataset greatly decreases when tested on another, unannotated dataset because of the domain gap.
Adversarial generative methods, e.g., DualGAN, are utilized for unpaired image-to-image translation to minimize the pixel-level domain gap.
In this paper, ResiDualGAN is proposed for RS image translation, where a resizer module is used to address the scale discrepancy of RS datasets.
arXiv Detail & Related papers (2022-01-27T13:56:54Z) - Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of semi-supervised learning (SSL) and domain adaptation (DA).
arXiv Detail & Related papers (2021-12-12T06:11:16Z) - G$^2$DA: Geometry-Guided Dual-Alignment Learning for RGB-Infrared Person
Re-Identification [3.909938091041451]
RGB-IR person re-identification aims to retrieve persons of interest across heterogeneous modalities.
This paper presents a Geometry-Guided Dual-Alignment learning framework (G$2$DA) to tackle sample-level modality difference.
arXiv Detail & Related papers (2021-06-15T03:14:31Z) - Regressive Domain Adaptation for Unsupervised Keypoint Detection [67.2950306888855]
Domain adaptation (DA) aims at transferring knowledge from a labeled source domain to an unlabeled target domain.
We present a method of regressive domain adaptation (RegDA) for unsupervised keypoint detection.
Our method yields large improvements of 8% to 11% in PCK on different datasets.
arXiv Detail & Related papers (2021-03-10T16:45:22Z) - Rethinking conditional GAN training: An approach using geometrically
structured latent manifolds [58.07468272236356]
Conditional GANs (cGAN) suffer from critical drawbacks such as the lack of diversity in generated outputs.
We propose a novel training mechanism that increases both the diversity and the visual quality of a vanilla cGAN.
arXiv Detail & Related papers (2020-11-25T22:54:11Z) - GAIT: Gradient Adjusted Unsupervised Image-to-Image Translation [5.076419064097734]
An adversarial loss is utilized to match the distributions of the translated and target image sets.
This may create artifacts if the two domains have different marginal distributions, for example, in uniform areas.
We propose an unsupervised image-to-image translation method that preserves uniform regions after translation.
arXiv Detail & Related papers (2020-09-02T08:04:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.