Harnessing the Conditioning Sensorium for Improved Image Translation
- URL: http://arxiv.org/abs/2110.06443v1
- Date: Wed, 13 Oct 2021 02:07:43 GMT
- Title: Harnessing the Conditioning Sensorium for Improved Image Translation
- Authors: Cooper Nederhood and Nicholas Kolkin and Deqing Fu and Jason Salavon
- Abstract summary: Multi-modal domain translation typically refers to synthesizing a novel image that inherits certain localized attributes from a 'content' image.
Rather than learning disentangled 'content' and 'style' representations from scratch, we define 'content' based on conditioning information extracted by off-the-shelf pre-trained models.
We then train our style extractor and image decoder with an easy-to-optimize set of reconstruction objectives.
- Score: 2.9631016562930546
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-modal domain translation typically refers to synthesizing a novel image
that inherits certain localized attributes from a 'content' image (e.g. layout,
semantics, or geometry), and inherits everything else (e.g. texture, lighting,
sometimes even semantics) from a 'style' image. The dominant approach to this
task is attempting to learn disentangled 'content' and 'style' representations
from scratch. However, this is not only challenging, but ill-posed, as what
users wish to preserve during translation varies depending on their goals.
Motivated by this inherent ambiguity, we define 'content' based on conditioning
information extracted by off-the-shelf pre-trained models. We then train our
style extractor and image decoder with an easy-to-optimize set of
reconstruction objectives. The wide variety of high-quality pre-trained models
available and the simple training procedure make our approach straightforward to
apply across numerous domains and definitions of 'content'. Additionally, it
offers intuitive control over which aspects of 'content' are preserved across
domains. We evaluate our method on traditional, well-aligned, datasets such as
CelebA-HQ, and propose two novel datasets for evaluation on more complex
scenes: ClassicTV and FFHQ-Wild. Our approach, Sensorium, enables higher
quality domain translation for more complex scenes.
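To make the recipe concrete, here is a minimal PyTorch-style sketch of the training loop the abstract describes, assuming a frozen semantic segmenter as the off-the-shelf 'content' model; the style encoder, decoder, and all hyperparameters are illustrative stand-ins, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

# Frozen off-the-shelf "content" network (here: a semantic segmenter; the
# paper's point is that any pre-trained conditioning model could be used).
content_net = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT").eval()
for p in content_net.parameters():
    p.requires_grad_(False)

# Illustrative style encoder / decoder; the real architectures are not specified here.
style_enc = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 128))
decoder = nn.Sequential(nn.Conv2d(21 + 128, 64, 3, 1, 1), nn.ReLU(),
                        nn.Conv2d(64, 3, 3, 1, 1))
opt = torch.optim.Adam([*style_enc.parameters(), *decoder.parameters()], lr=1e-4)

def train_step(x):                                 # x: (B, 3, H, W) images
    with torch.no_grad():
        content = content_net(x)["out"]            # 21-channel semantic logits
    style = style_enc(x)                           # global style code, (B, 128)
    style_map = style[:, :, None, None].expand(-1, -1, *content.shape[2:])
    x_hat = decoder(torch.cat([content, style_map], dim=1))
    loss = F.l1_loss(x_hat, x)                     # easy-to-optimize reconstruction
    opt.zero_grad(); loss.backward(); opt.step()
    return loss
```

At translation time one would pair the 'content' extracted from one image with the style code of another; swapping the segmenter for, say, a pose or depth network changes which aspects of 'content' are preserved.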
Related papers
- Few-shot Image Generation via Style Adaptation and Content Preservation [60.08988307934977]
We introduce an image translation module into GAN transfer, where the module teaches the generator to separate style and content.
Our method consistently surpasses state-of-the-art methods in the few-shot setting.
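As a rough illustration of how such content preservation is commonly enforced (a generic stand-in, not this paper's actual translation module), one can tie the adapted generator to the frozen source generator:

```python
import torch
import torch.nn.functional as F

def content_preservation_loss(g_source, g_adapted, z, feat_net):
    """For a shared latent z, keep features of the adapted generator's
    output close to those of the frozen source generator. `feat_net`
    (e.g. a VGG feature extractor) and the L1 distance are assumptions."""
    with torch.no_grad():
        ref = feat_net(g_source(z))    # frozen source generator defines 'content'
    out = feat_net(g_adapted(z))       # adapted generator supplies the new style
    return F.l1_loss(out, ref)
```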
arXiv Detail & Related papers (2023-11-30T01:16:53Z)
- Masked and Adaptive Transformer for Exemplar Based Image Translation [16.93344592811513]
Cross-domain semantic matching is challenging.
We propose a masked and adaptive transformer (MAT) for learning accurate cross-domain correspondence.
We devise a novel contrastive style learning method to acquire quality-discriminative style representations.
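Contrastive style learning of this kind is typically implemented as an InfoNCE objective over style codes of paired exemplar views; the following sketch shows the general pattern, not MAT's exact loss:

```python
import torch
import torch.nn.functional as F

def contrastive_style_loss(style_a, style_b, temperature=0.07):
    """Style codes of two views of the same exemplar are pulled together,
    codes of different exemplars pushed apart (standard InfoNCE form)."""
    za = F.normalize(style_a, dim=1)          # (B, D)
    zb = F.normalize(style_b, dim=1)          # (B, D)
    logits = za @ zb.t() / temperature        # (B, B) cosine similarities
    targets = torch.arange(za.size(0), device=za.device)
    return F.cross_entropy(logits, targets)   # diagonal entries are positives
```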
arXiv Detail & Related papers (2023-03-30T03:21:14Z)
- Unsupervised Image-to-Image Translation with Generative Prior [103.54337984566877]
Unsupervised image-to-image translation aims to learn the translation between two visual domains without paired data.
We present a novel framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm.
arXiv Detail & Related papers (2022-04-07T17:59:23Z)
- Fully Context-Aware Image Inpainting with a Learned Semantic Pyramid [102.24539566851809]
Restoring reasonable and realistic content for arbitrary missing regions in images is an important yet challenging task.
Recent image inpainting models have made significant progress in generating vivid visual details, but they can still lead to texture blurring or structural distortions.
We propose the Semantic Pyramid Network (SPN) motivated by the idea that learning multi-scale semantic priors can greatly benefit the recovery of locally missing content in images.
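The core idea, multi-scale semantic priors injected into the inpainting decoder, can be sketched as follows (module names and channel sizes are invented for illustration; this is not SPN's implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticPyramid(nn.Module):
    """Toy illustration: a semantic prior map is resized and fused into the
    decoder features at several resolutions, from coarse to fine."""
    def __init__(self, sem_ch=256, dec_ch=(256, 128, 64)):
        super().__init__()
        self.fuse = nn.ModuleList(nn.Conv2d(c + sem_ch, c, 1) for c in dec_ch)

    def forward(self, dec_feats, sem_prior):
        out = []
        for fuse, f in zip(self.fuse, dec_feats):   # coarse -> fine features
            prior = F.interpolate(sem_prior, size=f.shape[2:],
                                  mode="bilinear", align_corners=False)
            out.append(fuse(torch.cat([f, prior], dim=1)))
        return out
```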
arXiv Detail & Related papers (2021-12-08T04:33:33Z)
- Context-Aware Image Inpainting with Learned Semantic Priors [100.99543516733341]
We introduce pretext tasks that are semantically meaningful for estimating the missing contents.
We propose a context-aware image inpainting model, which adaptively integrates global semantics and local features.
arXiv Detail & Related papers (2021-06-14T08:09:43Z)
- Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
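A minimal sketch of what "warping modulation parameters with a learned flow-field" can look like, assuming SPADE-style spatial gamma/beta maps; shapes and details are illustrative, not the paper's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WarpedNorm(nn.Module):
    """Predict-elsewhere flow field warps the spatial modulation maps
    (gamma, beta) before they are applied after normalization."""
    def __init__(self, ch):
        super().__init__()
        self.norm = nn.InstanceNorm2d(ch, affine=False)

    def forward(self, x, gamma, beta, flow):
        # flow: (B, H, W, 2) offsets in [-1, 1] grid coordinates
        B, _, H, W = x.shape
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                                torch.linspace(-1, 1, W), indexing="ij")
        base = torch.stack([xs, ys], dim=-1).to(x).expand(B, -1, -1, -1)
        grid = base + flow
        gamma_w = F.grid_sample(gamma, grid, align_corners=False)  # warp maps
        beta_w = F.grid_sample(beta, grid, align_corners=False)
        return self.norm(x) * (1 + gamma_w) + beta_w
```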
arXiv Detail & Related papers (2021-05-31T07:07:44Z)
- Unpaired Image-to-Image Translation via Latent Energy Transport [61.62293304236371]
Image-to-image translation aims to preserve source contents while translating to discriminative target styles between two visual domains.
In this paper, we propose to deploy an energy-based model (EBM) in the latent space of a pretrained autoencoder for this task.
Our model is the first to be applicable to 1024×1024-resolution unpaired image translation.
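Latent-space energy transport of this kind is usually realized with Langevin dynamics on the autoencoder's latent code; a hedged sketch follows (the `ebm` network and step sizes are assumptions, not the paper's settings):

```python
import torch

def latent_langevin(ebm, z, steps=60, step_size=0.1, noise=0.01):
    """Start from the source image's latent code z (from a pretrained
    autoencoder) and run Langevin updates that lower the target-domain
    energy; the result is fed to the autoencoder's decoder."""
    z = z.detach().clone().requires_grad_(True)
    for _ in range(steps):
        energy = ebm(z).sum()                 # ebm maps latents to scalar energies
        grad, = torch.autograd.grad(energy, z)
        with torch.no_grad():                 # gradient-descent step plus noise
            z = z - 0.5 * step_size * grad + noise * torch.randn_like(z)
        z.requires_grad_(True)
    return z.detach()
```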
arXiv Detail & Related papers (2020-12-01T17:18:58Z)