Disentangled Unsupervised Image Translation via Restricted Information
Flow
- URL: http://arxiv.org/abs/2111.13279v1
- Date: Fri, 26 Nov 2021 00:27:54 GMT
- Title: Disentangled Unsupervised Image Translation via Restricted Information
Flow
- Authors: Ben Usman, Dina Bashkirova, Kate Saenko
- Abstract summary: Many state-of-the-art methods hard-code the desired shared-vs-specific split into their architecture.
We propose a new method that does not rely on inductive architectural biases.
We show that the proposed method achieves consistently high manipulation accuracy across two synthetic and one natural dataset.
- Score: 61.44666983942965
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised image-to-image translation methods aim to map images from one
domain into plausible examples from another domain while preserving structures
shared across two domains. In the many-to-many setting, an additional guidance
example from the target domain is used to determine domain-specific attributes
of the generated image. In the absence of attribute annotations, methods have
to infer which factors are specific to each domain from data during training.
Many state-of-the-art methods hard-code the desired shared-vs-specific split into
their architecture, severely restricting the scope of the problem. In this
paper, we propose a new method that does not rely on such inductive
architectural biases, and infers which attributes are domain-specific from data
by constraining information flow through the network using translation honesty
losses and a penalty on the capacity of the domain-specific embedding. We show that
the proposed method achieves consistently high manipulation accuracy across two
synthetic and one natural dataset spanning a wide variety of domain-specific
and shared attributes.
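The abstract's exact losses are not spelled out here, but a common way to penalize the capacity of a learned embedding is a KL term toward a standard normal prior, as in the variational information bottleneck. The sketch below is illustrative only: the function name, the hinge at a target capacity, and the Gaussian-posterior parameterization are assumptions, not the paper's actual formulation.

```python
import numpy as np

def embedding_capacity_penalty(mu, log_var, target_capacity=0.0):
    """Hinged KL(q(z|x) || N(0, I)) penalty on a diagonal-Gaussian embedding.

    mu, log_var: arrays of shape (batch, dim) parameterizing the
    domain-specific embedding posterior. The KL term measures how many
    nats of information the embedding carries; capacity above
    `target_capacity` is penalized.
    """
    # Closed-form KL between N(mu, diag(exp(log_var))) and N(0, I), per dim.
    kl_per_dim = 0.5 * (mu ** 2 + np.exp(log_var) - log_var - 1.0)
    # Sum over embedding dims, average over the batch.
    kl = kl_per_dim.sum(axis=-1).mean()
    # Only penalize capacity in excess of the target.
    return max(kl - target_capacity, 0.0)
```

At `mu = 0`, `log_var = 0` the posterior matches the prior and the penalty is zero; pushing `mu` away from zero (i.e., encoding more information in the embedding) raises it, which is the intuition behind restricting information flow through the domain-specific branch.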
Related papers
- WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization [63.98650220772378]
We present WIDIn, Wording Images for Domain-Invariant representation, to disentangle discriminative visual representation.
We first estimate the language embedding with fine-grained alignment, which can be used to adaptively identify and then remove the domain-specific counterpart.
We show that WIDIn can be applied to both pretrained vision-language models like CLIP, and separately trained uni-modal models like MoCo and BERT.
arXiv Detail & Related papers (2024-05-28T17:46:27Z) - Domain Generalization by Learning and Removing Domain-specific Features [15.061481139046952]
Domain generalization aims to learn a model that can generalize to unseen domains.
We propose a new approach that aims to explicitly remove domain-specific features for domain generalization.
We develop an encoder-decoder network to map each input image into a new image space where the learned domain-specific features are removed.
arXiv Detail & Related papers (2022-12-14T08:46:46Z) - Unsupervised Domain Adaptation for Semantic Segmentation using One-shot
Image-to-Image Translation via Latent Representation Mixing [9.118706387430883]
We propose a new unsupervised domain adaptation method for the semantic segmentation of very high resolution images.
An image-to-image translation paradigm is proposed, based on an encoder-decoder principle where latent content representations are mixed across domains.
Cross-city comparative experiments have shown that the proposed method outperforms state-of-the-art domain adaptation methods.
arXiv Detail & Related papers (2022-12-07T18:16:17Z) - AFAN: Augmented Feature Alignment Network for Cross-Domain Object
Detection [90.18752912204778]
Unsupervised domain adaptation for object detection is a challenging problem with many real-world applications.
We propose a novel augmented feature alignment network (AFAN) which integrates intermediate domain image generation and domain-adversarial training.
Our approach significantly outperforms the state-of-the-art methods on standard benchmarks for both similar and dissimilar domain adaptations.
arXiv Detail & Related papers (2021-06-10T05:01:20Z) - DRANet: Disentangling Representation and Adaptation Networks for
Unsupervised Cross-Domain Adaptation [23.588766224169493]
DRANet is a network architecture that disentangles image representations and transfers the visual attributes in a latent space for unsupervised cross-domain adaptation.
Our model encodes individual representations of content (scene structure) and style (artistic appearance) from both source and target images.
It adapts the domain by incorporating the transferred style factor into the content factor along with learnable weights specified for each domain.
arXiv Detail & Related papers (2021-03-24T18:54:23Z) - SMILE: Semantically-guided Multi-attribute Image and Layout Editing [154.69452301122175]
Attribute image manipulation has been a very active topic since the introduction of Generative Adversarial Networks (GANs).
We present a multimodal representation that handles all attributes, be it guided by random noise or images, while only using the underlying domain information of the target domain.
Our method is capable of adding, removing or changing either fine-grained or coarse attributes by using an image as a reference or by exploring the style distribution space.
arXiv Detail & Related papers (2020-10-05T20:15:21Z) - Crossing-Domain Generative Adversarial Networks for Unsupervised
Multi-Domain Image-to-Image Translation [12.692904507625036]
We propose a general framework for unsupervised image-to-image translation across multiple domains.
Our proposed framework consists of a pair of encoders along with a pair of GANs that learn high-level features across different domains, from which diverse and realistic samples are generated.
arXiv Detail & Related papers (2020-08-27T01:54:07Z) - Differential Treatment for Stuff and Things: A Simple Unsupervised
Domain Adaptation Method for Semantic Segmentation [105.96860932833759]
State-of-the-art approaches show that performing semantic-level alignment helps in tackling the domain-shift issue.
We propose to improve the semantic-level alignment with different strategies for stuff regions and for things.
We further show that our method helps ease this issue by minimizing the most similar stuff and instance features between the source and target domains.
arXiv Detail & Related papers (2020-03-18T04:43:25Z) - Latent Normalizing Flows for Many-to-Many Cross-Domain Mappings [76.85673049332428]
Learned joint representations of images and text form the backbone of several important cross-domain tasks such as image captioning.
We propose a novel semi-supervised framework, which models shared information between domains and domain-specific information separately.
We demonstrate the effectiveness of our model on diverse tasks, including image captioning and text-to-image synthesis.
arXiv Detail & Related papers (2020-02-16T19:49:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.