Memory-guided Unsupervised Image-to-image Translation
- URL: http://arxiv.org/abs/2104.05170v1
- Date: Mon, 12 Apr 2021 03:02:51 GMT
- Title: Memory-guided Unsupervised Image-to-image Translation
- Authors: Somi Jeong, Youngjung Kim, Eungbean Lee, Kwanghoon Sohn
- Abstract summary: We present an unsupervised framework for instance-level image-to-image translation.
We show that our model outperforms recent instance-level methods.
- Score: 54.1903150849536
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel unsupervised framework for instance-level image-to-image
translation. Although recent advances have been made by incorporating
additional object annotations, existing methods often fail to handle images
with multiple disparate objects. The main cause is that, during inference, they
apply a global style to the whole image and do not consider the large style
discrepancy between instances and the background, or within instances. To address
this problem, we propose a class-aware memory network that explicitly reasons
about local style variations. A key-value memory structure, with a set of
read/update operations, is introduced to record class-wise style variations and
access them without requiring an object detector at test time. The key
stores a domain-agnostic content representation for allocating memory items,
while the values encode domain-specific style representations. We also present
a feature contrastive loss to boost the discriminative power of memory items.
We show that by incorporating our memory, we can transfer class-aware and
accurate style representations across domains. Experimental results demonstrate
that our model outperforms recent instance-level methods and achieves
state-of-the-art performance.
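As a rough illustration of the idea, the sketch below shows how such a class-aware key-value memory could be wired up in PyTorch. All shapes, names (ClassAwareMemory, read, update), and the moving-average write rule are illustrative assumptions for this summary, not the authors' released implementation.

```python
# Minimal sketch (assumed shapes and names, not the authors' code) of a
# class-aware key-value memory: keys hold domain-agnostic content features
# used for addressing, values hold per-domain style features that are read
# out at test time without an object detector.
import torch
import torch.nn.functional as F


class ClassAwareMemory(torch.nn.Module):
    def __init__(self, num_items: int = 64, content_dim: int = 256,
                 style_dim: int = 64, momentum: float = 0.999):
        super().__init__()
        # Keys: domain-agnostic content prototypes used to allocate memory items.
        self.register_buffer(
            "keys", F.normalize(torch.randn(num_items, content_dim), dim=1))
        # Values: one style slot per domain (here two domains, e.g. source/target).
        self.register_buffer("values", torch.randn(2, num_items, style_dim))
        self.momentum = momentum

    def read(self, content: torch.Tensor, domain: int) -> torch.Tensor:
        """Address the memory with content features and return class-aware styles.

        content: (B, N, content_dim) per-location content features.
        returns: (B, N, style_dim) style features for the requested domain.
        """
        q = F.normalize(content, dim=-1)                  # (B, N, C)
        attn = torch.softmax(q @ self.keys.t(), dim=-1)   # (B, N, M) addressing weights
        return attn @ self.values[domain]                 # (B, N, S)

    @torch.no_grad()
    def update(self, content: torch.Tensor, style: torch.Tensor, domain: int):
        """Moving-average write: refresh the best-matching key and its value slot."""
        q = F.normalize(content.reshape(-1, content.shape[-1]), dim=-1)  # (BN, C)
        s = style.reshape(-1, style.shape[-1])                           # (BN, S)
        idx = (q @ self.keys.t()).argmax(dim=-1)                         # nearest item per location
        for i in idx.unique():
            mask = idx == i
            self.keys[i] = F.normalize(
                self.momentum * self.keys[i] + (1 - self.momentum) * q[mask].mean(0), dim=0)
            self.values[domain, i] = (
                self.momentum * self.values[domain, i] + (1 - self.momentum) * s[mask].mean(0))
```

Because the keys are shared across domains while each domain keeps its own value slots, the same content query retrieves the appropriate class-wise style in either domain, which is why no object detector is needed at test time in this sketch.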
Related papers
- Improving Object Detection via Local-global Contrastive Learning [27.660633883387753]
We present a novel image-to-image translation method that specifically targets cross-domain object detection.
We learn to represent objects by contrasting local-global information.
This affords investigation of an under-explored challenge: obtaining performant detection under domain shifts.
arXiv Detail & Related papers (2024-10-07T14:18:32Z)
- Towards Image Semantics and Syntax Sequence Learning [8.033697392628424]
We introduce the concept of "image grammar", consisting of "image semantics" and "image syntax".
We propose a weakly supervised two-stage approach to learn the image grammar relative to a class of visual objects/scenes.
Our framework is trained to reason over patch semantics and detect faulty syntax.
arXiv Detail & Related papers (2024-01-31T00:16:02Z)
- Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation [59.78520153338878]
Text-to-image diffusion techniques have shown exceptional capability of producing high-quality images from text descriptions.
We propose a method built upon a state-of-the-art diffusion model, empowered by an open vocabulary, to learn multi-scale textual-visual features for camouflaged object representations.
arXiv Detail & Related papers (2023-12-29T07:59:07Z)
- Open Compound Domain Adaptation with Object Style Compensation for Semantic Segmentation [23.925791263194622]
This paper proposes Object Style Compensation, in which we construct an Object-Level Discrepancy Memory.
We learn discrepancy features from images of the source and target domains and store them in memory.
Our method enables more accurate computation of pseudo annotations for the target domain's images, yielding state-of-the-art results on different datasets.
arXiv Detail & Related papers (2023-09-28T03:15:47Z)
- Improving Image Recognition by Retrieving from Web-Scale Image-Text Data [68.63453336523318]
We introduce an attention-based memory module, which learns the importance of each retrieved example from the memory.
Compared to existing approaches, our method removes the influence of irrelevant retrieved examples and retains those that are beneficial to the input query.
We show that it achieves state-of-the-art accuracies on the ImageNet-LT, Places-LT, and Webvision datasets.
arXiv Detail & Related papers (2023-04-11T12:12:05Z)
- Disentangled Unsupervised Image Translation via Restricted Information Flow [61.44666983942965]
Many state-of-the-art methods hard-code the desired shared-vs-specific split into their architecture.
We propose a new method that does not rely on inductive architectural biases.
We show that the proposed method achieves consistently high manipulation accuracy across two synthetic and one natural dataset.
arXiv Detail & Related papers (2021-11-26T00:27:54Z)
- Improving Few-shot Learning with Weakly-supervised Object Localization [24.3569501375842]
We propose a novel framework that generates class representations by extracting features from class-relevant regions of the images.
Our method outperforms the baseline few-shot model on the miniImageNet and tieredImageNet benchmarks.
arXiv Detail & Related papers (2021-05-25T07:39:32Z)
- Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization.
We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning.
Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)
- COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder [70.23358875904891]
Unsupervised image-to-image translation aims to learn a mapping of an image in a given domain to an analogous image in a different domain.
We propose a new few-shot image translation model, COCO-FUNIT, which computes the style embedding of the example images conditioned on the input image.
Our model proves effective at addressing the content loss problem.
arXiv Detail & Related papers (2020-07-15T02:01:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.