Multi-Domain Image-to-Image Translation with Adaptive Inference Graph
- URL: http://arxiv.org/abs/2101.03806v1
- Date: Mon, 11 Jan 2021 10:47:29 GMT
- Title: Multi-Domain Image-to-Image Translation with Adaptive Inference Graph
- Authors: The-Phuc Nguyen, Stéphane Lathuilière, Elisa Ricci
- Abstract summary: Current state-of-the-art models require a large and deep network to handle the visual diversity of multiple domains.
We propose to increase the network capacity by using an adaptive graph structure.
This approach yields an adjustable increase in the number of parameters while preserving an almost constant computational cost.
- Score: 29.673550911992365
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we address the problem of multi-domain image-to-image
translation with particular attention paid to computational cost. In
particular, current state-of-the-art models require a large and deep model in
order to handle the visual diversity of multiple domains. In a context of
limited computational resources, increasing the network size may not be
possible. Therefore, we propose to increase the network capacity by using an
adaptive graph structure. At inference time, the network estimates its own
graph by selecting specific sub-networks. Sub-network selection is implemented
using Gumbel-Softmax in order to allow end-to-end training. This approach leads
to an adjustable increase in number of parameters while preserving an almost
constant computational cost. Our evaluation on two publicly available datasets
of facial and painting images shows that our adaptive strategy generates better
images with fewer artifacts than methods from the literature.
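The sub-network selection described in the abstract can be sketched with a Gumbel-Softmax sampler. The following is a minimal NumPy sketch, not the authors' implementation; the logits, the temperature `tau`, and the four-way choice are illustrative assumptions.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Draw a relaxed (soft) one-hot sample over sub-network choices."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(1e-10, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))            # Gumbel(0, 1) noise
    y = (logits + g) / tau             # perturbed, temperature-scaled logits
    y = np.exp(y - y.max())            # numerically stable softmax
    return y / y.sum()

# Choosing among K parallel sub-networks at one point in the graph:
K = 4
logits = np.array([0.5, 2.0, -1.0, 0.1])   # learned selection scores
weights = gumbel_softmax(logits, tau=0.5, rng=np.random.default_rng(0))
# Training: blend sub-network outputs, out = sum_k weights[k] * subnet_k(x),
# so gradients flow through the soft weights end to end.
# Inference: take the argmax so only one sub-network runs, keeping
# compute roughly constant while total parameters grow with K.
chosen = int(np.argmax(weights))
```

As the temperature `tau` decreases, the soft weights approach a hard one-hot selection, which is what makes the discrete graph choice trainable end to end.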
Related papers
- DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles [19.096747443000194]
We propose DoraCycle, which integrates two multimodal cycles: text-to-image-to-text and image-to-text-to-image.
The model is optimized through cross-entropy loss computed at the cycle endpoints, where both endpoints share the same modality.
For tasks involving new paired knowledge, such as specific identities, a combination of a small set of paired image-text examples and larger-scale unpaired data is sufficient.
arXiv Detail & Related papers (2025-03-05T16:26:58Z)
- SCONE-GAN: Semantic Contrastive learning-based Generative Adversarial Network for an end-to-end image translation [18.93434486338439]
SCONE-GAN is shown to be effective for learning to generate realistic and diverse scenery images.
To generate more realistic and diverse images, we introduce a style reference image.
We validate the proposed algorithm for image-to-image translation and stylizing outdoor images.
arXiv Detail & Related papers (2023-11-07T10:29:16Z)
- T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called transformers, has shown strong performance in natural language processing.
In this paper, we design a novel attention mechanism whose cost is linearly related to the resolution, derived via a Taylor expansion, and based on this attention we build a network called $T$-former for image inpainting.
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
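Linearized attention of the kind T-former builds on can be sketched generically: a positive feature map lets the key-value summary be computed once, so cost scales linearly with the number of tokens (i.e., with resolution) instead of quadratically. The sketch below uses the common `elu(x) + 1` feature map as a stand-in, not the paper's exact Taylor-expansion kernel.

```python
import numpy as np

def linear_attention(Q, K, V):
    """Kernelized attention in O(N d^2) instead of O(N^2 d).

    phi(x) = elu(x) + 1 is a generic positive feature map; the
    summary Kp.T @ V is computed once and reused for every query.
    """
    def phi(x):
        return np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1 > 0
    Qp, Kp = phi(Q), phi(K)            # (N, d) feature-mapped queries/keys
    kv = Kp.T @ V                      # (d, d) key-value summary
    z = Qp @ Kp.sum(axis=0)            # (N,) per-query normalizer
    return (Qp @ kv) / z[:, None]      # (N, d) attention output

rng = np.random.default_rng(0)
N, d = 64, 8                           # N tokens, d channels (toy sizes)
Q, K, V = [rng.standard_normal((N, d)) for _ in range(3)]
out = linear_attention(Q, K, V)
```

Because the per-query weights are positive and normalized, each output row is a convex combination of the value rows, just as in softmax attention.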
arXiv Detail & Related papers (2023-05-12T04:10:42Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network, that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- Self-Supervised Learning of Domain Invariant Features for Depth Estimation [35.74969527929284]
We tackle the problem of unsupervised synthetic-to-realistic domain adaptation for single image depth estimation.
An essential building block of single image depth estimation is an encoder-decoder task network that takes RGB images as input and produces depth maps as output.
We propose a novel training strategy to force the task network to learn domain invariant representations in a self-supervised manner.
arXiv Detail & Related papers (2021-06-04T16:45:48Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
- Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of routing every input through the same path, DG-Net aggregates features dynamically at each node, giving the network greater representational capacity.
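The per-node dynamic aggregation can be illustrated with a toy gating function: incoming features are combined with data-dependent weights, so the effective connectivity varies per input instance. The linear scorer `w` below is a hypothetical stand-in for DG-Net's learned routers, which the summary does not specify.

```python
import numpy as np

def gated_aggregate(incoming, w):
    """Combine features from predecessor blocks with input-dependent gates.

    incoming: (k, d) features from k predecessor nodes in the DAG
    w:        (d,) parameters of a toy linear gating function
    """
    scores = incoming @ w                          # (k,) per-edge scores
    scores = scores - scores.max()                 # numerical stability
    gates = np.exp(scores) / np.exp(scores).sum()  # softmax over edges
    return gates @ incoming                        # (d,) gated aggregation

rng = np.random.default_rng(1)
incoming = rng.standard_normal((3, 16))   # features from 3 predecessors
w = rng.standard_normal(16)
out = gated_aggregate(incoming, w)
```

Because the gates depend on the incoming features themselves, two different inputs can effectively use two different graphs at no extra inference cost.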
arXiv Detail & Related papers (2020-10-02T16:50:26Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
- CRNet: Cross-Reference Networks for Few-Shot Segmentation [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images.
Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-03-24T04:55:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.