Semantically Multi-modal Image Synthesis
- URL: http://arxiv.org/abs/2003.12697v3
- Date: Thu, 2 Apr 2020 09:07:29 GMT
- Title: Semantically Multi-modal Image Synthesis
- Authors: Zhen Zhu, Zhiliang Xu, Ansheng You, Xiang Bai
- Abstract summary: We focus on semantically multi-modal image synthesis (SMIS) task, namely, generating multi-modal images at the semantic level.
We propose a novel Group Decreasing Network (GroupDNet) that leverages group convolutions in the generator and progressively decreases the group numbers of the convolutions in the decoder.
GroupDNet offers far more control when translating semantic labels to natural images and produces plausible, high-quality results for datasets with many classes.
- Score: 58.87967932525891
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we focus on semantically multi-modal image synthesis (SMIS)
task, namely, generating multi-modal images at the semantic level. Previous
work uses multiple class-specific generators, constraining its applicability to
datasets with a small number of classes. We instead propose a novel Group
Decreasing Network (GroupDNet) that leverages group convolutions in the
generator and progressively decreases the group numbers of the convolutions in
the decoder. Consequently, GroupDNet offers far more control over translating
semantic labels to natural images and produces plausible, high-quality results
for datasets with many classes. Experiments on several challenging
datasets demonstrate the superiority of GroupDNet on performing the SMIS task.
We also show that GroupDNet is capable of performing a wide range of
interesting synthesis applications. Code and models are available at:
https://github.com/Seanseattle/SMIS.
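As a rough illustration of the decreasing-group idea (the halving rule, function name, and layer count below are assumptions for illustration, not the paper's exact configuration), the decoder's per-layer group counts could be sketched in plain Python:

```python
def decoder_group_schedule(num_classes: int, num_layers: int) -> list[int]:
    """Hypothetical schedule: the group count starts at the number of
    semantic classes and roughly halves per decoder layer, ending at 1
    (an ordinary convolution that mixes all class-specific features)."""
    groups = []
    g = num_classes
    for _ in range(num_layers):
        groups.append(max(g, 1))  # a convolution needs at least one group
        g //= 2
    groups[-1] = 1  # final layer always merges all classes
    return groups

# e.g. 8 semantic classes across a 5-layer decoder
print(decoder_group_schedule(8, 5))  # [8, 4, 2, 1, 1]
```

Early layers keep each semantic class in its own convolution group (so each class has an independent latent code), while later layers reduce the group count to fuse classes into a coherent image.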
Related papers
- GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition [37.02054260449195]
Multi-Label Image Recognition (MLIR) is a challenging task that aims to predict multiple object labels in a single image.
We present the first fully graph convolutional model, the Group K-nearest neighbor based Graph convolutional Network (GKGNet).
Our experiments demonstrate that GKGNet achieves state-of-the-art performance with significantly lower computational costs.
arXiv Detail & Related papers (2023-08-28T07:50:04Z) - An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z) - Extracting Semantic Knowledge from GANs with Unsupervised Learning [65.32631025780631]
Generative Adversarial Networks (GANs) encode semantics in feature maps in a linearly separable form.
We propose a novel clustering algorithm, named KLiSH, which leverages the linear separability to cluster GAN's features.
KLiSH succeeds in extracting fine-grained semantics of GANs trained on datasets of various objects.
arXiv Detail & Related papers (2022-11-30T03:18:16Z) - One-Shot Synthesis of Images and Segmentation Masks [28.119303696418882]
Joint synthesis of images and segmentation masks with generative adversarial networks (GANs) is promising to reduce the effort needed for collecting image data with pixel-wise annotations.
To learn high-fidelity image-mask synthesis, existing GAN approaches first need a pre-training phase requiring large amounts of image data.
We introduce our OSMIS model which enables the synthesis of segmentation masks that are precisely aligned to the generated images in the one-shot regime.
arXiv Detail & Related papers (2022-09-15T18:00:55Z) - BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations [89.42397034542189]
We synthesize a large labeled dataset via a generative adversarial network (GAN).
We take image samples from the class-conditional generative model BigGAN trained on ImageNet, and manually annotate 5 images per class, for all 1k classes.
We create a new ImageNet benchmark by labeling an additional set of 8k real images and evaluate segmentation performance in a variety of settings.
arXiv Detail & Related papers (2022-01-12T20:28:34Z) - Learning Multi-Attention Context Graph for Group-Based Re-Identification [214.84551361855443]
Learning to re-identify or retrieve a group of people across non-overlapping camera systems has important applications in video surveillance.
In this work, we consider employing context information for identifying groups of people, i.e., group re-id.
We propose a novel unified framework based on graph neural networks to simultaneously address the group-based re-id tasks.
arXiv Detail & Related papers (2021-04-29T09:57:47Z) - DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datasets [102.55303521877933]
We propose a dynamic on-demand network (DoDNet) that learns to segment multiple organs and tumors on partially labelled datasets.
DoDNet consists of a shared encoder-decoder architecture, a task encoding module, a controller for generating dynamic convolution filters, and a single but dynamic segmentation head.
arXiv Detail & Related papers (2020-11-20T04:56:39Z)
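To make the dynamic-head idea concrete, here is a heavily simplified NumPy sketch (the function name, shapes, and the restriction to a single 1x1 filter are illustrative assumptions; the actual DoDNet controller generates full dynamic convolution filters from the task encoding):

```python
import numpy as np

def dynamic_seg_head(features, task_code, controller_w):
    """Simplified dynamic segmentation head: a linear controller maps a
    one-hot task code to the weights of a 1x1 convolution, which is then
    applied to the shared encoder-decoder features.

    features:     (C, H, W) shared feature map
    task_code:    (T,) one-hot task encoding
    controller_w: (C, T) controller weights
    """
    kernel = controller_w @ task_code             # (C,) task-specific 1x1 filter
    return np.einsum("c,chw->hw", kernel, features)  # (H, W) logits

# Toy usage: 3 channels, 2 tasks; task 0 selects the filter [1, 2, 3]
feats = np.ones((3, 2, 2))
ctrl = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
out = dynamic_seg_head(feats, np.array([1.0, 0.0]), ctrl)
```

The point is that the same shared backbone serves every task; only the small generated kernel changes per task, which is what lets DoDNet train one network on multiple partially labeled datasets.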
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.