Layout-to-Image Translation with Double Pooling Generative Adversarial
Networks
- URL: http://arxiv.org/abs/2108.12900v1
- Date: Sun, 29 Aug 2021 19:55:14 GMT
- Title: Layout-to-Image Translation with Double Pooling Generative Adversarial
Networks
- Authors: Hao Tang, Nicu Sebe
- Abstract summary: We propose a novel Double Pooling GAN (DPGAN) for generating photo-realistic and semantically-consistent results from the input layout.
We also propose a novel Double Pooling Module (DPM), which consists of the Square-shape Pooling Module (SPM) and the Rectangle-shape Pooling Module (RPM).
- Score: 76.83075646527521
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we address the task of layout-to-image translation, which aims
to translate an input semantic layout to a realistic image. One open challenge
widely observed in existing methods is the lack of effective semantic
constraints during the image translation process, leading to models that fail
to preserve semantic information and ignore semantic dependencies within the
same object. To address this issue, we propose a novel Double Pooling GAN
(DPGAN) for generating photo-realistic and semantically-consistent results from
the input layout. We also propose a novel Double Pooling Module (DPM), which
consists of the Square-shape Pooling Module (SPM) and the Rectangle-shape
Pooling Module (RPM). Specifically, SPM aims to capture short-range semantic
dependencies of the input layout with different spatial scales, while RPM aims
to capture long-range semantic dependencies from both horizontal and vertical
directions. We then effectively fuse the outputs of SPM and RPM to further
enlarge the receptive field of our generator. Extensive experiments on five
popular datasets show that the proposed DPGAN achieves better results than
state-of-the-art methods. Finally, both SPM and RPM are general and can be
seamlessly integrated into any GAN-based architecture to strengthen the
feature representation. The code is available at
https://github.com/Ha0Tang/DPGAN.
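The abstract describes SPM as multi-scale square pooling and RPM as pooling along the horizontal and vertical directions, with the two outputs fused to enlarge the generator's receptive field. A minimal PyTorch sketch of that structure is given below; the pooling scales, kernel sizes, and additive fusion are assumptions for illustration, not the paper's exact design (see the official repository for the real implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SquarePooling(nn.Module):
    """Multi-scale square pooling: short-range context at several spatial scales."""
    def __init__(self, channels, scales=(1, 2, 4, 8)):  # scales are an assumption
        super().__init__()
        self.scales = scales
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, 1, bias=False) for _ in scales)

    def forward(self, x):
        h, w = x.shape[-2:]
        out = x
        for s, conv in zip(self.scales, self.convs):
            y = F.adaptive_avg_pool2d(x, s)           # pool to an s x s grid
            y = F.interpolate(conv(y), size=(h, w),   # back to input resolution
                              mode='bilinear', align_corners=False)
            out = out + y
        return out

class RectanglePooling(nn.Module):
    """Strip pooling along horizontal and vertical axes for long-range context."""
    def __init__(self, channels):
        super().__init__()
        self.conv_h = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1), bias=False)
        self.conv_v = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0), bias=False)

    def forward(self, x):
        h, w = x.shape[-2:]
        row = F.adaptive_avg_pool2d(x, (h, 1))   # H x 1 strip: vertical context
        col = F.adaptive_avg_pool2d(x, (1, w))   # 1 x W strip: horizontal context
        row = self.conv_v(row).expand(-1, -1, h, w)
        col = self.conv_h(col).expand(-1, -1, h, w)
        return x + row + col

class DoublePoolingModule(nn.Module):
    """Fuse SPM and RPM outputs; the additive fusion here is an assumption."""
    def __init__(self, channels):
        super().__init__()
        self.spm = SquarePooling(channels)
        self.rpm = RectanglePooling(channels)
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.fuse(self.spm(x) + self.rpm(x))
```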
Related papers
- Layer-Wise Feature Metric of Semantic-Pixel Matching for Few-Shot Learning [14.627378118194933]
In Few-Shot Learning, traditional metric-based approaches often rely on global metrics to compute similarity.
In natural scenes, the spatial arrangement of key instances is often inconsistent across images.
We propose a novel method, the Layer-Wise Feature Metric of Semantic-Pixel Matching, to make finer comparisons.
arXiv Detail & Related papers (2024-11-10T05:12:24Z)
- Learning Enriched Features via Selective State Spaces Model for Efficient Image Deblurring [0.0]
Image deblurring aims to restore a high-quality image from its blurred counterpart.
We propose an efficient image deblurring network that leverages selective state spaces model to aggregate enriched and accurate features.
Experimental results demonstrate that the proposed method outperforms state-of-the-art approaches on widely used benchmarks.
arXiv Detail & Related papers (2024-03-29T10:40:41Z)
- Tolerating Annotation Displacement in Dense Object Counting via Point Annotation Probability Map [25.203803417049528]
Counting objects in crowded scenes remains a challenge to computer vision.
We present a point annotation probability map (PAPM) as the learning target.
We also propose an adaptively learned PAPM method (AL-PAPM).
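The summary only names the probability map; a common way to realise a displacement-tolerant target of this kind is to spread each point annotation with a Gaussian kernel, so nearby locations also receive probability mass. A hypothetical sketch of that idea follows; the paper's actual PAPM construction and its adaptive AL-PAPM variant may differ:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def point_probability_map(points, height, width, sigma=4.0):
    """Turn point annotations into a smooth probability map.

    Illustrative only: Gaussian smoothing of point annotations, so a
    slightly displaced annotation still puts high probability near the
    true object location. sigma is an assumed hyperparameter.
    """
    hits = np.zeros((height, width), dtype=np.float32)
    for x, y in points:                        # (x, y) pixel coordinates
        hits[int(round(y)), int(round(x))] = 1.0
    prob = gaussian_filter(hits, sigma=sigma)  # spread mass around each point
    return prob / (prob.max() + 1e-8)          # normalise to [0, 1]
```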
arXiv Detail & Related papers (2023-07-29T04:46:21Z)
- Diffusion Autoencoders: Toward a Meaningful and Decodable Representation [1.471992435706872]
Diffusion probabilistic models (DPMs) have achieved remarkable quality in image generation that rivals GANs'.
Unlike GANs, DPMs use a set of latent variables that lack semantic meaning and cannot serve as a useful representation for other tasks.
This paper explores the possibility of using DPMs for representation learning and seeks to extract a meaningful and decodable representation of an input image via autoencoding.
arXiv Detail & Related papers (2021-11-30T18:24:04Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result.
The recent transformer-based image recognition model ViT also shows a consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- DSP: Dual Soft-Paste for Unsupervised Domain Adaptive Semantic Segmentation [97.74059510314554]
Unsupervised domain adaptation (UDA) for semantic segmentation aims to adapt a segmentation model trained on the labeled source domain to the unlabeled target domain.
Existing methods try to learn domain-invariant features but suffer from large domain gaps.
We propose a novel Dual Soft-Paste (DSP) method in this paper.
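The summary does not spell out the mechanism, but the name suggests blending ("soft") pasted regions rather than hard copy-paste, applied to both source and target images ("dual"). A hypothetical NumPy illustration of a single soft paste, with the blend weight alpha and the interface as assumptions:

```python
import numpy as np

def soft_paste(dst_img, src_img, src_mask, alpha=0.8):
    """Hypothetical sketch of a 'soft paste': blend masked source-domain
    pixels into a destination image with weight alpha instead of a hard
    copy. DSP applies such pastes to both the source and target images
    (hence 'dual'); its class selection and training scheme are omitted.

    dst_img, src_img: H x W x 3 float arrays; src_mask: H x W boolean.
    """
    m = src_mask.astype(np.float32)[..., None]            # H x W x 1
    return (alpha * m) * src_img + (1.0 - alpha * m) * dst_img
```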
arXiv Detail & Related papers (2021-07-20T16:22:40Z)
- BoundarySqueeze: Image Segmentation as Boundary Squeezing [104.43159799559464]
We propose a novel method for fine-grained high-quality image segmentation of both objects and scenes.
Inspired by dilation and erosion from morphological image processing, we treat pixel-level segmentation as squeezing the object boundary.
Our method yields large gains on COCO and Cityscapes for both instance and semantic segmentation, and outperforms the previous state-of-the-art PointRend in both accuracy and speed under the same setting.
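The dilation/erosion intuition can be made concrete: dilating a coarse mask outward and eroding it inward brackets the true boundary inside a band, which the method then "squeezes" from both sides. An illustrative SciPy sketch of extracting that band (the learned squeezing itself is not shown, and the band width is an assumption):

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def boundary_band(mask, width=3):
    """Uncertain boundary band of a coarse binary mask, computed as
    (dilated mask) minus (eroded mask). Illustrative only; the actual
    BoundarySqueeze refinement is a learned operation over this region.
    """
    outer = binary_dilation(mask, iterations=width)  # push boundary outward
    inner = binary_erosion(mask, iterations=width)   # pull boundary inward
    return outer & ~inner                            # pixels needing refinement
```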
arXiv Detail & Related papers (2021-05-25T04:58:51Z)
- Dual Attention GANs for Semantic Image Synthesis [101.36015877815537]
We propose a novel Dual Attention GAN (DAGAN) to synthesize photo-realistic and semantically-consistent images.
We also propose two novel modules, i.e., the position-wise Spatial Attention Module (SAM) and the scale-wise Channel Attention Module (CAM).
DAGAN achieves remarkably better results than state-of-the-art methods, while using fewer model parameters.
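Position-wise spatial attention and channel attention are standard building blocks; a generic PyTorch sketch of the two modules is given below, where the projection dimensions and residual connections are assumptions and DAGAN's exact SAM and CAM designs may differ:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Position-wise attention: every location attends to all others."""
    def __init__(self, c):
        super().__init__()
        self.q = nn.Conv2d(c, c // 8, 1)
        self.k = nn.Conv2d(c, c // 8, 1)
        self.v = nn.Conv2d(c, c, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)             # B x HW x C'
        k = self.k(x).flatten(2)                             # B x C' x HW
        attn = torch.softmax(q @ k, dim=-1)                  # B x HW x HW
        v = self.v(x).flatten(2).transpose(1, 2)             # B x HW x C
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out                                       # residual is assumed

class ChannelAttention(nn.Module):
    """Channel attention via channel-to-channel affinities."""
    def forward(self, x):
        b, c, h, w = x.shape
        f = x.flatten(2)                                      # B x C x HW
        attn = torch.softmax(f @ f.transpose(1, 2), dim=-1)   # B x C x C
        return x + (attn @ f).reshape(b, c, h, w)
```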
arXiv Detail & Related papers (2020-08-29T17:49:01Z)
- Prototype Mixture Models for Few-shot Semantic Segmentation [50.866870384596446]
Few-shot segmentation is challenging because objects within the support and query images could significantly differ in appearance and pose.
We propose prototype mixture models (PMMs), which correlate diverse image regions with multiple prototypes to enforce the prototype-based semantic representation.
PMMs improve 5-shot segmentation performance on MS-COCO by up to 5.82% with only a moderate cost for model size and inference speed.
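Once multiple prototypes have been estimated from the support set, matching reduces to comparing each query location against every prototype. A minimal PyTorch sketch of that matching step; how the K prototypes are estimated (the paper fits a mixture model over support features) is omitted, and cosine similarity with a max over prototypes is an assumption:

```python
import torch
import torch.nn.functional as F

def match_prototypes(query_feats, prototypes):
    """query_feats: C x H x W feature map; prototypes: K x C.

    Cosine similarity of every query location to the K prototypes; taking
    the max over prototypes gives a foreground activation map, so diverse
    object parts can each match their own prototype.
    """
    c, h, w = query_feats.shape
    q = F.normalize(query_feats.flatten(1), dim=0)  # C x HW, unit columns
    p = F.normalize(prototypes, dim=1)              # K x C, unit rows
    sim = p @ q                                     # K x HW similarities
    return sim.max(dim=0).values.reshape(h, w)      # best-matching prototype
```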
arXiv Detail & Related papers (2020-08-10T04:33:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.