Panoptic-based Object Style-Align for Image-to-Image Translation
- URL: http://arxiv.org/abs/2112.01926v1
- Date: Fri, 3 Dec 2021 14:28:11 GMT
- Title: Panoptic-based Object Style-Align for Image-to-Image Translation
- Authors: Liyun Zhang, Photchara Ratsamee, Bowen Wang, Manabu Higashida, Yuki
Uranishi, Haruo Takemura
- Abstract summary: We propose panoptic-based object style-align generative adversarial networks (POSA-GANs) for image-to-image translation.
The proposed method was systematically compared with different competing methods and obtained significant improvements in both image quality and object recognition performance for translated images.
- Score: 2.226472061870956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite remarkable recent progress in image translation, complex scenes
with multiple discrepant objects remain a challenging problem: the translated
images have low fidelity, render tiny objects with few details, and yield
unsatisfactory object recognition performance. Without thorough
object perception (i.e., bounding boxes, categories, and masks) of the image as
prior knowledge, the style transformation of each object will be difficult to
track in the image translation process. We propose panoptic-based object
style-align generative adversarial networks (POSA-GANs) for image-to-image
translation together with a compact panoptic segmentation dataset. A panoptic
segmentation model is used to extract panoptic-level perception (i.e.,
overlap-removed foreground object instances and background semantic regions in
the image), which then guides the alignment between the object content
codes of the input domain image and object style codes sampled from the style
space of the target domain. The style-aligned object representations are
further transformed to obtain a precise boundary layout for higher-fidelity
object generation. The proposed method was systematically compared with
different competing methods and obtained significant improvements in both image
quality and object recognition performance for translated images.
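As an illustration of the object style-align mechanism the abstract describes, the following minimal sketch applies an AdaIN-style scale-and-shift, predicted from a per-object style code, inside each overlap-removed panoptic mask. All names and the AdaIN formulation are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

class ObjectStyleAlign(nn.Module):
    """Apply a per-object AdaIN scale-and-shift inside panoptic mask regions."""
    def __init__(self, feat_dim: int, style_dim: int):
        super().__init__()
        # Map a style code to per-channel AdaIN parameters (scale, shift).
        self.mlp = nn.Linear(style_dim, 2 * feat_dim)

    def forward(self, content: torch.Tensor, masks: torch.Tensor,
                styles: torch.Tensor) -> torch.Tensor:
        # content: (B, C, H, W) content codes of the input-domain image
        # masks:   (B, N, H, W) overlap-removed panoptic masks (float 0/1)
        # styles:  (B, N, S) per-object style codes sampled from the
        #          target domain's style space
        B, C, _, _ = content.shape
        out = content.clone()
        for n in range(masks.shape[1]):
            m = masks[:, n:n + 1]                                 # (B, 1, H, W)
            area = m.sum(dim=(2, 3), keepdim=True).clamp(min=1.0)
            # Masked per-object mean/std of the content features.
            mu = (content * m).sum(dim=(2, 3), keepdim=True) / area
            var = ((content - mu) ** 2 * m).sum(dim=(2, 3), keepdim=True) / area
            std = (var + 1e-5).sqrt()
            # Style code -> AdaIN scale/shift for this object.
            gamma, beta = self.mlp(styles[:, n]).chunk(2, dim=-1)
            aligned = (gamma.view(B, C, 1, 1) * (content - mu) / std
                       + beta.view(B, C, 1, 1))
            # Write aligned features back only inside the object region.
            out = out * (1.0 - m) + aligned * m
        return out
```

In this reading, the panoptic masks decide where each sampled target-domain style code is written back, which is what makes the style transformation of each object trackable during translation.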
Related papers
- Improving Object Detection via Local-global Contrastive Learning [27.660633883387753]
We present a novel image-to-image translation method that specifically targets cross-domain object detection.
We learn to represent objects by contrasting local-global information.
This affords investigation of an under-explored challenge: obtaining performant detection under domain shifts.
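A minimal sketch of a local-global contrastive objective in the spirit of this entry, using a standard InfoNCE loss between object-crop (local) and whole-image (global) embeddings; the pairing scheme and names are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def local_global_infonce(local_emb: torch.Tensor,
                         global_emb: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
    # local_emb, global_emb: (B, D); row i of each is a positive pair.
    local_emb = F.normalize(local_emb, dim=1)
    global_emb = F.normalize(global_emb, dim=1)
    logits = local_emb @ global_emb.t() / temperature  # (B, B) cosine logits
    targets = torch.arange(local_emb.size(0), device=local_emb.device)
    # Diagonal entries are positives; off-diagonal pairs act as negatives.
    return F.cross_entropy(logits, targets)
```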
arXiv Detail & Related papers (2024-10-07T14:18:32Z)
- Enriching Phrases with Coupled Pixel and Object Contexts for Panoptic Narrative Grounding [43.657151728626125]
Panoptic narrative grounding aims to segment things and stuff objects in an image described by noun phrases of a narrative caption.
We propose a Phrase-Pixel-Object Transformer Decoder (PPO-TD) to enrich phrases with coupled pixel and object contexts.
Our method achieves new state-of-the-art performance with large margins.
arXiv Detail & Related papers (2023-11-02T08:55:28Z)
- Open Compound Domain Adaptation with Object Style Compensation for Semantic Segmentation [23.925791263194622]
This paper proposes the Object Style Compensation, where we construct the Object-Level Discrepancy Memory.
We learn the discrepancy features from the images of source and target domains, storing the discrepancy features in memory.
Our method enables a more accurate computation of the pseudo annotations for target domain's images, thus yielding state-of-the-art results on different datasets.
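A hedged sketch of what an object-level discrepancy memory could look like: one slot per class, updated with an exponential moving average of target-minus-source feature discrepancies and used to compensate target features before pseudo-annotation. The structure and update rule are illustrative assumptions, not the paper's implementation.

```python
import torch

class DiscrepancyMemory:
    """Class-indexed memory of source/target feature discrepancies."""
    def __init__(self, num_classes: int, feat_dim: int, momentum: float = 0.9):
        self.memory = torch.zeros(num_classes, feat_dim)  # one slot per class
        self.momentum = momentum

    def update(self, cls: int, source_feat: torch.Tensor,
               target_feat: torch.Tensor) -> None:
        # Keep a running average of the target-minus-source discrepancy.
        discrepancy = target_feat - source_feat
        self.memory[cls] = (self.momentum * self.memory[cls]
                            + (1.0 - self.momentum) * discrepancy)

    def compensate(self, cls: int, target_feat: torch.Tensor) -> torch.Tensor:
        # Shift a target feature back toward the source style before
        # computing pseudo annotations for it.
        return target_feat - self.memory[cls]
```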
arXiv Detail & Related papers (2023-09-28T03:15:47Z)
- SIEDOB: Semantic Image Editing by Disentangling Object and Background [5.149242555705579]
We propose a novel paradigm for semantic image editing.
SIEDOB, the core idea of which is to explicitly leverage several heterogeneous subnetworks for objects and backgrounds.
We conduct extensive experiments on Cityscapes and ADE20K-Room datasets and exhibit that our method remarkably outperforms the baselines.
arXiv Detail & Related papers (2023-03-23T06:17:23Z)
- Localizing Object-level Shape Variations with Text-to-Image Diffusion Models [60.422435066544814]
We present a technique to generate a collection of images that depicts variations in the shape of a specific object.
A particular challenge when generating object variations is accurately localizing the manipulation applied over the object's shape.
To localize the image-space operation, we present two techniques that use the self-attention layers in conjunction with the cross-attention layers.
arXiv Detail & Related papers (2023-03-20T17:45:08Z)
- CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation [104.5033800500497]
Referring image segmentation aims at localizing all pixels of the visual objects described by a natural language sentence.
Previous works learn to straightforwardly align the sentence embedding and pixel-level embedding for highlighting the referred objects.
We propose CoupAlign, a simple yet effective multi-level visual-semantic alignment method.
arXiv Detail & Related papers (2022-12-04T08:53:42Z)
- DALL-E for Detection: Language-driven Context Image Synthesis for Object Detection [18.276823176045525]
We propose a new paradigm for automatic context image generation at scale.
At the core of our approach lies utilizing an interplay between language description of context and language-driven image generation.
We demonstrate the advantages of our approach over the prior context image generation approaches on four object detection datasets.
arXiv Detail & Related papers (2022-06-20T06:43:17Z)
- A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection [56.82077636126353]
We take advantage of object-centric images to improve object detection in scene-centric images.
We present a simple yet surprisingly effective framework to do so.
Our approach can improve the object detection (and instance segmentation) accuracy of rare objects by 50% (and 33%) relatively.
arXiv Detail & Related papers (2021-02-17T17:27:21Z)
- Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either aim to improve the object's inner consistency by modeling the global context, or refine object details along their boundaries by multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that appealing performance of semantic segmentation requires explicitly modeling the object body and edge, which correspond to the high and low frequency of the image.
We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
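As a hedged illustration of decoupled body and edge supervision, the sketch below derives an edge mask from the ground-truth labels and weights the per-pixel segmentation loss separately on body and edge regions. The paper's framework also decouples the features themselves, which this simplified loss-only sketch omits.

```python
import torch
import torch.nn.functional as F

def edge_mask_from_labels(labels: torch.Tensor) -> torch.Tensor:
    # labels: (B, H, W) integer class map; a pixel counts as "edge" if any
    # 4-neighbour carries a different class.
    l = labels.float().unsqueeze(1)                 # (B, 1, H, W)
    pad = F.pad(l, (1, 1, 1, 1), mode="replicate")
    diff = ((pad[:, :, 1:-1, :-2] != l) | (pad[:, :, 1:-1, 2:] != l) |
            (pad[:, :, :-2, 1:-1] != l) | (pad[:, :, 2:, 1:-1] != l))
    return diff.squeeze(1).float()                  # (B, H, W)

def decoupled_loss(logits: torch.Tensor, labels: torch.Tensor,
                   edge_weight: float = 2.0) -> torch.Tensor:
    # logits: (B, K, H, W); supervise body and edge pixels separately.
    per_pixel = F.cross_entropy(logits, labels, reduction="none")
    edge = edge_mask_from_labels(labels)
    body_loss = (per_pixel * (1.0 - edge)).mean()
    edge_loss = (per_pixel * edge).mean()
    return body_loss + edge_weight * edge_loss
```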
arXiv Detail & Related papers (2020-07-20T12:11:22Z)
- Cross-domain Correspondence Learning for Exemplar-based Image Translation [59.35767271091425]
We present a framework for exemplar-based image translation, which synthesizes a photo-realistic image from the input in a distinct domain.
The output has the style (e.g., color, texture) in consistency with the semantically corresponding objects in the exemplar.
We show that our method is significantly superior to state-of-the-art methods in terms of image quality.
arXiv Detail & Related papers (2020-04-12T09:10:57Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images.
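A minimal sketch of an object-centric FID in the spirit of SceneFID: the standard Fréchet distance is computed over features of object crops (taken from the layout boxes) rather than whole images. The cropping policy and feature extractor are assumptions here.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    # feats_*: (N, D) activations (e.g., Inception pool features) of object
    # crops from real and generated images, respectively.
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_a @ cov_b, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from sqrtm
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

def crop_objects(image: np.ndarray, boxes) -> list:
    # Crop each layout box (x0, y0, x1, y1) from an (H, W, 3) image; crops
    # are then resized and fed to the feature extractor.
    return [image[y0:y1, x0:x1] for (x0, y0, x1, y1) in boxes]
```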
arXiv Detail & Related papers (2020-03-16T21:40:09Z)