UniGS: Unified Representation for Image Generation and Segmentation
- URL: http://arxiv.org/abs/2312.01985v1
- Date: Mon, 4 Dec 2023 15:59:27 GMT
- Title: UniGS: Unified Representation for Image Generation and Segmentation
- Authors: Lu Qi, Lehan Yang, Weidong Guo, Yu Xu, Bo Du, Varun Jampani,
Ming-Hsuan Yang
- Abstract summary: We use a colormap to represent entity-level masks, addressing the challenge of varying entity numbers.
Two novel modules, a location-aware color palette and a progressive dichotomy module, are proposed to support our mask representation.
- Score: 105.08152635402858
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper introduces a novel unified representation of diffusion models for
image generation and segmentation. Specifically, we use a colormap to represent
entity-level masks, addressing the challenge of varying entity numbers while
aligning the representation closely with the image RGB domain. Two novel
modules, a location-aware color palette and a progressive dichotomy module, are
proposed to support this mask representation. On the one hand, the
location-aware palette ensures that colors remain consistent with entity
locations. On the other hand, the progressive dichotomy module efficiently
decodes the synthesized colormap into high-quality entity-level masks via a
depth-first binary search, without knowing the number of clusters in advance.
To tackle the lack of large-scale segmentation training data, we employ an
inpainting pipeline, which also improves the flexibility of diffusion models
across various tasks, including inpainting, image synthesis, referring
segmentation, and entity segmentation. Comprehensive experiments validate the
efficiency of our approach, demonstrating segmentation mask quality comparable
to the state of the art and adaptability to multiple tasks. The code will be
released at https://github.com/qqlu/Entity.
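The decoding idea above is compact enough to illustrate in code. Below is a
minimal, hypothetical Python sketch of the two modules as the abstract
describes them: a palette that ties an entity's color to its spatial location,
and a depth-first 2-means splitting routine that recovers color clusters from
a synthesized colormap without knowing the number of entities. The function
names, the grid-seeded coloring, and the variance threshold are illustrative
assumptions, not the authors' implementation.

```python
# Illustrative sketch only; not the UniGS authors' code.
import numpy as np

def location_aware_palette(centroids, grid=16):
    """Map each entity's mask centroid (y, x in [0, 1]) to a reproducible
    RGB color, so color is tied to location (an assumed stand-in for the
    paper's location-aware palette)."""
    colors = []
    for y, x in centroids:
        gy = min(int(y * grid), grid - 1)          # quantize location
        gx = min(int(x * grid), grid - 1)          # to a grid cell
        rng = np.random.default_rng(gy * grid + gx)  # cell index seeds color
        colors.append(rng.integers(0, 256, size=3))
    return np.stack(colors)

def progressive_dichotomy(pixels, var_thresh=25.0, depth=0, max_depth=8):
    """Decode an (N, 3) array of colormap pixels into color clusters by
    depth-first binary splitting: 2-means on RGB values, recursing until a
    cluster's color variance is small. No cluster count is needed up front."""
    if len(pixels) == 0:
        return []
    if depth >= max_depth or len(pixels) < 2 or pixels.var(axis=0).max() < var_thresh:
        return [pixels]
    rng = np.random.default_rng(depth)
    centers = pixels[rng.choice(len(pixels), 2, replace=False)].astype(float)
    for _ in range(10):  # a few Lloyd iterations of 2-means
        dist = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=-1)
        assign = dist.argmin(axis=1)
        for k in range(2):
            if (assign == k).any():
                centers[k] = pixels[assign == k].mean(axis=0)
    return (progressive_dichotomy(pixels[assign == 0], var_thresh, depth + 1, max_depth)
            + progressive_dichotomy(pixels[assign == 1], var_thresh, depth + 1, max_depth))
```

To recover actual masks one would carry pixel indices along with the colors
during the splits; for brevity this sketch returns only the clustered RGB
values.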
Related papers
- IIDM: Image-to-Image Diffusion Model for Semantic Image Synthesis [8.080248399002663]
In this paper, semantic image synthesis is treated as an image denoising task.
The style reference is first contaminated with random noise and then progressively denoised by IIDM.
Three techniques (refinement, color transfer, and model ensembles) are proposed to further boost the generation quality.
arXiv Detail & Related papers (2024-03-20T08:21:00Z)
- Generalizable Entity Grounding via Assistance of Large Language Model [77.07759442298666]
We propose a novel approach to densely ground visual entities from a long caption.
We leverage a large multimodal model to extract semantic nouns, a class-agnostic segmentation model to generate entity-level segmentation, and a multimodal feature fusion module to associate each semantic noun with its corresponding segmentation mask.
arXiv Detail & Related papers (2024-02-04T16:06:05Z)
- High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation [17.804090651425955]
Image-level weakly-supervised segmentation (WSSS) reduces the usually vast data annotation cost by using surrogate segmentation masks during training.
Our work builds on two techniques for improving CAMs: importance sampling, which is a substitute for GAP, and the feature similarity loss.
We reformulate both techniques based on binomial posteriors of multiple independent binary problems.
This has two benefits: their performance is improved, and they become more general, resulting in an add-on method that can boost virtually any WSSS method.
arXiv Detail & Related papers (2023-04-05T17:43:57Z)
- High-Quality Entity Segmentation [110.55724145851725]
CropFormer is designed to tackle the intractability of instance-level segmentation on high-resolution images.
It improves mask prediction by fusing the full image with high-resolution image crops that provide finer-grained image details.
With CropFormer, we achieve a significant AP gain of 1.9 on the challenging entity segmentation task.
arXiv Detail & Related papers (2022-11-10T18:58:22Z)
- Few-shot semantic segmentation via mask aggregation [5.886986014593717]
Few-shot semantic segmentation aims to recognize novel classes with only very few labelled data.
Previous works have typically regarded it as a pixel-wise classification problem.
We instead introduce a mask-based classification method to address this problem.
arXiv Detail & Related papers (2022-02-15T07:13:09Z)
- Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation.
We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation.
It outperforms the state of the art on the challenging ADE20K dataset and performs on-par on Pascal Context and Cityscapes.
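A minimal sketch of this encoder plus mask-decoder design appears after this list.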
arXiv Detail & Related papers (2021-05-12T13:01:44Z)
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
- Free-Form Image Inpainting via Contrastive Attention Network [64.05544199212831]
In image inpainting tasks, masks of any shape can appear anywhere in an image, forming complex patterns.
It is difficult for encoders to learn powerful representations under such complex conditions.
We propose a self-supervised Siamese inference network to improve robustness and generalization.
arXiv Detail & Related papers (2020-10-29T14:46:05Z)
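As referenced in the Segmenter entry above, here is a minimal, hypothetical
PyTorch sketch of that design: a ViT-style encoder over image patches, plus a
mask transformer in which K learnable class embeddings are compared against
the patch embeddings to produce per-class masks. The class name TinySegmenter,
all layer sizes, and the use of nn.TransformerEncoder modules are illustrative
assumptions; the actual Segmenter builds on a pretrained ViT backbone.

```python
# Illustrative sketch of the Segmenter idea; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySegmenter(nn.Module):
    def __init__(self, n_classes, img=224, patch=16, dim=192, depth=4, heads=3):
        super().__init__()
        self.patch, self.n_tok = patch, (img // patch) ** 2
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # patchify
        self.pos = nn.Parameter(torch.zeros(1, self.n_tok, dim))         # learned positions
        enc = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        dec = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, depth)                 # ViT-style encoder
        self.decoder = nn.TransformerEncoder(dec, 2)                     # mask transformer
        self.cls_emb = nn.Parameter(torch.randn(1, n_classes, dim))      # K class embeddings

    def forward(self, x):                                   # x: (B, 3, H, W)
        b = x.size(0)
        tok = self.embed(x).flatten(2).transpose(1, 2) + self.pos   # (B, N, dim)
        tok = self.encoder(tok)
        # jointly process patch tokens and class embeddings
        t = self.decoder(torch.cat([tok, self.cls_emb.expand(b, -1, -1)], dim=1))
        patches, cls = t[:, :self.n_tok], t[:, self.n_tok:]
        # masks = cosine similarity between patch and class embeddings
        masks = F.normalize(patches, dim=-1) @ F.normalize(cls, dim=-1).transpose(1, 2)
        side = int(self.n_tok ** 0.5)
        masks = masks.transpose(1, 2).reshape(b, -1, side, side)    # (B, K, H/p, W/p)
        return F.interpolate(masks, scale_factor=self.patch, mode="bilinear")
```

For example, TinySegmenter(n_classes=150)(torch.randn(1, 3, 224, 224)) returns
a (1, 150, 224, 224) map of per-pixel class scores.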