High-Quality Entity Segmentation
- URL: http://arxiv.org/abs/2211.05776v3
- Date: Sun, 2 Apr 2023 22:01:17 GMT
- Title: High-Quality Entity Segmentation
- Authors: Lu Qi, Jason Kuen, Weidong Guo, Tiancheng Shen, Jiuxiang Gu, Jiaya
Jia, Zhe Lin, Ming-Hsuan Yang
- Abstract summary: CropFormer is designed to tackle the intractability of instance-level segmentation on high-resolution images.
It improves mask prediction by fusing the full image with high-resolution crops that provide finer-grained image details.
With CropFormer, we achieve a significant AP gain of $1.9$ on the challenging entity segmentation task.
- Score: 110.55724145851725
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Dense image segmentation tasks (e.g., semantic, panoptic) are useful for image
editing, but existing methods can hardly generalize well in an in-the-wild
setting where there are unrestricted image domains, classes, and image
resolution and quality variations. Motivated by these observations, we
construct a new entity segmentation dataset, with a strong focus on
high-quality dense segmentation in the wild. The dataset contains images
spanning diverse image domains and entities, along with plentiful
high-resolution images and high-quality mask annotations for training and
testing. Given the high-quality and -resolution nature of the dataset, we
propose CropFormer, which is designed to tackle the intractability of
instance-level segmentation on high-resolution images. It improves mask
prediction by fusing the full image with high-resolution crops that provide
finer-grained image details. CropFormer is the first query-based Transformer
architecture that can effectively fuse mask predictions from multiple image
views, by learning queries that effectively associate the same entities across
the full image and its crop. With CropFormer, we achieve a significant AP gain
of $1.9$ on the challenging entity segmentation task. Furthermore, CropFormer
consistently improves the accuracy of traditional segmentation tasks and
datasets. The dataset and code will be released at
http://luqi.info/entityv2.github.io/.
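The fusion idea described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical NumPy illustration, not the paper's implementation: it assumes crop mask logits have already been resampled to full-image resolution, and it replaces CropFormer's learned end-to-end query association with a simple cosine-similarity matching; all function and variable names here are invented for illustration.

```python
import numpy as np

def fuse_crop_predictions(full_queries, crop_queries, full_masks, crop_masks):
    """Associate each full-image query with its most similar crop query,
    then average the matched mask logits.

    full_queries: (N_full, D) query embeddings from the full-image view
    crop_queries: (N_crop, D) query embeddings from the crop view
    full_masks:   (N_full, H, W) mask logits from the full-image view
    crop_masks:   (N_crop, H, W) mask logits, resampled to full resolution
    """
    # Cosine similarity between every (full, crop) query pair.
    fq = full_queries / np.linalg.norm(full_queries, axis=1, keepdims=True)
    cq = crop_queries / np.linalg.norm(crop_queries, axis=1, keepdims=True)
    sim = fq @ cq.T                 # (N_full, N_crop)

    # Greedy association: best-matching crop query per full-image query.
    match = sim.argmax(axis=1)      # (N_full,)

    # Fuse the two views by averaging matched mask logits.
    return 0.5 * (full_masks + crop_masks[match])
```

In CropFormer itself, the association is learned jointly with the segmentation objective rather than computed by post-hoc matching, which is what lets the model fuse views effectively.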
Related papers
- UniGS: Unified Representation for Image Generation and Segmentation [105.08152635402858]
We use a colormap to represent entity-level masks, addressing the challenge of varying entity numbers.
Two novel modules, including the location-aware color palette and progressive dichotomy module, are proposed to support our mask representation.
arXiv Detail & Related papers (2023-12-04T15:59:27Z)
- A Lightweight Clustering Framework for Unsupervised Semantic Segmentation [28.907274978550493]
Unsupervised semantic segmentation aims to categorize each pixel in an image into a corresponding class without the use of annotated data.
We propose a lightweight clustering framework for unsupervised semantic segmentation.
Our framework achieves state-of-the-art results on PASCAL VOC and MS COCO datasets.
arXiv Detail & Related papers (2023-11-30T15:33:42Z)
- Self-supervised Scene Text Segmentation with Object-centric Layered Representations Augmented by Text Regions [22.090074821554754]
We propose a self-supervised scene text segmentation algorithm that uses layered, object-centric decoupling of representations to segment images into text and background.
On several public scene text datasets, our method outperforms the state-of-the-art unsupervised segmentation algorithms.
arXiv Detail & Related papers (2023-08-25T05:00:05Z)
- ReFit: A Framework for Refinement of Weakly Supervised Semantic Segmentation using Object Border Fitting for Medical Images [4.945138408504987]
Weakly Supervised Semantic Segmentation (WSSS), relying only on image-level supervision, is a promising approach to reducing the need for dense pixel-level annotations.
We propose our novel ReFit framework, which deploys state-of-the-art class activation maps combined with various post-processing techniques.
By applying our method to WSSS predictions, we achieved up to 10% improvement over the current state-of-the-art WSSS methods for medical imaging.
arXiv Detail & Related papers (2023-03-14T12:46:52Z)
- Open-World Entity Segmentation [70.41548013910402]
We introduce a new image segmentation task, termed Entity Segmentation (ES), with the aim of segmenting all visual entities in an image without considering semantic category labels.
All semantically-meaningful segments are equally treated as categoryless entities and there is no thing-stuff distinction.
ES enables the following: (1) merging multiple datasets into a large training set without the need to resolve label conflicts; and (2) strong generalization of models trained on one dataset to other datasets with unseen domains.
arXiv Detail & Related papers (2021-07-29T17:59:05Z)
- Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation.
We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation.
It outperforms the state of the art on the challenging ADE20K dataset and performs on par on Pascal Context and Cityscapes.
arXiv Detail & Related papers (2021-05-12T13:01:44Z)
- DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort [117.41383937100751]
Current deep networks are extremely data-hungry, benefiting from training on large-scale datasets.
We show how the GAN latent code can be decoded to produce a semantic segmentation of the image.
These generated datasets can then be used for training any computer vision architecture just as real datasets are.
arXiv Detail & Related papers (2021-04-13T20:08:29Z)
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
- Meticulous Object Segmentation [37.48446050876045]
We propose and study a task named Meticulous Object Segmentation (MOS).
MeticulousNet leverages a dedicated decoder to capture the object boundary details.
We provide empirical evidence showing that MeticulousNet can reveal pixel-accurate segmentation boundaries.
arXiv Detail & Related papers (2020-12-13T23:38:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.