Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision
- URL: http://arxiv.org/abs/2303.05503v2
- Date: Tue, 14 May 2024 03:32:12 GMT
- Title: Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision
- Authors: Tarun Kalluri, Weiyao Wang, Heng Wang, Manmohan Chandraker, Lorenzo Torresani, Du Tran
- Abstract summary: We propose a novel approach for open-world instance segmentation called bottom-Up and top-Down Open-world Segmentation (UDOS).
UDOS first predicts parts of objects using a top-down network trained with weak supervision from bottom-up segmentations.
UDOS enjoys both the speed and efficiency of top-down architectures and the ability to generalize to unseen categories from bottom-up supervision.
- Score: 83.57156368908836
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many top-down architectures for instance segmentation achieve significant success when trained and tested on a pre-defined closed-world taxonomy. However, when deployed in the open world, they exhibit a notable bias towards seen classes and suffer a significant performance drop. In this work, we propose a novel approach for open-world instance segmentation called bottom-Up and top-Down Open-world Segmentation (UDOS) that combines classical bottom-up segmentation algorithms within a top-down learning framework. UDOS first predicts parts of objects using a top-down network trained with weak supervision from bottom-up segmentations. The bottom-up segmentations are class-agnostic and do not overfit to specific taxonomies. The part masks are then fed into affinity-based grouping and refinement modules to predict robust instance-level segmentations. UDOS enjoys both the speed and efficiency of top-down architectures and the generalization ability to unseen categories from bottom-up supervision. We validate the strengths of UDOS on multiple cross-category as well as cross-dataset transfer tasks across 5 challenging datasets, including MS-COCO, LVIS, ADE20k, UVO and OpenImages, achieving significant improvements over the state-of-the-art across the board. Our code and models are available on our project page.
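The affinity-based grouping step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the overlap-ratio affinity, the threshold value, and the union-find merge are all simplifying assumptions made here for clarity, and part masks are represented as sets of pixel coordinates.

```python
# Sketch of affinity-based grouping: part masks predicted by a top-down
# network are merged into instance masks when their pairwise affinity
# (here a simple intersection-over-union, an assumption) exceeds a threshold.

def affinity(mask_a, mask_b):
    """Overlap-based affinity between two part masks (sets of pixel coords)."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0

def group_parts(part_masks, threshold=0.1):
    """Merge part masks with pairwise affinity above threshold (union-find)."""
    parent = list(range(len(part_masks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(part_masks)):
        for j in range(i + 1, len(part_masks)):
            if affinity(part_masks[i], part_masks[j]) > threshold:
                parent[find(i)] = find(j)

    # Union the pixels of every part that ended up in the same group.
    groups = {}
    for i, mask in enumerate(part_masks):
        groups.setdefault(find(i), set()).update(mask)
    return list(groups.values())
```

For example, two overlapping part masks are merged into one instance while a distant part stays separate. A real system would follow this with the refinement module the abstract mentions.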
Related papers
- Instance Segmentation under Occlusions via Location-aware Copy-Paste Data Augmentation [8.335108002480068]
MMSports 2023 DeepSportRadar has introduced a dataset that focuses on segmenting human subjects within a basketball context.
This challenge demands the application of robust data augmentation techniques and wisely-chosen deep learning architectures.
Our work (ranked 1st in the competition) first proposes a novel data augmentation technique, capable of generating more training samples with wider distribution.
arXiv Detail & Related papers (2023-10-27T07:44:25Z) - Towards Universal Vision-language Omni-supervised Segmentation [72.31277932442988]
We present Vision-Language Omni-Supervised segmentation (VLOSS), which treats open-world segmentation tasks as proposal classification.
We leverage omni-supervised data (i.e., panoptic segmentation data, object detection data, and image-text pairs) in training, thus enriching the open-world segmentation ability.
With fewer parameters, our VLOSS with Swin-Tiny surpasses MaskCLIP by 2% in mask AP on the LVIS v1 dataset.
arXiv Detail & Related papers (2023-03-12T02:57:53Z) - Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation [80.48979302400868]
We focus on open vocabulary instance segmentation to expand a segmentation model to classify and segment instance-level novel categories.
Previous approaches have relied on massive caption datasets and complex pipelines to establish one-to-one mappings between image regions and caption nouns.
We devise a joint Caption Grounding and Generation (CGG) framework, which incorporates a novel grounding loss that focuses only on matching objects to improve learning efficiency.
arXiv Detail & Related papers (2023-01-02T18:52:12Z) - Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity [59.1823948436411]
We propose a novel approach for mask proposals, Generic Grouping Networks (GGNs).
Our approach combines a local measure of pixel affinity with instance-level mask supervision, producing a training regimen designed to make the model as generic as the data diversity allows.
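The summary above pairs pixel-level affinity with instance-level mask supervision. One way such supervision can be derived, sketched here as a simplified assumption rather than the GGN paper's actual procedure, is to label a pixel pair positive exactly when both pixels fall inside the same ground-truth instance mask:

```python
# Sketch: deriving pairwise-affinity training labels from instance-level
# mask supervision. Labeling rule (an assumption for illustration): a pair
# is positive iff both pixels belong to the same ground-truth instance.

def pairwise_affinity_labels(instance_map, pairs):
    """instance_map: dict mapping pixel coord -> instance id
    (background pixels are absent). pairs: iterable of (pixel_a, pixel_b).
    Returns one binary label per pair."""
    labels = []
    for a, b in pairs:
        ia, ib = instance_map.get(a), instance_map.get(b)
        labels.append(1 if ia is not None and ia == ib else 0)
    return labels
```

Pairs straddling two instances, or touching background, receive label 0; a network trained on such labels can then group pixels into class-agnostic mask proposals.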
arXiv Detail & Related papers (2022-04-12T22:37:49Z) - Fully Self-Supervised Learning for Semantic Segmentation [46.6602159197283]
We present a fully self-supervised framework for semantic segmentation (FS4).
We propose a bootstrapped training scheme for semantic segmentation, which fully leverages global semantic knowledge for self-supervision.
We evaluate our method on the large-scale COCO-Stuff dataset, achieving a 7.19 mIoU improvement on both things and stuff objects.
arXiv Detail & Related papers (2022-02-24T09:38:22Z) - TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation [44.75300205362518]
Unsupervised semantic segmentation aims to obtain high-level semantic representation on low-level visual features without manual annotations.
We propose the first top-down unsupervised semantic segmentation framework for fine-grained segmentation in extremely complicated scenarios.
Our results show that our top-down unsupervised segmentation is robust to both object-centric and scene-centric datasets.
arXiv Detail & Related papers (2021-12-02T18:59:03Z) - Exemplar-Based Open-Set Panoptic Segmentation Network [79.99748041746592]
We extend panoptic segmentation to the open-world and introduce an open-set panoptic segmentation (OPS) task.
We investigate the practical challenges of the task and construct a benchmark on top of an existing dataset, COCO.
We propose a novel exemplar-based open-set panoptic segmentation network (EOPSN) inspired by exemplar theory.
arXiv Detail & Related papers (2021-05-18T07:59:21Z) - Class-wise Dynamic Graph Convolution for Semantic Segmentation [63.08061813253613]
We propose a class-wise dynamic graph convolution (CDGC) module to adaptively propagate information.
We also introduce the Class-wise Dynamic Graph Convolution Network (CDGCNet), which consists of two main parts: the CDGC module and a basic segmentation network.
We conduct extensive experiments on three popular semantic segmentation benchmarks including Cityscapes, PASCAL VOC 2012 and COCO Stuff.
arXiv Detail & Related papers (2020-07-19T15:26:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.