Exploring Transformers for Open-world Instance Segmentation
- URL: http://arxiv.org/abs/2308.04206v1
- Date: Tue, 8 Aug 2023 12:12:30 GMT
- Title: Exploring Transformers for Open-world Instance Segmentation
- Authors: Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo
- Abstract summary: We utilize the Transformer for open-world instance segmentation and present SWORD.
We propose a novel contrastive learning framework to enlarge the representation gap between objects and background.
Our models achieve state-of-the-art performance in various open-world cross-category and cross-dataset generalizations.
- Score: 87.21723085867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-world instance segmentation is an emerging task that aims to
segment all objects in an image by learning from a limited number of
base-category objects. The task is challenging, as the number of unseen
categories can be hundreds of times larger than the number of seen categories.
Recently, DETR-like models have been studied extensively in the closed world
but remain unexplored in the open world. In this paper, we utilize the
Transformer for open-world instance segmentation and present SWORD. First, we
attach a stop-gradient operation before the classification head and further
add IoU heads for discovering novel objects. We demonstrate that this simple
stop-gradient operation not only prevents novel objects from being suppressed
as background, but also allows the network to enjoy the merits of heuristic
label assignment. Second, we propose a novel contrastive learning framework to
enlarge the representation gap between objects and background. Specifically,
we maintain a universal object queue to obtain the object center, and
dynamically select positive and negative samples from the object queries for
contrastive learning. While previous works focus only on average recall and
neglect average precision, we show the strength of SWORD by giving
consideration to both criteria. Our models achieve state-of-the-art
performance in various open-world cross-category and cross-dataset
generalizations. In particular, in the VOC to non-VOC setup, our method sets
new state-of-the-art results of 40.0% on ARb100 and 34.9% on ARm100. For COCO
to UVO generalization, SWORD significantly outperforms the previous best
open-world model by 5.9% on APm and 8.1% on ARm100.
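The stop-gradient idea from the abstract is easy to picture in code. Below is a minimal PyTorch-style sketch; the module name `SwordHead`, the feature dimensions, and the exact head layouts are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class SwordHead(nn.Module):
    """Hypothetical decoder head illustrating the stop-gradient idea:
    the classification branch sees detached query features, so its
    background-biased gradients cannot suppress novel objects, while
    the box and IoU branches still train the features end to end."""

    def __init__(self, dim: int = 256, num_classes: int = 1):
        super().__init__()
        self.cls_head = nn.Linear(dim, num_classes)  # class-agnostic "object" score
        self.box_head = nn.Linear(dim, 4)            # box regression
        self.iou_head = nn.Linear(dim, 1)            # localization-quality (IoU) score

    def forward(self, queries: torch.Tensor):
        # queries: (batch, num_queries, dim) decoder output embeddings
        cls_logits = self.cls_head(queries.detach())   # stop-gradient before classification
        boxes = self.box_head(queries).sigmoid()
        iou_scores = self.iou_head(queries).sigmoid()  # ranks novel objects at inference
        return cls_logits, boxes, iou_scores
```

The point of the design is that the classification branch, trained only against base-category labels, no longer back-propagates into the query features, while the predicted IoU gives a class-agnostic way to surface well-localized but unfamiliar regions.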
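For the contrastive component, the abstract names only the ingredients (a universal object queue, an object center, dynamically selected positive and negative queries), so the following is a hedged InfoNCE-style sketch of how those pieces could fit together; the temperature, the mean-of-queue center, and the function signature are assumptions.

```python
import torch
import torch.nn.functional as F

def object_background_contrast(pos_queries, neg_queries, object_queue, tau=0.07):
    """Hedged sketch of the contrastive framework described in the abstract.
    pos_queries:  (P, D) object queries matched to ground-truth objects
    neg_queries:  (N, D) object queries treated as background
    object_queue: (Q, D) universal queue of past object embeddings
    """
    # One plausible reading of "maintain a universal object queue to obtain
    # the object center": the center is the mean of the queued embeddings.
    center = F.normalize(object_queue.mean(dim=0), dim=0)        # (D,)
    pos = F.normalize(pos_queries, dim=1)                        # (P, D)
    neg = F.normalize(neg_queries, dim=1)                        # (N, D)

    pos_sim = (pos @ center) / tau                               # (P,)
    neg_sim = (neg @ center) / tau                               # (N,)

    # InfoNCE-style loss: each positive competes against all background
    # queries for similarity to the object center, widening the gap
    # between object and background representations.
    logits = torch.cat([pos_sim.unsqueeze(1),
                        neg_sim.unsqueeze(0).expand(pos.shape[0], -1)], dim=1)
    labels = torch.zeros(pos.shape[0], dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```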
Related papers
- SegPrompt: Boosting Open-world Segmentation via Category-level Prompt Learning [49.17344010035996]
Open-world instance segmentation (OWIS) models detect unknown objects in a class-agnostic manner.
Previous OWIS approaches completely erase category information during training to preserve the model's ability to generalize to unknown objects.
We propose a novel training mechanism termed SegPrompt that uses category information to improve the model's class-agnostic segmentation ability.
arXiv Detail & Related papers (2023-08-12T11:25:39Z)
- Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation [13.001629605405954]
Zero-shot instance segmentation aims to detect and precisely segment objects of unseen categories without any training samples.
We propose D²Zero with Semantic-Promoted Debiasing and Background Disambiguation.
Background disambiguation produces image-adaptive background representation to avoid mistaking novel objects for background.
arXiv Detail & Related papers (2023-05-22T16:00:01Z)
- GOOD: Exploring Geometric Cues for Detecting Objects in an Open World [33.25263418112558]
State-of-the-art RGB-based models suffer from overfitting to the training classes and often fail at detecting novel-looking objects.
We propose incorporating geometric cues such as depth and normals, predicted by general-purpose monocular estimators.
Our resulting Geometry-guided Open-world Object Detector (GOOD) significantly improves detection recall for novel object categories and already performs well with only a few training classes.
arXiv Detail & Related papers (2022-12-22T14:13:33Z)
- Open World DETR: Transformer based Open World Object Detection [60.64535309016623]
We propose a two-stage training approach named Open World DETR for open world object detection based on Deformable DETR.
We fine-tune the class-specific components of the model with a multi-view self-labeling strategy and a consistency constraint.
Our proposed method outperforms other state-of-the-art open world object detection methods by a large margin.
arXiv Detail & Related papers (2022-12-06T13:39:30Z)
- CLIP on Wheels: Zero-Shot Object Navigation as Object Localization and Exploration [31.18818639097139]
In this paper, we translate the success of zero-shot vision models to the popular embodied AI task of object navigation.
We design CLIP on Wheels (CoW) baselines for the task and evaluate each zero-shot model in both Habitat and RoboTHOR simulators.
We find that a straightforward CoW, with CLIP-based object localization plus classical exploration, and no additional training, often outperforms learnable approaches in terms of success, efficiency, and robustness to dataset distribution shift.
arXiv Detail & Related papers (2022-03-20T00:52:45Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- Learning to Detect Every Thing in an Open World [139.78830329914135]
We propose a simple yet surprisingly powerful data augmentation and training scheme we call Learning to Detect Every Thing (LDET).
To avoid suppressing hidden objects (background objects that are visible but unlabeled), we paste annotated objects onto a background image sampled from a small region of the original image (sketched after this entry).
LDET leads to significant improvements on many datasets in the open-world instance segmentation task.
arXiv Detail & Related papers (2021-12-03T03:56:06Z)
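For a concrete picture of the LDET augmentation above, here is a rough NumPy sketch: the background is synthesized from a small crop of the image (the paper may resize rather than tile the crop, so treat the details as assumptions), and the annotated objects are pasted back on top, so that everything that looks like background truly contains no hidden, unlabeled objects.

```python
import numpy as np

def ldet_style_paste(image, masks, crop_size=64, rng=None):
    """Rough sketch of an LDET-style augmentation (details assumed).
    image: (h, w, 3) array with h, w > crop_size; masks: boolean (h, w) arrays."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    # Sample a small region and tile it to full size as a synthetic background.
    y = rng.integers(0, h - crop_size)
    x = rng.integers(0, w - crop_size)
    crop = image[y:y + crop_size, x:x + crop_size]
    reps = (h // crop_size + 1, w // crop_size + 1, 1)
    background = np.tile(crop, reps)[:h, :w]
    # Paste the annotated objects back onto the synthetic background.
    out = background.copy()
    for mask in masks:
        out[mask] = image[mask]
    return out
```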
- Learning Open-World Object Proposals without Learning to Classify [110.30191531975804]
We propose a classification-free Object Localization Network (OLN) which estimates the objectness of each region purely by how well the location and shape of a region overlap with any ground-truth object.
This simple strategy learns generalizable objectness and outperforms existing proposals on cross-category generalization (a sketch of the IoU-based target follows).
arXiv Detail & Related papers (2021-08-15T14:36:02Z)
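OLN's replacement of classification with localization quality can be sketched as a target computation: each proposal's objectness target is its best IoU with any ground-truth box. The function below is an illustrative PyTorch sketch, not the paper's code; OLN also uses a centerness target, which is omitted here.

```python
import torch

def iou_objectness_targets(proposals, gt_boxes):
    """Sketch of OLN-style localization-quality targets (details assumed):
    a proposal's objectness target is its best IoU with any ground-truth
    box, replacing classification labels entirely.
    proposals: (N, 4), gt_boxes: (M, 4), both in (x1, y1, x2, y2) format."""
    area_p = (proposals[:, 2] - proposals[:, 0]) * (proposals[:, 3] - proposals[:, 1])
    area_g = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    lt = torch.max(proposals[:, None, :2], gt_boxes[None, :, :2])  # (N, M, 2)
    rb = torch.min(proposals[:, None, 2:], gt_boxes[None, :, 2:])  # (N, M, 2)
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]                                # (N, M)
    iou = inter / (area_p[:, None] + area_g[None, :] - inter)
    return iou.max(dim=1).values  # (N,) objectness target per proposal
```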
This list is automatically generated from the titles and abstracts of the papers on this site.