SegPrompt: Boosting Open-world Segmentation via Category-level Prompt Learning
- URL: http://arxiv.org/abs/2308.06531v1
- Date: Sat, 12 Aug 2023 11:25:39 GMT
- Title: SegPrompt: Boosting Open-world Segmentation via Category-level Prompt
Learning
- Authors: Muzhi Zhu, Hengtao Li, Hao Chen, Chengxiang Fan, Weian Mao, Chenchen
Jing, Yifan Liu, Chunhua Shen
- Abstract summary: Open-world instance segmentation (OWIS) models detect unknown objects in a class-agnostic manner.
Previous OWIS approaches completely erase category information during training to keep the model's ability to generalize to unknown objects.
We propose a novel training mechanism termed SegPrompt that uses category information to improve the model's class-agnostic segmentation ability.
- Score: 49.17344010035996
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Current closed-set instance segmentation models rely on pre-defined class
labels for each mask during training and evaluation, largely limiting their
ability to detect novel objects. Open-world instance segmentation (OWIS) models
address this challenge by detecting unknown objects in a class-agnostic manner.
However, previous OWIS approaches completely erase category information during
training to keep the model's ability to generalize to unknown objects. In this
work, we propose a novel training mechanism termed SegPrompt that uses category
information to improve the model's class-agnostic segmentation ability for both
known and unknown categories. In addition, the previous OWIS training setting
exposes the unknown classes to the training set and brings information leakage,
which is unreasonable in the real world. Therefore, we provide a new open-world
benchmark closer to a real-world scenario by dividing the dataset classes into
known-seen-unseen parts. For the first time, we focus on the model's ability to
discover objects that never appear in the training set images.
Experiments show that SegPrompt can improve the overall and unseen detection
performance by 5.6% and 6.1% in AR on our new benchmark without affecting the
inference efficiency. We further demonstrate the effectiveness of our method on
existing cross-dataset transfer and strongly supervised settings, leading to
5.5% and 12.3% relative improvement.
Related papers
- Enhancing Visual Continual Learning with Language-Guided Supervision [76.38481740848434]
Continual learning aims to empower models to learn new tasks without forgetting previously acquired knowledge.
We argue that the scarce semantic information conveyed by the one-hot labels hampers the effective knowledge transfer across tasks.
Specifically, we use PLMs to generate semantic targets for each class, which are frozen and serve as supervision signals.
arXiv Detail & Related papers (2024-03-24T12:41:58Z)
- Open-World Semantic Segmentation Including Class Similarity [31.799000996671975]
This paper tackles open-world semantic segmentation, i.e., the variant of interpreting image data in which objects occur that have not been seen during training.
We propose a novel approach that performs accurate closed-world semantic segmentation and can identify new categories without requiring any additional training data.
arXiv Detail & Related papers (2024-03-12T11:11:19Z)
- Debiased Novel Category Discovering and Localization [40.02326438622898]
We focus on the challenging problem of Novel Class Discovery and Localization (NCDL).
We propose a Debiased Region Mining (DRM) approach that combines a class-agnostic Region Proposal Network (RPN) and a class-aware RPN.
We conduct extensive experiments on the NCDL benchmark, and the results demonstrate that the proposed DRM approach significantly outperforms previous methods.
arXiv Detail & Related papers (2024-02-29T03:09:16Z)
- Incremental Object Detection with CLIP [36.478530086163744]
We propose using a visual-language model such as CLIP to generate text feature embeddings for different class sets.
We then employ super-classes to replace the unavailable novel classes in the early learning stage to simulate the incremental scenario.
We incorporate the finely recognized detection boxes as pseudo-annotations into the training process, thereby further improving the detection performance.
arXiv Detail & Related papers (2023-10-13T01:59:39Z)
- Activate and Reject: Towards Safe Domain Generalization under Category Shift [71.95548187205736]
We study a practical problem of Domain Generalization under Category Shift (DGCS).
It aims to simultaneously detect unknown-class samples and classify known-class samples in the target domains.
Compared to prior DG works, we face two new challenges: 1) how to learn the concept of "unknown" during training with only source known-class samples, and 2) how to adapt the source-trained model to unseen environments.
arXiv Detail & Related papers (2023-10-07T07:53:12Z)
- ElC-OIS: Ellipsoidal Clustering for Open-World Instance Segmentation on LiDAR Data [13.978966783993146]
Open-world Instance Segmentation (OIS) is a challenging task that aims to accurately segment every object instance appearing in the current observation.
This is important for safety-critical applications such as robust autonomous navigation.
We present a flexible and effective OIS framework for LiDAR point cloud that can accurately segment both known and unknown instances.
arXiv Detail & Related papers (2023-03-08T03:22:11Z)
- Open World DETR: Transformer based Open World Object Detection [60.64535309016623]
We propose a two-stage training approach named Open World DETR for open world object detection based on Deformable DETR.
We fine-tune the class-specific components of the model with a multi-view self-labeling strategy and a consistency constraint.
Our proposed method outperforms other state-of-the-art open world object detection methods by a large margin.
arXiv Detail & Related papers (2022-12-06T13:39:30Z)
- Novel Class Discovery in Semantic Segmentation [104.30729847367104]
We introduce a new setting of Novel Class Discovery in Semantic Segmentation (NCDSS).
It aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes.
In NCDSS, we need to distinguish the objects and background, and to handle the existence of multiple classes within an image.
We propose the Entropy-based Uncertainty Modeling and Self-training (EUMS) framework to overcome noisy pseudo-labels.
arXiv Detail & Related papers (2021-12-03T13:31:59Z)
- CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition [52.66360172784038]
We propose a clustering-based model, which considers all training samples at once, instead of optimizing for each instance individually.
We call the proposed method CLASTER and observe that it consistently improves over the state-of-the-art in all standard datasets.
arXiv Detail & Related papers (2021-01-18T12:46:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.