Open-vocabulary Panoptic Segmentation with Embedding Modulation
- URL: http://arxiv.org/abs/2303.11324v2
- Date: Sat, 15 Jul 2023 11:04:26 GMT
- Title: Open-vocabulary Panoptic Segmentation with Embedding Modulation
- Authors: Xi Chen, Shuang Li, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao
- Abstract summary: Open-vocabulary image segmentation is attracting increasing attention due to its critical applications in the real world.
Traditional closed-vocabulary segmentation methods are not able to characterize novel objects, whereas several recent open-vocabulary attempts obtain unsatisfactory results.
We propose OPSNet, an omnipotent and data-efficient framework for Open-vocabulary Panoptic Segmentation.
- Score: 71.15502078615587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-vocabulary image segmentation is attracting increasing attention due to
its critical applications in the real world. Traditional closed-vocabulary
segmentation methods are not able to characterize novel objects, whereas
several recent open-vocabulary attempts obtain unsatisfactory results, i.e.,
notable performance reduction on the closed vocabulary and massive demand for
extra data. To this end, we propose OPSNet, an omnipotent and data-efficient
framework for Open-vocabulary Panoptic Segmentation. Specifically, the
exquisitely designed Embedding Modulation module, together with several
meticulously designed components, enables adequate embedding enhancement and
information exchange between the segmentation model and the well-aligned
visual-linguistic CLIP encoder, resulting in superior segmentation performance
under both open- and closed-vocabulary settings with much less need for additional data.
Extensive experimental evaluations are conducted across multiple datasets
(e.g., COCO, ADE20K, Cityscapes, and PascalContext) under various
circumstances, where the proposed OPSNet achieves state-of-the-art results,
which demonstrates the effectiveness and generality of the proposed approach.
The code and trained models will be made publicly available.
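The abstract describes the Embedding Modulation module only at a high level. As a rough illustration of the kind of mechanism it refers to, the sketch below blends per-mask query embeddings from a segmentation head with CLIP visual embeddings pooled over the same masks, then classifies each mask against CLIP text embeddings. All names and the fixed mixing weight alpha are hypothetical; this is a sketch under stated assumptions, not the authors' implementation.

```python
# Illustrative sketch of blending segmentation query embeddings with CLIP
# visual embeddings before open-vocabulary classification (hypothetical,
# not the OPSNet code).
import torch
import torch.nn.functional as F


def modulate_embeddings(query_emb, clip_emb, text_emb, alpha=0.5):
    """Blend per-mask query embeddings with CLIP visual embeddings, then
    classify each mask against CLIP text embeddings via cosine similarity.

    query_emb: (N, D) embeddings from the segmentation head (one per mask).
    clip_emb:  (N, D) CLIP visual embeddings pooled over the same masks.
    text_emb:  (C, D) CLIP text embeddings of the (open) category names.
    alpha:     fixed mixing weight; a learned predictor could replace it.
    """
    query_emb = F.normalize(query_emb, dim=-1)
    clip_emb = F.normalize(clip_emb, dim=-1)
    fused = F.normalize(alpha * query_emb + (1 - alpha) * clip_emb, dim=-1)
    logits = 100.0 * fused @ F.normalize(text_emb, dim=-1).t()  # (N, C)
    return logits.softmax(dim=-1)
```

A learned, per-query mixing weight (rather than the fixed alpha above) would be closer in spirit to "modulation", but the exact mechanism is not specified in the abstract.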
Related papers
- kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies [22.51592283786031]
kNN-CLIP is a training-free strategy for continual segmentation.
It can adapt to continually growing vocabularies without the need for retraining or large memory costs.
It achieves state-of-the-art performance across large-vocabulary semantic and panoptic segmentation datasets.
arXiv Detail & Related papers (2024-04-15T04:20:01Z)
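The kNN-CLIP entry above describes training-free adaptation to a continually growing vocabulary via retrieval. A minimal sketch of the general retrieval idea, a nearest-neighbour lookup over an expandable bank of CLIP embeddings, is given below; the class and method names are hypothetical, and this is not the paper's actual pipeline.

```python
# Minimal sketch of training-free kNN classification over CLIP embeddings
# (hypothetical illustration of the retrieval idea, not the kNN-CLIP code).
import torch
import torch.nn.functional as F


class EmbeddingBank:
    """A database of (embedding, label) pairs that can grow with the vocabulary."""

    def __init__(self):
        self.embeddings = []  # list of (D,) tensors
        self.labels = []      # list of int class ids

    def add(self, emb, label):
        self.embeddings.append(F.normalize(emb, dim=-1))
        self.labels.append(label)

    def classify(self, query, k=5):
        """Label a query embedding by majority vote over its k nearest neighbours."""
        bank = torch.stack(self.embeddings)           # (M, D)
        sims = F.normalize(query, dim=-1) @ bank.t()  # (M,) cosine similarities
        topk = sims.topk(min(k, bank.shape[0])).indices
        votes = torch.tensor([self.labels[i] for i in topk])
        return votes.mode().values.item()
```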
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS).
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
- Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter [47.29967666846132]
Generative text-to-image diffusion models are highly efficient open-vocabulary semantic segmenters.
We introduce a novel training-free approach named DiffSegmenter to generate realistic objects that are semantically faithful to the input text.
Extensive experiments on three benchmark datasets show that the proposed DiffSegmenter achieves impressive results for open-vocabulary semantic segmentation.
arXiv Detail & Related papers (2023-09-06T06:31:08Z)
- RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
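The RefSAM entry above mentions a lightweight cross-modal module and parameter-efficient tuning. The sketch below shows one generic way such an adapter could look: a small MLP that projects the sentence embedding of a referring expression into a prompt space, trained while the large vision and language backbones stay frozen. The dimensions and class name are assumptions for illustration, not RefSAM's actual architecture.

```python
# Generic sketch of a lightweight cross-modal adapter trained with frozen
# backbones (illustrative; not RefSAM's actual module).
import torch
import torch.nn as nn


class CrossModalMLP(nn.Module):
    """Projects language embeddings into a visual prompt space."""

    def __init__(self, text_dim=768, prompt_dim=256, hidden_dim=512):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(text_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, prompt_dim),
        )

    def forward(self, text_emb):
        # text_emb: (B, text_dim) sentence embedding of the referring expression.
        return self.proj(text_emb)  # (B, prompt_dim) prompt for the mask decoder


# Parameter-efficient tuning: only the adapter's parameters receive gradients;
# the vision and language backbones would be kept frozen.
adapter = CrossModalMLP()
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)
```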
- Diffusion Models for Open-Vocabulary Segmentation [79.02153797465324]
OVDiff is a novel method that leverages generative text-to-image diffusion models for unsupervised open-vocabulary segmentation.
It relies solely on pre-trained components and outputs the synthesised segmenter directly, without training.
arXiv Detail & Related papers (2023-06-15T17:51:28Z)
- Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation [80.48979302400868]
We focus on open vocabulary instance segmentation to expand a segmentation model to classify and segment instance-level novel categories.
Previous approaches have relied on massive caption datasets and complex pipelines to establish one-to-one mappings between image regions and nouns in captions.
We devise a joint Caption Grounding and Generation (CGG) framework, which incorporates a novel grounding loss that focuses only on matching object nouns to improve learning efficiency.
arXiv Detail & Related papers (2023-01-02T18:52:12Z)
- SlimSeg: Slimmable Semantic Segmentation with Boundary Supervision [54.16430358203348]
We propose a simple but effective slimmable semantic segmentation (SlimSeg) method, which can be executed at different capacities during inference.
We show that our proposed SlimSeg with various mainstream networks can produce flexible models that provide dynamic adjustment of computational cost and better performance.
arXiv Detail & Related papers (2022-07-13T14:41:05Z)
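The SlimSeg entry above describes models that can be executed at different capacities during inference. Below is a bare-bones sketch of the underlying "slimmable" idea, a convolution whose active channel count is selected by a width multiplier at call time; the layer sizes and names are illustrative and do not reflect SlimSeg's implementation.

```python
# Bare-bones sketch of a slimmable convolution: at inference time only a
# fraction of the channels is used (illustrative, not SlimSeg's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SlimmableConv2d(nn.Conv2d):
    def forward(self, x, width_mult=1.0):
        # Slice the weight and bias to the requested fraction of output channels.
        out_ch = max(1, int(self.out_channels * width_mult))
        in_ch = x.shape[1]
        weight = self.weight[:out_ch, :in_ch]
        bias = self.bias[:out_ch] if self.bias is not None else None
        return F.conv2d(x, weight, bias, self.stride, self.padding)


conv = SlimmableConv2d(64, 128, kernel_size=3, padding=1)
x = torch.randn(1, 64, 32, 32)
full = conv(x, width_mult=1.0)   # (1, 128, 32, 32), full capacity
slim = conv(x, width_mult=0.5)   # (1, 64, 32, 32), half the channels
```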
- Generalizing Interactive Backpropagating Refinement for Dense Prediction [0.0]
We introduce a set of G-BRS layers that enable both global and localized refinement for a range of dense prediction tasks.
Our method can successfully generalize and significantly improve performance of existing pretrained state-of-the-art models with only a few clicks.
arXiv Detail & Related papers (2021-12-21T03:52:08Z)