Open-vocabulary Panoptic Segmentation with Embedding Modulation
- URL: http://arxiv.org/abs/2303.11324v2
- Date: Sat, 15 Jul 2023 11:04:26 GMT
- Title: Open-vocabulary Panoptic Segmentation with Embedding Modulation
- Authors: Xi Chen, Shuang Li, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao
- Abstract summary: Open-vocabulary image segmentation is attracting increasing attention due to its critical applications in the real world.
Traditional closed-vocabulary segmentation methods are not able to characterize novel objects, whereas several recent open-vocabulary attempts obtain unsatisfactory results.
We propose OPSNet, an omnipotent and data-efficient framework for Open-vocabulary Panoptic Segmentation.
- Score: 71.15502078615587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-vocabulary image segmentation is attracting increasing attention due to
its critical applications in the real world. Traditional closed-vocabulary
segmentation methods are not able to characterize novel objects, whereas
several recent open-vocabulary attempts obtain unsatisfactory results, i.e.,
notable performance reduction on the closed vocabulary and massive demand for
extra data. To this end, we propose OPSNet, an omnipotent and data-efficient
framework for Open-vocabulary Panoptic Segmentation. Specifically, the
exquisitely designed Embedding Modulation module, together with several
meticulous components, enables adequate embedding enhancement and information
exchange between the segmentation model and the visual-linguistic well-aligned
CLIP encoder, resulting in superior segmentation performance under both open-
and closed-vocabulary settings with much less need for additional data.
Extensive experimental evaluations are conducted across multiple datasets
(e.g., COCO, ADE20K, Cityscapes, and PascalContext) under various
circumstances, where the proposed OPSNet achieves state-of-the-art results,
which demonstrates the effectiveness and generality of the proposed approach.
The code and trained models will be made publicly available.
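The abstract describes modulating segmentation-model embeddings with embeddings from a visual-linguistic aligned CLIP encoder, then matching the result against category text embeddings. A minimal illustrative sketch of that general idea (not the authors' actual OPSNet implementation; the blending weight `alpha` and all function names are hypothetical):

```python
import numpy as np

def modulate_embeddings(query_emb, clip_emb, alpha=0.5):
    """Blend per-mask query embeddings with CLIP visual embeddings.

    Hypothetical sketch: interpolate between the two embedding sources,
    then L2-normalize so the result can be compared against CLIP text
    embeddings by cosine similarity.
    """
    fused = alpha * query_emb + (1.0 - alpha) * clip_emb
    return fused / np.linalg.norm(fused, axis=-1, keepdims=True)

def classify_open_vocab(mask_emb, text_embs):
    """Assign each mask the category whose text embedding is most similar."""
    text_embs = text_embs / np.linalg.norm(text_embs, axis=-1, keepdims=True)
    sims = mask_emb @ text_embs.T  # cosine similarities (masks x categories)
    return sims.argmax(axis=-1)
```

Because classification is a similarity search over text embeddings rather than a fixed classifier head, the category vocabulary can be swapped at inference time, which is what makes the setting "open-vocabulary".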
Related papers
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS)
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
- Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter [47.29967666846132]
Generative text-to-image diffusion models are highly efficient open-vocabulary semantic segmenters.
We introduce a novel training-free approach named DiffSegmenter to generate realistic objects that are semantically faithful to the input text.
Extensive experiments on three benchmark datasets show that the proposed DiffSegmenter achieves impressive results for open-vocabulary semantic segmentation.
arXiv Detail & Related papers (2023-09-06T06:31:08Z)
- Global Knowledge Calibration for Fast Open-Vocabulary Segmentation [124.74256749281625]
We introduce a text diversification strategy that generates a set of synonyms for each training category.
We also employ a text-guided knowledge distillation method to preserve the generalizable knowledge of CLIP.
Our proposed model achieves robust generalization performance across various datasets.
arXiv Detail & Related papers (2023-03-16T09:51:41Z)
- Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation [80.48979302400868]
We focus on open vocabulary instance segmentation to expand a segmentation model to classify and segment instance-level novel categories.
Previous approaches have relied on massive caption datasets and complex pipelines to establish one-to-one mappings between image regions and captions in nouns.
We devise a joint Caption Grounding and Generation (CGG) framework, which incorporates a novel grounding loss that focuses only on matching objects to improve learning efficiency.
arXiv Detail & Related papers (2023-01-02T18:52:12Z)
- An Efficient Multi-Scale Fusion Network for 3D Organ at Risk (OAR) Segmentation [2.6770199357488242]
We propose a new OAR segmentation framework called OARFocalFuseNet.
It fuses multi-scale features and employs focal modulation for capturing global-local context across multiple scales.
Our best performing method (OARFocalFuseNet) obtained a Dice coefficient of 0.7995 and a Hausdorff distance of 5.1435 on the OpenKBP dataset.
arXiv Detail & Related papers (2022-08-15T19:40:18Z)
- SlimSeg: Slimmable Semantic Segmentation with Boundary Supervision [54.16430358203348]
We propose a simple but effective slimmable semantic segmentation (SlimSeg) method, which can be executed at different capacities during inference.
We show that our proposed SlimSeg with various mainstream networks can produce flexible models that provide dynamic adjustment of computational cost and better performance.
arXiv Detail & Related papers (2022-07-13T14:41:05Z)
- Generalizing Interactive Backpropagating Refinement for Dense Prediction [0.0]
We introduce a set of G-BRS layers that enable both global and localized refinement for a range of dense prediction tasks.
Our method can successfully generalize and significantly improve performance of existing pretrained state-of-the-art models with only a few clicks.
arXiv Detail & Related papers (2021-12-21T03:52:08Z)
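The OAR segmentation entry above reports a Dice coefficient and a Hausdorff distance. A minimal sketch of how these two standard segmentation metrics are computed (illustrative only; real evaluations such as OpenKBP typically use surface-based and percentile variants):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def hausdorff_distance(a_pts, b_pts):
    """Symmetric Hausdorff distance between two point sets of shape (N, D)."""
    # Pairwise Euclidean distances, then the worst-case nearest-neighbor gap
    # in each direction; the Hausdorff distance is the larger of the two.
    d = np.linalg.norm(a_pts[:, None, :] - b_pts[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```

Dice rewards volumetric overlap, while the Hausdorff distance penalizes the single worst boundary error, so the two metrics are complementary.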
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.