Going Denser with Open-Vocabulary Part Segmentation
- URL: http://arxiv.org/abs/2305.11173v1
- Date: Thu, 18 May 2023 17:59:10 GMT
- Title: Going Denser with Open-Vocabulary Part Segmentation
- Authors: Peize Sun, Shoufa Chen, Chenchen Zhu, Fanyi Xiao, Ping Luo, Saining Xie, Zhicheng Yan
- Abstract summary: We propose a detector with the ability to predict both open-vocabulary objects and their part segmentation.
This ability comes from two designs. First, we train the detector jointly on part-level, object-level and image-level data to build multi-granularity alignment between language and image.
Second, we parse the novel object into its parts by its dense semantic correspondence with the base object.
- Score: 38.395986723880505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object detection has been expanded from a limited number of categories to
open vocabulary. Moving forward, a complete intelligent vision system requires
understanding finer-grained object descriptions, namely object parts. In this
paper, we propose a detector with the ability to predict both open-vocabulary
objects and their part segmentation. This ability comes from two designs.
First, we train the detector jointly on part-level, object-level and
image-level data to build multi-granularity alignment between language and
image. Second, we parse the novel object into its parts by its dense semantic
correspondence with the base object. These two designs enable the detector to
largely benefit from various data sources and foundation models. In
open-vocabulary part segmentation experiments, our method outperforms the
baseline by 3.3$\sim$7.3 mAP in cross-dataset generalization on PartImageNet,
and improves the baseline by 7.3 novel AP$_{50}$ in cross-category
generalization on Pascal Part. Finally, we train a detector that generalizes to
a wide range of part segmentation datasets while achieving better performance
than dataset-specific training.
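
The second design, parsing a novel object into parts through dense semantic correspondence with a base object, can be illustrated with a nearest-neighbor label transfer. This is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes per-pixel (or per-patch) feature vectors are already available from some pretrained backbone, and the function name `transfer_part_labels` and its arguments are hypothetical.

```python
import numpy as np


def transfer_part_labels(novel_feats, base_feats, base_part_labels):
    """Assign each novel-object location the part label of its most similar
    base-object location (hypothetical helper, for illustration only).

    novel_feats:      (N, D) features of the novel object's locations
    base_feats:       (M, D) features of the base object's locations
    base_part_labels: (M,)   integer part label of each base location
    """
    # L2-normalize so that the dot product equals cosine similarity.
    novel = novel_feats / np.linalg.norm(novel_feats, axis=1, keepdims=True)
    base = base_feats / np.linalg.norm(base_feats, axis=1, keepdims=True)

    # Dense similarity between every novel and every base location: (N, M).
    sim = novel @ base.T

    # Each novel location inherits the label of its best-matching base location.
    nearest = sim.argmax(axis=1)
    return base_part_labels[nearest], sim.max(axis=1)


# Toy example: two base locations labeled as part 0 and part 1.
base_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
base_labels = np.array([0, 1])
novel_feats = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])

labels, scores = transfer_part_labels(novel_feats, base_feats, base_labels)
print(labels.tolist())  # [0, 1, 0]
```

The returned similarity scores could be thresholded to discard weak matches; whether and how the paper filters correspondences is not stated in the abstract.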
Related papers
- DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection [111.68263493302499]
We introduce DetCLIPv3, a high-performing detector that excels at both open-vocabulary object detection and hierarchical labels.
DetCLIPv3 is characterized by three core designs: 1) Versatile model architecture; 2) High information density data; and 3) Efficient training strategy.
DetCLIPv3 demonstrates superior open-vocabulary detection performance, outperforming GLIPv2, GroundingDINO, and DetCLIPv2 by 18.0/19.6/6.6 AP, respectively.
arXiv Detail & Related papers (2024-04-14T11:01:44Z)
- PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model [19.333506797686695]
We introduce a novel segmentation task known as reasoning part segmentation for 3D objects.
We output a segmentation mask based on complex and implicit textual queries about specific parts of a 3D object.
We propose a model that is capable of segmenting parts of 3D objects based on implicit textual queries and generating natural language explanations.
arXiv Detail & Related papers (2024-04-04T23:38:45Z)
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS).
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
- OV-PARTS: Towards Open-Vocabulary Part Segmentation [31.136262413989858]
Segmenting and recognizing diverse object parts is a crucial ability in applications spanning various computer vision and robotic tasks.
We propose an Open-Vocabulary Part (OV-PARTS) benchmark to investigate and tackle these challenges.
OV-PARTS includes refined versions of two publicly available datasets: Pascal-Part-116 and ADE20K-234. It covers three specific tasks: Generalized Zero-Shot Part Segmentation, Cross-Dataset Part Segmentation, and Few-Shot Part Segmentation.
arXiv Detail & Related papers (2023-10-08T10:28:42Z)
- Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any dense annotation effort.
Our method can directly segment objects of arbitrary categories, outperforming zero-shot segmentation methods that require data labeling on three benchmark datasets.
arXiv Detail & Related papers (2022-07-18T09:20:04Z)
- PartAfford: Part-level Affordance Discovery from 3D Objects [113.91774531972855]
We present a new task of part-level affordance discovery (PartAfford).
Given only the affordance labels per object, the machine is tasked to (i) decompose 3D shapes into parts and (ii) discover how each part corresponds to a certain affordance category.
We propose a novel learning framework for PartAfford, which discovers part-level representations by leveraging only the affordance set supervision and geometric primitive regularization.
arXiv Detail & Related papers (2022-02-28T02:58:36Z)
- Unsupervised Part Discovery from Contrastive Reconstruction [90.88501867321573]
The goal of self-supervised visual representation learning is to learn strong, transferable image representations.
We propose an unsupervised approach to object part discovery and segmentation.
Our method yields semantic parts consistent across fine-grained but visually distinct categories.
arXiv Detail & Related papers (2021-11-11T17:59:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.