OV-PARTS: Towards Open-Vocabulary Part Segmentation
- URL: http://arxiv.org/abs/2310.05107v1
- Date: Sun, 8 Oct 2023 10:28:42 GMT
- Title: OV-PARTS: Towards Open-Vocabulary Part Segmentation
- Authors: Meng Wei, Xiaoyu Yue, Wenwei Zhang, Shu Kong, Xihui Liu, Jiangmiao Pang
- Abstract summary: Segmenting and recognizing diverse object parts is a crucial ability in applications spanning various computer vision and robotic tasks.
We propose an Open-Vocabulary Part Segmentation (OV-PARTS) benchmark to investigate and tackle the challenges of part-level open-vocabulary segmentation.
OV-PARTS includes refined versions of two publicly available datasets: Pascal-Part-116 and ADE20K-Part-234. It covers three specific tasks: Generalized Zero-Shot Part Segmentation, Cross-Dataset Part Segmentation, and Few-Shot Part Segmentation.
- Score: 31.136262413989858
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segmenting and recognizing diverse object parts is a crucial ability in
applications spanning various computer vision and robotic tasks. While
significant progress has been made in object-level Open-Vocabulary Semantic
Segmentation (OVSS), i.e., segmenting objects with arbitrary text, the
corresponding part-level research poses additional challenges. Firstly, part
segmentation inherently involves intricate boundaries, while limited annotated
data compounds the challenge. Secondly, part segmentation introduces an open
granularity challenge due to the diverse and often ambiguous definitions of
parts in the open world. Furthermore, the large-scale vision and language
models, which play a key role in the open vocabulary setting, struggle to
recognize parts as effectively as objects. To comprehensively investigate and
tackle these challenges, we propose an Open-Vocabulary Part Segmentation
(OV-PARTS) benchmark. OV-PARTS includes refined versions of two publicly
available datasets: Pascal-Part-116 and ADE20K-Part-234. And it covers three
specific tasks: Generalized Zero-Shot Part Segmentation, Cross-Dataset Part
Segmentation, and Few-Shot Part Segmentation, providing insights into
analogical reasoning, open granularity and few-shot adapting abilities of
models. Moreover, we analyze and adapt two prevailing paradigms of existing
object-level OVSS methods for OV-PARTS. Extensive experimental analysis is
conducted to inspire future research in leveraging foundational models for
OV-PARTS. The code and dataset are available at
https://github.com/OpenRobotLab/OV_PARTS.
Related papers
- Image Segmentation in Foundation Model Era: A Survey [99.19456390358211]
Current research in image segmentation lacks a detailed analysis of distinct characteristics, challenges, and solutions associated with these advancements.
This survey seeks to fill this gap by providing a thorough review of cutting-edge research centered around FM-driven image segmentation.
An exhaustive overview of over 300 segmentation approaches is provided to encapsulate the breadth of current research efforts.
arXiv Detail & Related papers (2024-08-23T10:07:59Z)
- VISA: Reasoning Video Object Segmentation via Large Language Models [64.33167989521357]
We introduce a new task, Reasoning Video Object Segmentation (ReasonVOS).
This task aims to generate a sequence of segmentation masks in response to implicit text queries that require complex reasoning abilities.
We introduce VISA (Video-based large language Instructed Segmentation Assistant) to tackle ReasonVOS.
arXiv Detail & Related papers (2024-07-16T02:29:29Z)
- Understanding Multi-Granularity for Open-Vocabulary Part Segmentation [24.071471822239854]
Open-vocabulary part segmentation (OVPS) is an emerging research area focused on segmenting fine-grained entities using diverse and previously unseen vocabularies.
Our study highlights the inherent complexities of part segmentation due to intricate boundaries and diverse granularity, reflecting the knowledge-based nature of part identification.
We propose PartCLIPSeg, a novel framework utilizing generalized parts and object-level contexts to mitigate the lack of generalization in fine-grained parts.
arXiv Detail & Related papers (2024-06-17T10:11:28Z)
- USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation [33.11010205890195]
The main challenge in open-vocabulary image segmentation now lies in accurately classifying these segments into text-defined categories.
We introduce the Universal Segment Embedding (USE) framework to address this challenge.
This framework is comprised of two key components: 1) a data pipeline designed to efficiently curate a large amount of segment-text pairs at various granularities, and 2) a universal segment embedding model that enables precise segment classification into a vast range of text-defined categories.
arXiv Detail & Related papers (2024-06-07T21:41:18Z)
- Frequency-based Matcher for Long-tailed Semantic Segmentation [22.199174076366003]
We focus on a relatively under-explored task setting, long-tailed semantic segmentation (LTSS).
We propose a dual-metric evaluation system and construct the LTSS benchmark to demonstrate the performance of semantic segmentation methods and long-tailed solutions.
We also propose a transformer-based algorithm to improve LTSS, the frequency-based matcher, which solves the oversuppression problem by one-to-many matching.
arXiv Detail & Related papers (2024-06-06T09:57:56Z)
- LISA: Reasoning Segmentation via Large Language Model [68.24075852136761]
We propose a new segmentation task -- reasoning segmentation.
The task is designed to output a segmentation mask given a complex and implicit query text.
We present LISA: large Language Instructed Segmentation Assistant, which inherits the language generation capabilities of multimodal Large Language Models.
arXiv Detail & Related papers (2023-08-01T17:50:17Z)
- AIMS: All-Inclusive Multi-Level Segmentation [93.5041381700744]
We propose a new task, All-Inclusive Multi-Level Segmentation (AIMS), which segments visual regions into three levels: part, entity, and relation.
We also build a unified AIMS model through multi-dataset multi-task training to address the two major challenges of annotation inconsistency and task correlation.
arXiv Detail & Related papers (2023-05-28T16:28:49Z)
- Towards Open-World Segmentation of Parts [16.056921233445784]
We propose to explore a class-agnostic part segmentation task.
We argue that models trained without part classes can better localize parts and segment them on objects unseen in training.
We show notable and consistent gains by our approach, essentially a critical step towards open-world part segmentation.
arXiv Detail & Related papers (2023-05-26T10:34:58Z)
- Going Denser with Open-Vocabulary Part Segmentation [38.395986723880505]
We propose a detector with the ability to predict both open-vocabulary objects and their part segmentation.
This ability comes from two designs. First, we train the detector on the joint of part-level, object-level and image-level data to build the multi-granularity alignment between language and image.
Second, we parse the novel object into its parts by its dense semantic correspondence with the base object.
arXiv Detail & Related papers (2023-05-18T17:59:10Z)
- Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any efforts on dense annotations.
Our method can directly segment objects of arbitrary categories, outperforming zero-shot segmentation methods that require data labeling on three benchmark datasets.
arXiv Detail & Related papers (2022-07-18T09:20:04Z)
- A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning based approaches have been dedicated to video segmentation and delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.