Understanding Multi-Granularity for Open-Vocabulary Part Segmentation
- URL: http://arxiv.org/abs/2406.11384v2
- Date: Sat, 02 Nov 2024 11:22:40 GMT
- Title: Understanding Multi-Granularity for Open-Vocabulary Part Segmentation
- Authors: Jiho Choi, Seonho Lee, Seungho Lee, Minhyun Lee, Hyunjung Shim
- Abstract summary: Open-vocabulary part segmentation (OVPS) is an emerging research area focused on segmenting fine-grained entities using diverse and previously unseen vocabularies.
Our study highlights the inherent complexities of part segmentation due to intricate boundaries and diverse granularity, reflecting the knowledge-based nature of part identification.
We propose PartCLIPSeg, a novel framework utilizing generalized parts and object-level contexts to mitigate the lack of generalization in fine-grained parts.
- Score: 24.071471822239854
- Abstract: Open-vocabulary part segmentation (OVPS) is an emerging research area focused on segmenting fine-grained entities using diverse and previously unseen vocabularies. Our study highlights the inherent complexities of part segmentation due to intricate boundaries and diverse granularity, reflecting the knowledge-based nature of part identification. To address these challenges, we propose PartCLIPSeg, a novel framework utilizing generalized parts and object-level contexts to mitigate the lack of generalization in fine-grained parts. PartCLIPSeg integrates competitive part relationships and attention control, alleviating ambiguous boundaries and underrepresented parts. Experimental results demonstrate that PartCLIPSeg outperforms existing state-of-the-art OVPS methods, offering refined segmentation and an advanced understanding of part relationships within images. Through extensive experiments, our model demonstrated a significant improvement over the state-of-the-art models on the Pascal-Part-116, ADE20K-Part-234, and PartImageNet datasets.
Related papers
- A Bottom-Up Approach to Class-Agnostic Image Segmentation [4.086366531569003]
We present a novel bottom-up formulation for addressing the class-agnostic segmentation problem.
We supervise our network directly on the projective sphere of its feature space.
Our bottom-up formulation exhibits exceptional generalization capability, even when trained on datasets designed for class-based segmentation.
arXiv Detail & Related papers (2024-09-20T17:56:02Z)
- Image Segmentation in Foundation Model Era: A Survey [99.19456390358211]
Current research in image segmentation lacks a detailed analysis of distinct characteristics, challenges, and solutions associated with these advancements.
This survey seeks to fill this gap by providing a thorough review of cutting-edge research centered around FM-driven image segmentation.
An exhaustive overview of over 300 segmentation approaches is provided to encapsulate the breadth of current research efforts.
arXiv Detail & Related papers (2024-08-23T10:07:59Z)
- Visual Prompt Selection for In-Context Learning Segmentation [77.15684360470152]
In this paper, we focus on rethinking and improving the example selection strategy.
We first demonstrate that ICL-based segmentation models are sensitive to different contexts.
Furthermore, empirical evidence indicates that the diversity of contextual prompts plays a crucial role in guiding segmentation.
arXiv Detail & Related papers (2024-07-14T15:02:54Z)
- OV-PARTS: Towards Open-Vocabulary Part Segmentation [31.136262413989858]
Segmenting and recognizing diverse object parts is a crucial ability in applications spanning various computer vision and robotic tasks.
We propose an Open-Vocabulary Part (OV-PARTS) benchmark to investigate and tackle these challenges.
OV-PARTS includes refined versions of two publicly available datasets: Pascal-Part-116 and ADE20K-Part-234. It covers three specific tasks: Generalized Zero-Shot Part Segmentation, Cross-Dataset Part Segmentation, and Few-Shot Part Segmentation.
arXiv Detail & Related papers (2023-10-08T10:28:42Z)
- Mitigating the Effect of Incidental Correlations on Part-based Learning [50.682498099720114]
Part-based representations could be more interpretable and generalize better with limited data.
We present two innovative regularization methods for part-based representations.
We exhibit state-of-the-art (SoTA) performance on few-shot learning tasks on benchmark datasets.
arXiv Detail & Related papers (2023-09-30T13:44:48Z)
- PartSeg: Few-shot Part Segmentation via Part-aware Prompt Learning [44.48704588318053]
We develop a novel method termed PartSeg for few-shot part segmentation based on multimodal learning.
We conduct extensive experiments on the PartImageNet and Pascal_Part datasets.
arXiv Detail & Related papers (2023-08-24T13:03:42Z)
- Open-vocabulary Panoptic Segmentation with Embedding Modulation [71.15502078615587]
Open-vocabulary image segmentation is attracting increasing attention due to its critical applications in the real world.
Traditional closed-vocabulary segmentation methods are not able to characterize novel objects, whereas several recent open-vocabulary attempts obtain unsatisfactory results.
We propose OPSNet, an omnipotent and data-efficient framework for Open-vocabulary Panoptic Segmentation.
arXiv Detail & Related papers (2023-03-20T17:58:48Z)
- Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation [80.48979302400868]
We focus on open vocabulary instance segmentation to expand a segmentation model to classify and segment instance-level novel categories.
Previous approaches have relied on massive caption datasets and complex pipelines to establish one-to-one mappings between image regions and nouns in captions.
We devise a joint Caption Grounding and Generation (CGG) framework, which incorporates a novel grounding loss that only focuses on matching object nouns to improve learning efficiency.
arXiv Detail & Related papers (2023-01-02T18:52:12Z)
- Unsupervised Co-part Segmentation through Assembly [42.874278526843305]
We propose an unsupervised learning approach for co-part segmentation from images.
We leverage motion information embedded in videos and explicitly extract latent representations to segment meaningful object parts.
We show that our approach can achieve meaningful and compact part segmentation, outperforming state-of-the-art approaches on diverse benchmarks.
arXiv Detail & Related papers (2021-06-10T16:22:53Z)
- Referring Image Segmentation via Cross-Modal Progressive Comprehension [94.70482302324704]
Referring image segmentation aims at segmenting the foreground masks of the entities that can well match the description given in the natural language expression.
Previous approaches tackle this problem using implicit feature interaction and fusion between visual and linguistic modalities.
We propose a Cross-Modal Progressive Comprehension (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address the challenging task.
arXiv Detail & Related papers (2020-10-01T16:02:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.