FLOAT: Factorized Learning of Object Attributes for Improved
Multi-object Multi-part Scene Parsing
- URL: http://arxiv.org/abs/2203.16168v1
- Date: Wed, 30 Mar 2022 09:46:10 GMT
- Title: FLOAT: Factorized Learning of Object Attributes for Improved
Multi-object Multi-part Scene Parsing
- Authors: Rishubh Singh, Pranav Gupta, Pradeep Shenoy and Ravikiran
Sarvadevabhatla
- Abstract summary: We propose FLOAT, a factorized label space framework for scalable multi-object multi-part parsing.
Our framework involves independent dense prediction of object category and part attributes.
In addition, we propose an inference-time 'zoom' refinement technique which significantly improves segmentation quality.
- Score: 10.94244766491706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-object multi-part scene parsing is a challenging task which requires
detecting multiple object classes in a scene and segmenting the semantic parts
within each object. In this paper, we propose FLOAT, a factorized label space
framework for scalable multi-object multi-part parsing. Our framework involves
independent dense prediction of object category and part attributes which
increases scalability and reduces task complexity compared to the monolithic
label space counterpart. In addition, we propose an inference-time 'zoom'
refinement technique which significantly improves segmentation quality,
especially for smaller objects/parts. Compared to the state of the art, FLOAT
obtains an absolute improvement of 2.0% for mean IOU (mIOU) and 4.8% for
segmentation quality IOU (sqIOU) on the Pascal-Part-58 dataset. For the larger
Pascal-Part-108 dataset, the improvements are 2.1% for mIOU and 3.9% for sqIOU.
We incorporate previously excluded part attributes and other minor parts of the
Pascal-Part dataset to create the most comprehensive and challenging version
which we dub Pascal-Part-201. FLOAT obtains improvements of 8.6% for mIOU and
7.5% for sqIOU on the new dataset, demonstrating its parsing effectiveness
across a challenging diversity of objects and parts. The code and datasets are
available at floatseg.github.io.
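The core idea of the factorized label space can be illustrated with a small sketch: instead of predicting one label from the full object-part product space, two independent dense heads predict an object category and a part attribute per pixel, and the pair is composed at inference. The function and variable names below are illustrative assumptions for exposition, not the authors' released code (see floatseg.github.io for that).

```python
# Minimal sketch (assumed names/shapes) of factorized dense prediction:
# one per-pixel argmax over object classes, one over part attributes,
# composed into an (object, part) label pair per pixel.
import numpy as np

def combine_factorized_predictions(object_logits, part_logits):
    """Fuse per-pixel object-category and part-attribute scores.

    object_logits: (num_objects, H, W) dense scores for object classes.
    part_logits:   (num_parts, H, W) dense scores for part attributes.
    Returns an (H, W, 2) array holding the argmax object class and
    argmax part attribute at each pixel.
    """
    obj_map = object_logits.argmax(axis=0)   # (H, W) object class per pixel
    part_map = part_logits.argmax(axis=0)    # (H, W) part attribute per pixel
    return np.stack([obj_map, part_map], axis=-1)

# Toy example: 2 object classes, 3 part attributes on a 4x4 image.
rng = np.random.default_rng(0)
labels = combine_factorized_predictions(rng.normal(size=(2, 4, 4)),
                                        rng.normal(size=(3, 4, 4)))
print(labels.shape)  # (4, 4, 2)
```

The scalability claim follows from the output-space sizes: the two heads together predict over num_objects + num_parts labels, whereas a monolithic head over all (object, part) combinations would grow as num_objects × num_parts.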
Related papers
- Learning Spatial-Semantic Features for Robust Video Object Segmentation [108.045326229865]
We propose a robust video object segmentation framework equipped with spatial-semantic features and discriminative object queries.
We show that the proposed method set a new state-of-the-art performance on multiple datasets.
arXiv Detail & Related papers (2024-07-10T15:36:00Z)
- 1st Place Solution for MOSE Track in CVPR 2024 PVUW Workshop: Complex Video Object Segmentation [72.54357831350762]
We propose a semantic embedding video object segmentation model and use the salient features of objects as query representations.
We trained our model on a large-scale video object segmentation dataset.
Our model achieves first place (84.45%) on the test set of the Complex Video Object Challenge.
arXiv Detail & Related papers (2024-06-07T03:13:46Z)
- Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we require only sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z)
- Compositor: Bottom-up Clustering and Compositing for Robust Part and Object Segmentation [16.48046112716597]
We present a robust approach for joint part and object segmentation.
We build a hierarchical feature representation including pixel, part, and object-level embeddings to solve it in a bottom-up manner.
This bottom-up interaction is shown to be effective in integrating information from lower semantic levels to higher semantic levels.
arXiv Detail & Related papers (2023-06-12T20:12:02Z)
- Towards Open-World Segmentation of Parts [16.056921233445784]
We propose to explore a class-agnostic part segmentation task.
We argue that models trained without part classes can better localize parts and segment them on objects unseen in training.
We show notable and consistent gains by our approach, essentially a critical step towards open-world part segmentation.
arXiv Detail & Related papers (2023-05-26T10:34:58Z)
- MOSE: A New Dataset for Video Object Segmentation in Complex Scenes [106.64327718262764]
Video object segmentation (VOS) aims at segmenting a particular object throughout the entire video clip sequence.
The state-of-the-art VOS methods have achieved excellent performance (e.g., 90+% J&F) on existing datasets.
We collect a new VOS dataset called coMplex video Object SEgmentation (MOSE) to study tracking and segmenting objects in complex environments.
arXiv Detail & Related papers (2023-02-03T17:20:03Z)
- Associating Objects with Transformers for Video Object Segmentation [74.51719591192787]
We propose an Associating Objects with Transformers (AOT) approach to match and decode multiple objects uniformly.
AOT employs an identification mechanism to associate multiple targets into the same high-dimensional embedding space.
We ranked 1st in the 3rd Large-scale Video Object Challenge.
arXiv Detail & Related papers (2021-06-04T17:59:57Z)
- Universal-Prototype Augmentation for Few-Shot Object Detection [128.4592084104352]
Few-shot object detection (FSOD) aims to strengthen the performance of novel object detection with few labeled samples.
To alleviate the constraint of few samples, enhancing the generalization ability of learned features for novel objects plays a key role.
We propose a new prototype, namely universal prototype, that is learned from all object categories.
arXiv Detail & Related papers (2021-03-01T15:35:36Z)
- Robust Instance Segmentation through Reasoning about Multi-Object Occlusion [9.536947328412198]
We propose a deep network for multi-object instance segmentation that is robust to occlusion.
Our work builds on Compositional Networks, which learn a generative model of neural feature activations to locate occluders.
In particular, we obtain feed-forward predictions of the object classes and their instance and occluder segmentations.
arXiv Detail & Related papers (2020-12-03T17:41:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.