FLOAT: Factorized Learning of Object Attributes for Improved
Multi-object Multi-part Scene Parsing
- URL: http://arxiv.org/abs/2203.16168v1
- Date: Wed, 30 Mar 2022 09:46:10 GMT
- Title: FLOAT: Factorized Learning of Object Attributes for Improved
Multi-object Multi-part Scene Parsing
- Authors: Rishubh Singh, Pranav Gupta, Pradeep Shenoy and Ravikiran
Sarvadevabhatla
- Abstract summary: We propose FLOAT, a factorized label space framework for scalable multi-object multi-part parsing.
Our framework involves independent dense prediction of object category and part attributes.
In addition, we propose an inference-time 'zoom' refinement technique which significantly improves segmentation quality.
- Score: 10.94244766491706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-object multi-part scene parsing is a challenging task which requires
detecting multiple object classes in a scene and segmenting the semantic parts
within each object. In this paper, we propose FLOAT, a factorized label space
framework for scalable multi-object multi-part parsing. Our framework involves
independent dense prediction of object category and part attributes which
increases scalability and reduces task complexity compared to the monolithic
label space counterpart. In addition, we propose an inference-time 'zoom'
refinement technique which significantly improves segmentation quality,
especially for smaller objects/parts. Compared to the state of the art, FLOAT
obtains an absolute improvement of 2.0% for mean IOU (mIOU) and 4.8% for
segmentation quality IOU (sqIOU) on the Pascal-Part-58 dataset. For the larger
Pascal-Part-108 dataset, the improvements are 2.1% for mIOU and 3.9% for sqIOU.
We incorporate previously excluded part attributes and other minor parts of the
Pascal-Part dataset to create the most comprehensive and challenging version
which we dub Pascal-Part-201. FLOAT obtains improvements of 8.6% for mIOU and
7.5% for sqIOU on the new dataset, demonstrating its parsing effectiveness
across a challenging diversity of objects and parts. The code and datasets are
available at floatseg.github.io.
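The core idea of the factorized label space can be illustrated with a small sketch: instead of predicting one label from the full object-part product space, two independent dense heads predict an object category and a part attribute per pixel, and the pair is composed at inference. The function and variable names below are illustrative assumptions for exposition, not the authors' released code (see floatseg.github.io for that).

```python
# Minimal sketch (assumed names/shapes) of factorized dense prediction:
# one per-pixel argmax over object classes, one over part attributes,
# composed into an (object, part) label pair per pixel.
import numpy as np

def combine_factorized_predictions(object_logits, part_logits):
    """Fuse per-pixel object-category and part-attribute scores.

    object_logits: (num_objects, H, W) dense scores for object classes.
    part_logits:   (num_parts, H, W) dense scores for part attributes.
    Returns an (H, W, 2) array holding the argmax object class and
    argmax part attribute at each pixel.
    """
    obj_map = object_logits.argmax(axis=0)   # (H, W) object class per pixel
    part_map = part_logits.argmax(axis=0)    # (H, W) part attribute per pixel
    return np.stack([obj_map, part_map], axis=-1)

# Toy example: 2 object classes, 3 part attributes on a 4x4 image.
rng = np.random.default_rng(0)
labels = combine_factorized_predictions(rng.normal(size=(2, 4, 4)),
                                        rng.normal(size=(3, 4, 4)))
print(labels.shape)  # (4, 4, 2)
```

The scalability claim follows from the output-space sizes: the two heads together predict over num_objects + num_parts labels, whereas a monolithic head over all (object, part) combinations would grow as num_objects × num_parts.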
Related papers
- Learning Spatial-Semantic Features for Robust Video Object Segmentation [108.045326229865]
We propose a robust video object segmentation framework equipped with spatial-semantic features and discriminative object queries.
We show that the proposed method set a new state-of-the-art performance on multiple datasets.
arXiv Detail & Related papers (2024-07-10T15:36:00Z)
- 1st Place Solution for MOSE Track in CVPR 2024 PVUW Workshop: Complex Video Object Segmentation [72.54357831350762]
We propose a semantic embedding video object segmentation model and use the salient features of objects as query representations.
We trained our model on a large-scale video object segmentation dataset.
Our model achieves first place (84.45%) on the test set of the Complex Video Object Challenge.
arXiv Detail & Related papers (2024-06-07T03:13:46Z)
- Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we require only sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z)
- Compositor: Bottom-up Clustering and Compositing for Robust Part and Object Segmentation [16.48046112716597]
We present a robust approach for joint part and object segmentation.
We build a hierarchical feature representation including pixel, part, and object-level embeddings to solve it in a bottom-up manner.
This bottom-up interaction is shown to be effective in integrating information from lower semantic levels to higher semantic levels.
arXiv Detail & Related papers (2023-06-12T20:12:02Z)
- Towards Open-World Segmentation of Parts [16.056921233445784]
We propose to explore a class-agnostic part segmentation task.
We argue that models trained without part classes can better localize parts and segment them on objects unseen in training.
We show notable and consistent gains by our approach, essentially a critical step towards open-world part segmentation.
arXiv Detail & Related papers (2023-05-26T10:34:58Z)
- MOSE: A New Dataset for Video Object Segmentation in Complex Scenes [106.64327718262764]
Video object segmentation (VOS) aims at segmenting a particular object throughout the entire video clip sequence.
The state-of-the-art VOS methods have achieved excellent performance (e.g., 90+% J&F) on existing datasets.
We collect a new VOS dataset called coMplex video Object SEgmentation (MOSE) to study tracking and segmenting objects in complex environments.
arXiv Detail & Related papers (2023-02-03T17:20:03Z)
- Associating Objects with Transformers for Video Object Segmentation [74.51719591192787]
We propose an Associating Objects with Transformers (AOT) approach to match and decode multiple objects uniformly.
AOT employs an identification mechanism to associate multiple targets into the same high-dimensional embedding space.
We ranked 1st in the 3rd Large-scale Video Object Challenge.
arXiv Detail & Related papers (2021-06-04T17:59:57Z)
- Universal-Prototype Augmentation for Few-Shot Object Detection [128.4592084104352]
Few-shot object detection (FSOD) aims to strengthen the performance of novel object detection with few labeled samples.
To alleviate the constraint of few samples, enhancing the generalization ability of learned features for novel objects plays a key role.
We propose a new prototype, namely universal prototype, that is learned from all object categories.
arXiv Detail & Related papers (2021-03-01T15:35:36Z)
- Robust Instance Segmentation through Reasoning about Multi-Object Occlusion [9.536947328412198]
We propose a deep network for multi-object instance segmentation that is robust to occlusion.
Our work builds on Compositional Networks, which learn a generative model of neural feature activations to locate occluders.
In particular, we obtain feed-forward predictions of the object classes and their instance and occluder segmentations.
arXiv Detail & Related papers (2020-12-03T17:41:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.