Compositor: Bottom-up Clustering and Compositing for Robust Part and
Object Segmentation
- URL: http://arxiv.org/abs/2306.07404v3
- Date: Thu, 30 Nov 2023 14:16:13 GMT
- Title: Compositor: Bottom-up Clustering and Compositing for Robust Part and
Object Segmentation
- Authors: Ju He, Jieneng Chen, Ming-Xian Lin, Qihang Yu, Alan Yuille
- Abstract summary: We present a robust approach for joint part and object segmentation.
We build a hierarchical feature representation including pixel, part, and object-level embeddings to solve it in a bottom-up manner.
This bottom-up interaction is shown to be effective in integrating information from lower semantic levels to higher semantic levels.
- Score: 16.48046112716597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present a robust approach for joint part and object
segmentation. Specifically, we reformulate object and part segmentation as an
optimization problem and build a hierarchical feature representation including
pixel, part, and object-level embeddings to solve it in a bottom-up clustering
manner. Pixels are grouped into several clusters where the part-level
embeddings serve as cluster centers. Afterwards, object masks are obtained by
compositing the part proposals. This bottom-up interaction is shown to be
effective in integrating information from lower semantic levels to higher
semantic levels. Based on that, our novel approach Compositor produces part and
object segmentation masks simultaneously while improving the mask quality.
Compositor achieves state-of-the-art performance on PartImageNet and
Pascal-Part, outperforming previous methods in part and object mIoU by around
0.9% and 1.3% on PartImageNet and by around 0.4% and 1.7% on Pascal-Part. It
also demonstrates better robustness against occlusion, by around 4.4% and 7.1%
on part and object respectively. Code will be available at
https://github.com/TACJu/Compositor.
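The bottom-up procedure described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the dot-product similarity, the hard argmax assignment, and the rule that maps each part to its most similar object embedding are all assumptions made for illustration.

```python
import numpy as np

def cluster_pixels(pixel_emb, part_emb):
    """Assign each pixel to its most similar part embedding (cluster center).

    pixel_emb: (H, W, D) array, part_emb: (P, D) array.
    Returns an (H, W) integer map of part indices.
    Assumption: similarity is a plain dot product with a hard argmax.
    """
    sim = pixel_emb @ part_emb.T          # (H, W, P) similarity scores
    return sim.argmax(axis=-1)

def composite_objects(assign, part_emb, object_emb):
    """Build object masks as the union of the part masks assigned to each object.

    Assumption: each part belongs to the object whose embedding it matches best.
    Returns (part_masks, object_masks) as boolean arrays of shape
    (P, H, W) and (O, H, W).
    """
    num_parts, num_objects = part_emb.shape[0], object_emb.shape[0]
    part_masks = np.stack([assign == p for p in range(num_parts)])
    part_to_object = (part_emb @ object_emb.T).argmax(axis=-1)  # (P,)
    object_masks = np.zeros((num_objects,) + assign.shape, dtype=bool)
    for p, o in enumerate(part_to_object):
        object_masks[o] |= part_masks[p]  # composite part proposals
    return part_masks, object_masks

# Toy example: a 2x2 image, 3 part embeddings, 2 object embeddings.
pixel_emb = np.array([[[1.0, 0.0], [0.0, 1.0]],
                      [[1.0, 0.1], [-1.0, 0.0]]])
part_emb = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
object_emb = np.array([[1.0, 1.0], [-1.0, 0.0]])

assign = cluster_pixels(pixel_emb, part_emb)
part_masks, object_masks = composite_objects(assign, part_emb, object_emb)
```

In this toy setup, parts 0 and 1 are composited into object 0 and part 2 forms object 1, so the object masks are by construction unions of the part masks.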
Related papers
- From Pixels to Objects: A Hierarchical Approach for Part and Object Segmentation Using Local and Global Aggregation [24.51617545483278]
We introduce a hierarchical transformer-based model designed for sophisticated image segmentation tasks.
At the heart of our approach is a multi-level representation strategy, which systematically advances from individual pixels to superpixels.
This architecture is underpinned by two pivotal aggregation strategies: local aggregation and global aggregation.
arXiv Detail & Related papers (2024-09-02T16:13:26Z)
- Completing Visual Objects via Bridging Generation and Segmentation [84.4552458720467]
MaskComp delineates the completion process through iterative stages of generation and segmentation.
In each iteration, the object mask is provided as an additional condition to boost image generation.
We demonstrate that the combination of one generation and one segmentation stage effectively functions as a mask denoiser.
arXiv Detail & Related papers (2023-10-01T22:25:40Z)
- MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation [110.09800389100599]
We propose MixReorg, a novel and straightforward pre-training paradigm for semantic segmentation.
Our approach involves generating fine-grained patch-text pairs data by mixing image patches while preserving the correspondence between patches and text.
With MixReorg as a mask learner, conventional text-supervised semantic segmentation models can achieve highly generalizable pixel-semantic alignment ability.
arXiv Detail & Related papers (2023-08-09T09:35:16Z)
- Towards Open-World Segmentation of Parts [16.056921233445784]
We propose to explore a class-agnostic part segmentation task.
We argue that models trained without part classes can better localize parts and segment them on objects unseen in training.
We show notable and consistent gains by our approach, essentially a critical step towards open-world part segmentation.
arXiv Detail & Related papers (2023-05-26T10:34:58Z)
- Multi-task Fusion for Efficient Panoptic-Part Segmentation [12.650574326251023]
We introduce a novel network that generates semantic, instance, and part segmentation using a shared encoder.
To fuse the predictions of all three heads efficiently, we introduce a parameter-free joint fusion module.
Our method is evaluated on the Cityscapes Panoptic Parts (CPP) and Pascal Panoptic Parts (PPP) datasets.
arXiv Detail & Related papers (2022-12-15T09:04:45Z)
- Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation [75.00151934315967]
MaskDistill is a novel framework for unsupervised semantic segmentation.
Our framework does not latch onto low-level image cues and is not limited to object-centric datasets.
arXiv Detail & Related papers (2022-06-13T17:59:43Z)
- 3D Compositional Zero-shot Learning with DeCompositional Consensus [102.7571947144639]
We argue that part knowledge should be composable beyond the observed object classes.
We present 3D Compositional Zero-shot Learning as a problem of part generalization from seen to unseen object classes.
arXiv Detail & Related papers (2021-11-29T16:34:53Z)
- Unsupervised Part Discovery from Contrastive Reconstruction [90.88501867321573]
The goal of self-supervised visual representation learning is to learn strong, transferable image representations.
We propose an unsupervised approach to object part discovery and segmentation.
Our method yields semantic parts consistent across fine-grained but visually distinct categories.
arXiv Detail & Related papers (2021-11-11T17:59:42Z)
- Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either aim to improve the object's inner consistency by modeling the global context, or refine object detail along boundaries by multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that appealing performance of semantic segmentation requires explicitly modeling the object body and edge, which correspond to the high and low frequency of the image.
We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
arXiv Detail & Related papers (2020-07-20T12:11:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.