From Pixels to Objects: A Hierarchical Approach for Part and Object Segmentation Using Local and Global Aggregation
- URL: http://arxiv.org/abs/2409.01353v1
- Date: Mon, 2 Sep 2024 16:13:26 GMT
- Title: From Pixels to Objects: A Hierarchical Approach for Part and Object Segmentation Using Local and Global Aggregation
- Authors: Yunfei Xie, Cihang Xie, Alan Yuille, Jieru Mei,
- Abstract summary: We introduce a hierarchical transformer-based model designed for sophisticated image segmentation tasks.
At the heart of our approach is a multi-level representation strategy, which systematically advances from individual pixels to superpixels.
This architecture is underpinned by two pivotal aggregation strategies: local aggregation and global aggregation.
- Score: 24.51617545483278
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce a hierarchical transformer-based model designed for sophisticated image segmentation tasks, effectively bridging the granularity of part segmentation with the comprehensive scope of object segmentation. At the heart of our approach is a multi-level representation strategy, which systematically advances from individual pixels to superpixels, and ultimately to cohesive group formations. This architecture is underpinned by two pivotal aggregation strategies: local aggregation and global aggregation. Local aggregation is employed to form superpixels, leveraging the inherent redundancy of the image data to produce segments closely aligned with specific parts of the object, guided by object-level supervision. In contrast, global aggregation interlinks these superpixels, organizing them into larger groups that correlate with entire objects and benefit from part-level supervision. This dual aggregation framework ensures a versatile adaptation to varying supervision inputs while maintaining computational efficiency. Our methodology notably improves the balance between adaptability across different supervision modalities and computational manageability, culminating in significant enhancement in segmentation performance. When tested on the PartImageNet dataset, our model achieves a substantial increase, outperforming the previous state-of-the-art by 2.8% and 0.8% in mIoU scores for part and object segmentation, respectively. Similarly, on the Pascal Part dataset, it records performance enhancements of 1.5% and 2.0% for part and object segmentation, respectively.
Related papers
- Pixel-Level Clustering Network for Unsupervised Image Segmentation [3.69853388955692]
We present a pixel-level clustering framework for segmenting images into regions without using ground truth annotations.
We also propose a training strategy that utilizes intra-consistency within each superpixel, inter-similarity/dissimilarity between neighboring superpixels, and structural similarity between images.
arXiv Detail & Related papers (2023-10-24T23:06:29Z) - Multi-interactive Feature Learning and a Full-time Multi-modality
Benchmark for Image Fusion and Segmentation [66.15246197473897]
Multi-modality image fusion and segmentation play a vital role in autonomous driving and robotic operation.
We propose a textbfMulti-textbfinteractive textbfFeature learning architecture for image fusion and textbfSegmentation.
arXiv Detail & Related papers (2023-08-04T01:03:58Z) - Compositor: Bottom-up Clustering and Compositing for Robust Part and
Object Segmentation [16.48046112716597]
We present a robust approach for joint part and object segmentation.
We build a hierarchical feature representation including pixel, part, and object-level embeddings to solve it in a bottom-up manner.
This bottom-up interaction is shown to be effective in integrating information from lower semantic levels to higher semantic levels.
arXiv Detail & Related papers (2023-06-12T20:12:02Z) - Neural Constraint Satisfaction: Hierarchical Abstraction for
Combinatorial Generalization in Object Rearrangement [75.9289887536165]
We present a hierarchical abstraction approach to uncover underlying entities.
We show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment.
We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects.
arXiv Detail & Related papers (2023-03-20T18:19:36Z) - Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-the-of-art performance on 3-of-the-level object recognition.
arXiv Detail & Related papers (2022-12-28T03:45:56Z) - DuAT: Dual-Aggregation Transformer Network for Medical Image
Segmentation [21.717520350930705]
Transformer-based models have been widely demonstrated to be successful in computer vision tasks.
However, they are often dominated by features of large patterns leading to the loss of local details.
We propose a Dual-Aggregation Transformer Network called DuAT, which is characterized by two innovative designs.
Our proposed model outperforms state-of-the-art methods in the segmentation of skin lesion images, and polyps in colonoscopy images.
arXiv Detail & Related papers (2022-12-21T07:54:02Z) - Framework-agnostic Semantically-aware Global Reasoning for Segmentation [29.69187816377079]
We propose a component that learns to project image features into latent representations and reason between them.
Our design encourages the latent regions to represent semantic concepts by ensuring that the activated regions are spatially disjoint.
Our latent tokens are semantically interpretable and diverse and provide a rich set of features that can be transferred to downstream tasks.
arXiv Detail & Related papers (2022-12-06T21:42:05Z) - Scaling up Multi-domain Semantic Segmentation with Sentence Embeddings [81.09026586111811]
We propose an approach to semantic segmentation that achieves state-of-the-art supervised performance when applied in a zero-shot setting.
This is achieved by replacing each class label with a vector-valued embedding of a short paragraph that describes the class.
The resulting merged semantic segmentation dataset of over 2 Million images enables training a model that achieves performance equal to that of state-of-the-art supervised methods on 7 benchmark datasets.
arXiv Detail & Related papers (2022-02-04T07:19:09Z) - CoADNet: Collaborative Aggregation-and-Distribution Networks for
Co-Salient Object Detection [91.91911418421086]
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images.
One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships.
We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z) - Visual Object Tracking by Segmentation with Graph Convolutional Network [7.729569666460712]
We propose to utilize graph convolutional network (GCN) model for superpixel based object tracking.
The proposed model provides a general end-to-end framework which integrates i) label linear prediction, and ii) structure-aware feature information of each superpixel together.
arXiv Detail & Related papers (2020-09-05T12:43:21Z) - Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either aim to improve the object's inner consistency by modeling the global context, or refine objects detail along their boundaries by multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that appealing performance of semantic segmentation requires textitexplicitly modeling the object textitbody and textitedge, which correspond to the high and low frequency of the image.
We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
arXiv Detail & Related papers (2020-07-20T12:11:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.