Semantic-SAM: Segment and Recognize Anything at Any Granularity
- URL: http://arxiv.org/abs/2307.04767v1
- Date: Mon, 10 Jul 2023 17:59:40 GMT
- Title: Semantic-SAM: Segment and Recognize Anything at Any Granularity
- Authors: Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang,
Chunyuan Li, Lei Zhang, Jianfeng Gao
- Abstract summary: We introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity.
We consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts.
For the multi-granularity capability, we propose a multi-choice learning scheme during training, enabling each click to generate masks at multiple levels.
- Score: 83.64686655044765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce Semantic-SAM, a universal image segmentation
model that can segment and recognize anything at any desired granularity. Our
model offers two key advantages: semantic-awareness and granularity-abundance.
To achieve semantic-awareness, we consolidate multiple datasets across three
granularities and introduce decoupled classification for objects and parts.
This allows our model to capture rich semantic information. For the
multi-granularity capability, we propose a multi-choice learning scheme during
training, enabling each click to generate masks at multiple levels that
correspond to multiple ground-truth masks. Notably, this work represents the
first attempt to jointly train a model on SA-1B, generic, and part segmentation
datasets. Experimental results and visualizations demonstrate that our model
successfully achieves semantic-awareness and granularity-abundance.
Furthermore, combining SA-1B training with other segmentation tasks, such as
panoptic and part segmentation, leads to performance improvements. We will
provide code and a demo for further exploration and evaluation.
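The abstract describes the multi-choice learning scheme only at a high level, so the following is a minimal, hypothetical PyTorch sketch of how a single click prompt could yield several granularity-level masks trained against multiple ground-truth masks. The module names, the number of levels, and the greedy matching below are assumptions rather than details from the paper, and the decoupled object/part classification heads are not shown.

```python
# Hypothetical sketch of multi-choice mask prediction for a single click.
# Names (MultiChoiceMaskHead, GRANULARITY_LEVELS, many_to_many_loss), the
# number of levels, and the greedy matching are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

GRANULARITY_LEVELS = 6  # assumed number of masks predicted per click


class MultiChoiceMaskHead(nn.Module):
    """Turn one click embedding into K granularity-specific mask predictions."""

    def __init__(self, dim: int = 256, levels: int = GRANULARITY_LEVELS):
        super().__init__()
        self.level_embed = nn.Embedding(levels, dim)  # one query offset per level
        self.query_proj = nn.Linear(dim, dim)

    def forward(self, click_embed: torch.Tensor, pixel_feats: torch.Tensor):
        # click_embed: (B, dim); pixel_feats: (B, dim, H, W)
        queries = self.query_proj(click_embed)[:, None, :] + self.level_embed.weight
        # dot-product mask decoding: one logit map per granularity query
        return torch.einsum("bkd,bdhw->bkhw", queries, pixel_feats)  # (B, K, H, W)


def many_to_many_loss(pred_masks: torch.Tensor, gt_masks: torch.Tensor):
    """Match K predicted masks to the M ground-truth masks containing the click.

    A greedy per-ground-truth assignment is used purely for illustration; the
    paper's actual matching strategy may differ.
    """
    K, M = pred_masks.shape[0], gt_masks.shape[0]
    cost = torch.stack([
        torch.stack([
            F.binary_cross_entropy_with_logits(pred_masks[k], gt_masks[m].float())
            for m in range(M)
        ])
        for k in range(K)
    ])  # (K, M) pairwise mask losses
    best_pred = cost.argmin(dim=0)  # best prediction index for each ground truth
    return cost[best_pred, torch.arange(M)].mean()
```

Under this reading, a click inside a person's arm could have ground-truth masks for the arm, the person, and a larger group; each ground-truth mask is pulled toward its best-matching output, so different granularity queries can specialize to different levels.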
Related papers
- Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model [19.861556031795725]
We introduce a Multi-Granularity Large Multimodal Model (MGLMM).
MGLMM can seamlessly adjust the granularity of Segmentation and Captioning (SegCap) following user instructions.
It excels at tackling more than eight downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-09-20T11:13:31Z)
- PosSAM: Panoptic Open-vocabulary Segment Anything [58.72494640363136]
PosSAM is an open-vocabulary panoptic segmentation model that unifies the strengths of the Segment Anything Model (SAM) with the vision-native CLIP model in an end-to-end framework.
We introduce a Mask-Aware Selective Ensembling (MASE) algorithm that adaptively enhances the quality of generated masks and boosts the performance of open-vocabulary classification during inference for each image.
arXiv Detail & Related papers (2024-03-14T17:55:03Z)
- Joint Depth Prediction and Semantic Segmentation with Multi-View SAM [59.99496827912684]
We propose a Multi-View Stereo (MVS) technique for depth prediction that benefits from the rich semantic features of the Segment Anything Model (SAM).
This enhanced depth prediction, in turn, serves as a prompt to our Transformer-based semantic segmentation decoder.
arXiv Detail & Related papers (2023-10-31T20:15:40Z)
- RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
- AIMS: All-Inclusive Multi-Level Segmentation [93.5041381700744]
We propose a new task, All-Inclusive Multi-Level Segmentation (AIMS), which segments visual regions into three levels: part, entity, and relation.
We also build a unified AIMS model through multi-dataset multi-task training to address the two major challenges of annotation inconsistency and task correlation.
arXiv Detail & Related papers (2023-05-28T16:28:49Z)
- Weakly-Supervised Concealed Object Segmentation with SAM-based Pseudo Labeling and Multi-scale Feature Grouping [40.07070188661184]
Weakly-Supervised Concealed Object Segmentation (WSCOS) aims to segment objects that are well blended with their surrounding environments.
Concealed objects are hard to distinguish from the background due to their intrinsic similarity.
We propose a new WSCOS method to address these two challenges.
arXiv Detail & Related papers (2023-05-18T14:31:34Z)
- MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
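As a rough illustration of what unifying label spaces across datasets involves, the snippet below maps dataset-specific class names into one shared taxonomy before relabeling; the datasets, class names, and mapping are invented for illustration and are not MSeg's actual taxonomy.

```python
# Minimal, hypothetical illustration of unifying per-dataset label spaces.
# The class names and mapping below are invented, not MSeg's real taxonomy.
UNIFIED_CLASSES = ["person", "rider", "car", "road", "sky", "vegetation"]

DATASET_TO_UNIFIED = {
    "cityscapes": {"person": "person", "rider": "rider", "car": "car",
                   "road": "road", "sky": "sky", "vegetation": "vegetation"},
    "ade20k":     {"person": "person", "car": "car", "road, route": "road",
                   "sky": "sky", "tree": "vegetation"},
}


def remap_label(dataset: str, class_name: str) -> int:
    """Map a dataset-specific class name to a unified-taxonomy index (-1 = unmapped)."""
    unified = DATASET_TO_UNIFIED.get(dataset, {}).get(class_name)
    return UNIFIED_CLASSES.index(unified) if unified is not None else -1


print(remap_label("ade20k", "tree"))  # -> index of "vegetation"
```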
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.