WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition
- URL: http://arxiv.org/abs/2402.14812v2
- Date: Sat, 17 Aug 2024 04:55:22 GMT
- Title: WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition
- Authors: Lianghui Zhu, Junwei Zhou, Yan Liu, Xin Hao, Wenyu Liu, Xinggang Wang,
- Abstract summary: Weakly supervised visual recognition using inexact supervision is a critical yet challenging learning problem.
This paper introduces WeakSAM and solves the weakly-supervised object detection (WSOD) and segmentation by utilizing the pre-learned world knowledge contained in a vision foundation model, i.e., the Segment Anything Model (SAM)
Our results indicate that WeakSAM significantly surpasses previous state-of-the-art methods in WSOD and WSIS benchmarks with large margins, i.e. average improvements of 7.4% and 8.5%, respectively.
- Score: 38.42053754669399
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly supervised visual recognition using inexact supervision is a critical yet challenging learning problem. It significantly reduces human labeling costs and traditionally relies on multi-instance learning and pseudo-labeling. This paper introduces WeakSAM and solves the weakly-supervised object detection (WSOD) and segmentation by utilizing the pre-learned world knowledge contained in a vision foundation model, i.e., the Segment Anything Model (SAM). WeakSAM addresses two critical limitations in traditional WSOD retraining, i.e., pseudo ground truth (PGT) incompleteness and noisy PGT instances, through adaptive PGT generation and Region of Interest (RoI) drop regularization. It also addresses the SAM's problems of requiring prompts and category unawareness for automatic object detection and segmentation. Our results indicate that WeakSAM significantly surpasses previous state-of-the-art methods in WSOD and WSIS benchmarks with large margins, i.e. average improvements of 7.4% and 8.5%, respectively. The code is available at \url{https://github.com/hustvl/WeakSAM}.
Related papers
- Adapting Segment Anything Model for Unseen Object Instance Segmentation [70.60171342436092]
Unseen Object Instance (UOIS) is crucial for autonomous robots operating in unstructured environments.
We propose UOIS-SAM, a data-efficient solution for the UOIS task.
UOIS-SAM integrates two key components: (i) a Heatmap-based Prompt Generator (HPG) to generate class-agnostic point prompts with precise foreground prediction, and (ii) a Hierarchical Discrimination Network (HDNet) that adapts SAM's mask decoder.
arXiv Detail & Related papers (2024-09-23T19:05:50Z) - Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes [18.244508068200236]
Crowd-SAM is a framework designed to enhance SAM's performance in crowded and occluded scenes.
We introduce an efficient prompt sampler (EPS) and a part-whole discrimination network (PWD-Net) to enhance mask selection and accuracy in crowded scenes.
Crowd-SAM rivals state-of-the-art (SOTA) fully-supervised object detection methods on several benchmarks including CrowdHuman and CityPersons.
arXiv Detail & Related papers (2024-07-16T08:00:01Z) - SAM-PM: Enhancing Video Camouflaged Object Detection using Spatio-Temporal Attention [0.0]
The Segment Anything Model (SAM) has gained notable recognition for its exceptional performance in image segmentation.
Camouflaged objects typically blend into the background, making them difficult to distinguish in still images.
We propose a new method called the SAM Spider Module (SAM-PM) to overcome these challenges.
Our method effectively incorporates temporal consistency and domain-specific expertise into the segmentation network with an addition of less than 1% of SAM's parameters.
arXiv Detail & Related papers (2024-06-09T14:33:38Z) - AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning [61.666973416903005]
Segment Anything Model (SAM) has demonstrated its impressive generalization capabilities in open-world scenarios with the guidance of prompts.
We propose a novel framework, termed AlignSAM, designed for automatic prompting for aligning SAM to an open context.
arXiv Detail & Related papers (2024-06-01T16:21:39Z) - BLO-SAM: Bi-level Optimization Based Overfitting-Preventing Finetuning
of SAM [37.1263294647351]
We introduce BLO-SAM, which finetunes the Segment Anything Model (SAM) based on bi-level optimization (BLO)
BLO-SAM reduces the risk of overfitting by training the model's weight parameters and the prompt embedding on two separate subsets of the training dataset.
Results demonstrate BLO-SAM's superior performance over various state-of-the-art image semantic segmentation methods.
arXiv Detail & Related papers (2024-02-26T06:36:32Z) - ImbSAM: A Closer Look at Sharpness-Aware Minimization in
Class-Imbalanced Recognition [62.20538402226608]
We show that the Sharpness-Aware Minimization (SAM) fails to address generalization issues under the class-imbalanced setting.
We propose a class-aware smoothness optimization algorithm named Imbalanced-SAM (ImbSAM) to overcome this bottleneck.
Our ImbSAM demonstrates remarkable performance improvements for tail classes and anomaly.
arXiv Detail & Related papers (2023-08-15T14:46:32Z) - SAM Meets Robotic Surgery: An Empirical Study on Generalization,
Robustness and Adaptation [15.995869434429274]
The Segment Anything Model (SAM) serves as a fundamental model for semantic segmentation.
We examine SAM's robustness and zero-shot generalizability in the field of robotic surgery.
arXiv Detail & Related papers (2023-08-14T14:09:41Z) - On the Robustness of Segment Anything [46.669794757467166]
We aim to study the testing-time robustness of SAM under adversarial scenarios and common corruptions.
We find that SAM exhibits remarkable robustness against various corruptions, except for blur-related corruption.
arXiv Detail & Related papers (2023-05-25T16:28:30Z) - Weakly Supervised Person Search with Region Siamese Networks [65.76237418040071]
Supervised learning is dominant in person search, but it requires elaborate labeling of bounding boxes and identities.
We present a weakly supervised setting where only bounding box annotations are available.
Our model achieves the rank-1 of 87.1% and mAP of 86.0% on CUHK-SYSU benchmark.
arXiv Detail & Related papers (2021-09-13T16:33:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.