WeakSAM: Segment Anything Meets Weakly-supervised Instance-level
Recognition
- URL: http://arxiv.org/abs/2402.14812v1
- Date: Thu, 22 Feb 2024 18:59:24 GMT
- Title: WeakSAM: Segment Anything Meets Weakly-supervised Instance-level
Recognition
- Authors: Lianghui Zhu, Junwei Zhou, Yan Liu, Xin Hao, Wenyu Liu, Xinggang Wang
- Abstract summary: Weakly supervised visual recognition using inexact supervision is a critical yet challenging learning problem.
This paper introduces WeakSAM, which solves weakly-supervised object detection (WSOD) and segmentation by utilizing the pre-learned world knowledge contained in a vision foundation model, i.e., the Segment Anything Model (SAM).
Our results indicate that WeakSAM significantly surpasses previous state-of-the-art methods on WSOD and WSIS benchmarks by large margins, i.e., average improvements of 7.4% and 8.5%, respectively.
- Score: 40.711009448103354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly supervised visual recognition using inexact supervision is a critical
yet challenging learning problem. It significantly reduces human labeling costs
and traditionally relies on multi-instance learning and pseudo-labeling. This
paper introduces WeakSAM, which solves weakly-supervised object detection
(WSOD) and segmentation by utilizing the pre-learned world knowledge contained
in a vision foundation model, i.e., the Segment Anything Model (SAM). WeakSAM
addresses two critical limitations in traditional WSOD retraining, i.e., pseudo
ground truth (PGT) incompleteness and noisy PGT instances, through adaptive PGT
generation and Region of Interest (RoI) drop regularization. It also addresses
SAM's limitations of requiring prompts and being unaware of categories in
automatic object detection and segmentation. Our results indicate that WeakSAM
significantly surpasses previous state-of-the-art methods on WSOD and WSIS
benchmarks by large margins, i.e., average improvements of 7.4% and 8.5%,
respectively. The code is available at \url{https://github.com/hustvl/WeakSAM}.
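The RoI drop regularization mentioned in the abstract can be pictured with a small sketch. The paper's exact scheme is not given here, so the drop ratio, the per-iteration resampling, and the keep-at-least-one rule below are all assumptions for illustration:

```python
import random

def roi_drop(rois, drop_ratio=0.3, rng=None):
    """Randomly drop a fraction of pseudo-ground-truth RoIs at each
    training iteration, so the detector is not repeatedly fit to the
    same noisy PGT instances. `drop_ratio` is a hypothetical
    hyperparameter, not a value from the paper."""
    rng = rng or random.Random()
    kept = [r for r in rois if rng.random() >= drop_ratio]
    # Keep at least one RoI so the image still contributes a training signal.
    return kept if kept else [rng.choice(rois)]

# Toy RoIs as (x1, y1, x2, y2) boxes; a different subset survives each call.
boxes = [(10, 10, 50, 50), (30, 40, 80, 90), (5, 5, 20, 20)]
print(roi_drop(boxes, drop_ratio=0.5, rng=random.Random(0)))
```

Because the surviving subset changes every iteration, no single noisy pseudo-box dominates the retraining loss, which is the intuition the abstract describes.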
Related papers
- Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes [18.244508068200236]
Crowd-SAM is a framework designed to enhance SAM's performance in crowded and occluded scenes.
We introduce an efficient prompt sampler (EPS) and a part-whole discrimination network (PWD-Net) to enhance mask selection and accuracy in crowded scenes.
Crowd-SAM rivals state-of-the-art (SOTA) fully-supervised object detection methods on several benchmarks including CrowdHuman and CityPersons.
arXiv Detail & Related papers (2024-07-16T08:00:01Z) - SAM-PM: Enhancing Video Camouflaged Object Detection using Spatio-Temporal Attention [0.0]
The Segment Anything Model (SAM) has gained notable recognition for its exceptional performance in image segmentation.
Camouflaged objects typically blend into the background, making them difficult to distinguish in still images.
We propose a new method called the SAM Propagation Module (SAM-PM) to overcome these challenges.
Our method effectively incorporates temporal consistency and domain-specific expertise into the segmentation network while adding less than 1% of SAM's parameters.
arXiv Detail & Related papers (2024-06-09T14:33:38Z) - AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning [61.666973416903005]
Segment Anything Model (SAM) has demonstrated its impressive generalization capabilities in open-world scenarios with the guidance of prompts.
We propose a novel framework, termed AlignSAM, designed for automatic prompting for aligning SAM to an open context.
arXiv Detail & Related papers (2024-06-01T16:21:39Z) - BLO-SAM: Bi-level Optimization Based Overfitting-Preventing Finetuning
of SAM [37.1263294647351]
We introduce BLO-SAM, which finetunes the Segment Anything Model (SAM) based on bi-level optimization (BLO).
BLO-SAM reduces the risk of overfitting by training the model's weight parameters and the prompt embedding on two separate subsets of the training dataset.
Results demonstrate BLO-SAM's superior performance over various state-of-the-art image semantic segmentation methods.
arXiv Detail & Related papers (2024-02-26T06:36:32Z) - Stable Segment Anything Model [79.9005670886038]
The Segment Anything Model (SAM) achieves remarkable promptable segmentation given high-quality prompts.
This paper presents the first comprehensive analysis on SAM's segmentation stability across a diverse spectrum of prompt qualities.
Our solution, termed Stable-SAM, offers several advantages: 1) it improves SAM's segmentation stability across a wide range of prompt qualities, while 2) retaining SAM's powerful promptable segmentation efficiency and generality.
arXiv Detail & Related papers (2023-11-27T12:51:42Z) - ImbSAM: A Closer Look at Sharpness-Aware Minimization in
Class-Imbalanced Recognition [62.20538402226608]
We show that Sharpness-Aware Minimization (SAM) fails to address generalization issues under the class-imbalanced setting.
We propose a class-aware smoothness optimization algorithm named Imbalanced-SAM (ImbSAM) to overcome this bottleneck.
Our ImbSAM demonstrates remarkable performance improvements for tail classes and anomaly detection.
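For context, the base Sharpness-Aware Minimization step (Foret et al.) that ImbSAM builds on ascends to a worst-case point within a small ball and descends using the gradient taken there. The sketch below shows only that base step on a toy quadratic; ImbSAM's class-aware split of the objective is not reproduced here, and the hyperparameters are illustrative:

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization step: perturb the weights toward
    the locally worst-case direction (radius rho), then apply a descent
    step using the gradient evaluated at the perturbed point."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation
    g_adv = grad_fn(w + eps)                     # gradient at perturbed weights
    return w - lr * g_adv

# Toy quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([3.0, -4.0])
for _ in range(100):
    w = sam_step(w, lambda v: v)
print(np.linalg.norm(w))  # the iterates settle near the flat minimum at 0
```

The abstract's point is that this flatness-seeking objective, applied uniformly, under-serves tail classes; ImbSAM's fix is to apply the smoothing in a class-aware way.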
arXiv Detail & Related papers (2023-08-15T14:46:32Z) - SAM Meets Robotic Surgery: An Empirical Study on Generalization,
Robustness and Adaptation [15.995869434429274]
The Segment Anything Model (SAM) serves as a fundamental model for semantic segmentation.
We examine SAM's robustness and zero-shot generalizability in the field of robotic surgery.
arXiv Detail & Related papers (2023-08-14T14:09:41Z) - On the Robustness of Segment Anything [46.669794757467166]
We aim to study the testing-time robustness of SAM under adversarial scenarios and common corruptions.
We find that SAM exhibits remarkable robustness against various corruptions, except for blur-related corruption.
arXiv Detail & Related papers (2023-05-25T16:28:30Z) - Weakly Supervised Person Search with Region Siamese Networks [65.76237418040071]
Supervised learning is dominant in person search, but it requires elaborate labeling of bounding boxes and identities.
We present a weakly supervised setting where only bounding box annotations are available.
Our model achieves the rank-1 of 87.1% and mAP of 86.0% on CUHK-SYSU benchmark.
arXiv Detail & Related papers (2021-09-13T16:33:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.