The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models
- URL: http://arxiv.org/abs/2404.11957v1
- Date: Thu, 18 Apr 2024 07:22:38 GMT
- Title: The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models
- Authors: Cheng Shi, Sibei Yang
- Abstract summary: In object detection and instance segmentation, foundation models such as SAM and DINO struggle to achieve satisfactory performance.
We propose $\textbf{Zip}$ which $\textbf{Z}$ips up CL$\textbf{ip}$ and SAM in a novel classification-first-then-discovery pipeline.
Our Zip significantly boosts SAM's mask AP on the COCO dataset by 12.5% and establishes state-of-the-art performance in various settings.
- Score: 24.53385855664792
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundation models, pre-trained on large amounts of data, have demonstrated impressive zero-shot capabilities in various downstream tasks. However, in object detection and instance segmentation, two fundamental computer vision tasks heavily reliant on extensive human annotations, foundation models such as SAM and DINO struggle to achieve satisfactory performance. In this study, we reveal that the devil is in the object boundary, \textit{i.e.}, these foundation models fail to discern boundaries between individual objects. For the first time, we show that CLIP, which has never accessed any instance-level annotations, can provide a highly beneficial and strong instance-level boundary prior in the clustering results of a particular intermediate layer. Following this surprising observation, we propose $\textbf{Zip}$ which $\textbf{Z}$ips up CL$\textbf{ip}$ and SAM in a novel classification-first-then-discovery pipeline, enabling annotation-free, complex-scene-capable, open-vocabulary object detection and instance segmentation. Our Zip significantly boosts SAM's mask AP on the COCO dataset by 12.5% and establishes state-of-the-art performance in various settings, including training-free, self-training, and label-efficient finetuning. Furthermore, annotation-free Zip even achieves performance comparable to the best-performing open-vocabulary object detectors using base annotations. Code is released at https://github.com/ChengShiest/Zip-Your-CLIP
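The abstract's central observation is that clustering the features of an intermediate CLIP layer exposes boundaries between individual objects. The sketch below illustrates only that general idea and is not the authors' implementation: synthetic per-patch features stand in for real CLIP activations, plain k-means with a deterministic farthest-point initialisation stands in for whatever clustering the paper uses, and the names `farthest_point_init`, `kmeans`, and `boundary_map` are hypothetical.

```python
import numpy as np


def farthest_point_init(features, k):
    """Pick k initial centers: the first point, then repeatedly the point farthest from all chosen centers."""
    centers = [features[0]]
    for _ in range(k - 1):
        # Squared distance from every point to its nearest already-chosen center.
        d = np.min([((features - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(features[int(d.argmax())])
    return np.stack(centers)


def kmeans(features, k, n_iter=20):
    """Plain k-means on an (N, D) array; returns an (N,) array of cluster ids."""
    centers = farthest_point_init(features, k)
    for _ in range(n_iter):
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = features[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return labels


def boundary_map(label_grid):
    """Mark a patch as boundary if its right or lower neighbour has a different cluster id."""
    boundary = np.zeros(label_grid.shape, dtype=bool)
    boundary[:, :-1] |= label_grid[:, :-1] != label_grid[:, 1:]
    boundary[:-1, :] |= label_grid[:-1, :] != label_grid[1:, :]
    return boundary


# Synthetic stand-in for patch features (assumption: NOT real CLIP activations).
# The left and right halves of an 8x8 patch grid get clearly different feature means.
h, w, dim = 8, 8, 16
rng = np.random.default_rng(0)
grid = rng.normal(scale=0.1, size=(h, w, dim))
grid[:, w // 2:, :] += 5.0

labels = kmeans(grid.reshape(-1, dim), k=2).reshape(h, w)
boundary = boundary_map(labels)
print(boundary.astype(int))
```

In a real pipeline the synthetic `grid` would be replaced by per-patch features taken from a vision backbone, and the resulting cluster boundaries would serve as a prior for a promptable segmenter such as SAM. Which layer to use and how to turn clusters into prompts are details of the paper that this sketch does not attempt to reproduce.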
Related papers
- FMG-Det: Foundation Model Guided Robust Object Detection [7.489718044485341]
Training on noisy annotations significantly degrades detector performance. We propose FMG-Det, a simple, efficient methodology for training models with noisy annotations.
arXiv Detail & Related papers (2025-05-29T17:55:41Z)
- S$^2$Teacher: Step-by-step Teacher for Sparsely Annotated Oriented Object Detection [55.34086214300803]
We introduce a novel setting called sparsely annotated oriented object detection (SAOOD), which only labels partial instances.
Specifically, we focus on two key issues in the setting: (1) sparse labeling leading to overfitting on limited foreground representations, and (2) unlabeled objects (false negatives) confusing feature learning.
To this end, we propose the S$^2$Teacher, a novel method that progressively mines pseudo-labels for unlabeled objects, from easy to hard, to enhance foreground representations.
arXiv Detail & Related papers (2025-04-15T11:57:00Z)
- BAISeg: Boundary Assisted Weakly Supervised Instance Segmentation [9.6046915661065]
How to extract instance-level masks without instance-level supervision is the main challenge of weakly supervised instance segmentation (WSIS).
Popular WSIS methods estimate a displacement field (DF) via learning inter-pixel relations and perform clustering to identify instances.
We propose Boundary-Assisted Instance Segmentation (BAISeg), a novel paradigm for WSIS that realizes instance segmentation with pixel-level annotations.
arXiv Detail & Related papers (2024-05-27T15:14:09Z)
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS).
We construct a large-scale complex scene dataset (\textbf{OVCamo}) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
- Improved Region Proposal Network for Enhanced Few-Shot Object Detection [23.871860648919593]
Few-shot object detection (FSOD) methods have emerged as a solution to the limitations of classic object detection approaches.
We develop a semi-supervised algorithm to detect and then utilize unlabeled novel objects as positive samples during the FSOD training stage.
Our improved hierarchical sampling strategy for the region proposal network (RPN) also boosts the perception of the object detection model for large objects.
arXiv Detail & Related papers (2023-08-15T02:35:59Z)
- Sparse Instance Activation for Real-Time Instance Segmentation [72.23597664935684]
We propose a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation.
SparseInst has extremely fast inference speed and achieves 40 FPS and 37.9 AP on the COCO benchmark.
arXiv Detail & Related papers (2022-03-24T03:15:39Z)
- FreeSOLO: Learning to Segment Objects without Annotations [191.82134817449528]
We present FreeSOLO, a self-supervised instance segmentation framework built on top of the simple instance segmentation method SOLO.
Our method also presents a novel localization-aware pre-training framework, where objects can be discovered from complicated scenes in an unsupervised manner.
arXiv Detail & Related papers (2022-02-24T16:31:44Z)
- Point Cloud Instance Segmentation with Semi-supervised Bounding-Box Mining [17.69745159912481]
We introduce the first semi-supervised point cloud instance segmentation framework (SPIB) using both labeled and unlabelled bounding boxes as supervision.
Our method can achieve competitive performance compared with the recent fully-supervised methods.
arXiv Detail & Related papers (2021-11-30T08:40:40Z)
- WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection [75.80075054706079]
We propose a weakly- and semi-supervised object detection framework (WSSOD).
An agent detector is first trained on a joint dataset and then used to predict pseudo bounding boxes on weakly-annotated images.
The proposed framework demonstrates remarkable performance on the PASCAL-VOC and MSCOCO benchmarks, achieving performance comparable to that obtained in fully-supervised settings.
arXiv Detail & Related papers (2021-05-21T11:58:50Z)
- UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large-scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)
- Weakly-Supervised Salient Object Detection via Scribble Annotations [54.40518383782725]
We propose a weakly-supervised salient object detection model to learn saliency from scribble labels.
We present a new metric, termed saliency structure measure, to measure the structure alignment of the predicted saliency maps.
Our method not only outperforms existing weakly-supervised/unsupervised methods, but also is on par with several fully-supervised state-of-the-art models.
arXiv Detail & Related papers (2020-03-17T12:59:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.