Cut and Learn for Unsupervised Object Detection and Instance Segmentation
- URL: http://arxiv.org/abs/2301.11320v1
- Date: Thu, 26 Jan 2023 18:57:13 GMT
- Title: Cut and Learn for Unsupervised Object Detection and Instance Segmentation
- Authors: Xudong Wang and Rohit Girdhar and Stella X. Yu and Ishan Misra
- Abstract summary: Cut-and-LEaRn (CutLER) is a simple approach for training unsupervised object detection and segmentation models.
CutLER is a zero-shot unsupervised detector and improves detection performance (AP50) by over 2.7 times on 11 benchmarks.
- Score: 65.43627672225624
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose Cut-and-LEaRn (CutLER), a simple approach for training
unsupervised object detection and segmentation models. We leverage the property
of self-supervised models to 'discover' objects without supervision and amplify
it to train a state-of-the-art localization model without any human labels.
CutLER first uses our proposed MaskCut approach to generate coarse masks for
multiple objects in an image and then learns a detector on these masks using
our robust loss function. We further improve the performance by self-training
the model on its predictions. Compared to prior work, CutLER is simpler,
compatible with different detection architectures, and detects multiple
objects. CutLER is also a zero-shot unsupervised detector and improves
detection performance AP50 by over 2.7 times on 11 benchmarks across domains
like video frames, paintings, sketches, etc. With finetuning, CutLER serves as
a low-shot detector surpassing MoCo-v2 by 7.3% APbox and 6.6% APmask on COCO
when training with 5% labels.
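To make the pipeline concrete, below is a minimal NumPy sketch of the MaskCut step: build a patch-affinity matrix from self-supervised ViT features, solve a Normalized Cut bipartition via the Fiedler vector, keep one side as a coarse object mask, then remove those patches and cut again for the next object. The feature source (e.g. a DINO backbone), the threshold `tau`, and the smaller-side foreground heuristic are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def maskcut(feats, grid_hw, tau=0.15, n_objects=3):
    """Return up to n_objects coarse binary patch masks of shape grid_hw."""
    h, w = grid_hw
    n = feats.shape[0]
    # Thresholded cosine-similarity affinities between ViT patches.
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    affinity = (f @ f.T > tau).astype(float) + 1e-6
    available = np.ones(n, dtype=bool)   # patches not yet claimed
    masks = []
    for _ in range(n_objects):
        idx = np.flatnonzero(available)
        a = affinity[np.ix_(idx, idx)]
        d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
        # Second-smallest eigenvector (Fiedler vector) of the normalized
        # Laplacian gives the relaxed Normalized Cut bipartition.
        lap = np.eye(len(idx)) - d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
        _, vecs = np.linalg.eigh(lap)
        fiedler = vecs[:, 1]
        fg = fiedler > fiedler.mean()
        if fg.sum() > len(idx) / 2:      # heuristic: smaller side = object
            fg = ~fg
        mask = np.zeros(n, dtype=bool)
        mask[idx[fg]] = True
        masks.append(mask.reshape(h, w))
        available &= ~mask               # mask out the object, cut again
        if available.sum() < 3:
            break
    return masks

# Dummy patch features standing in for a DINO backbone's output.
masks = maskcut(np.random.rand(14 * 14, 64), grid_hw=(14, 14))
print(len(masks), masks[0].shape)
```

The robust loss used to train the detector on these coarse masks and the subsequent self-training rounds are separate CutLER stages not sketched here.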
Related papers
- Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we require only sparse detection labels for object localization and feature binding (a rough sketch of the slot-to-detection idea follows this entry).
arXiv Detail & Related papers (2023-09-01T03:34:12Z)
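The summary above names an index-merge module that turns object-centric slots into detection outputs; as a rough, hedged PyTorch sketch of that general idea, small heads can decode each slot into a box and an objectness score. The head sizes, shapes, and output format below are my own illustrative choices, not the paper's architecture, and the memory module is omitted.

```python
import torch
import torch.nn as nn

class SlotToDetection(nn.Module):
    """Decode object-centric slots into boxes and objectness scores."""
    def __init__(self, slot_dim=128, hidden=256):
        super().__init__()
        self.box_head = nn.Sequential(
            nn.Linear(slot_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 4), nn.Sigmoid())   # normalized cx, cy, w, h
        self.score_head = nn.Sequential(
            nn.Linear(slot_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))                 # objectness logit

    def forward(self, slots):
        # slots: (batch, num_slots, slot_dim) from an object-centric model
        return self.box_head(slots), self.score_head(slots).squeeze(-1)

slots = torch.randn(2, 8, 128)            # dummy slots for two frames
boxes, scores = SlotToDetection()(slots)
print(boxes.shape, scores.shape)          # (2, 8, 4) and (2, 8)
```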
- Zero-Shot Edge Detection with SCESAME: Spectral Clustering-based Ensemble for Segment Anything Model Estimation [0.0]
This paper proposes a novel zero-shot edge detection method, SCESAME, based on the recently proposed Segment Anything Model (SAM).
SAM's automatic mask generation (AMG) can be applied to edge detection, but suffers from overdetecting edges.
We performed edge detection experiments on two datasets, BSDS500 and NYUDv2 (a toy sketch of the mask-boundary ensemble follows this entry).
arXiv Detail & Related papers (2023-08-26T06:19:59Z)
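As a rough NumPy sketch of the recipe summarized above: take the binary masks AMG would produce, merge near-duplicate masks (SCESAME does this with spectral clustering; the simple IoU grouping below is a stand-in), and read an edge map off the merged mask boundaries. The SAM/AMG call is assumed to happen elsewhere, and `iou_merge` is an illustrative parameter.

```python
import numpy as np

def mask_boundary(mask):
    # 4-neighbour morphological gradient: mask pixels with at least one
    # neighbour outside the mask (np.roll wraps at borders; crude but ok).
    shifted = [np.roll(mask, s, axis=a) for a in (0, 1) for s in (1, -1)]
    return mask & ~np.logical_and.reduce(shifted)

def masks_to_edges(masks, iou_merge=0.8):
    """masks: list of (H, W) boolean arrays, e.g. from SAM's AMG."""
    merged = []
    for m in masks:
        for i, g in enumerate(merged):
            iou = (m & g).sum() / max((m | g).sum(), 1)
            if iou > iou_merge:          # fold near-duplicates together,
                merged[i] = g | m        # curbing edge overdetection
                break
        else:
            merged.append(m)
    edges = np.zeros(masks[0].shape, dtype=float)
    for m in merged:
        edges += mask_boundary(m)
    return np.clip(edges / max(len(merged), 1), 0.0, 1.0)

a = np.zeros((64, 64), dtype=bool); a[10:30, 10:30] = True
b = np.zeros((64, 64), dtype=bool); b[12:32, 12:32] = True
print(masks_to_edges([a, b]).max())
```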
- Semi-Supervised and Long-Tailed Object Detection with CascadeMatch [91.86787064083012]
We propose a novel pseudo-labeling-based detector called CascadeMatch.
Our detector features a cascade network architecture, which has multi-stage detection heads with progressive confidence thresholds.
We show that CascadeMatch surpasses existing state-of-the-art semi-supervised approaches in handling long-tailed object detection (see the threshold sketch after this entry).
arXiv Detail & Related papers (2023-05-24T07:09:25Z)
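A minimal sketch of the progressive-threshold idea: each cascade stage filters candidate pseudo-labels with a stricter confidence threshold than the previous one. The threshold values and the prediction format below are illustrative assumptions, not values from the paper.

```python
def cascade_pseudo_labels(stage_predictions, thresholds=(0.5, 0.6, 0.7)):
    """stage_predictions: one list of (box, score) pairs per cascade stage."""
    kept = []
    for preds, t in zip(stage_predictions, thresholds):
        # Later stages see better-refined boxes, so they can demand a
        # higher confidence before a prediction becomes a pseudo-label.
        kept.append([(box, score) for box, score in preds if score >= t])
    return kept

stages = [
    [((0, 0, 10, 10), 0.55), ((5, 5, 20, 20), 0.45)],  # stage 1 output
    [((0, 0, 10, 10), 0.65)],                          # stage 2 output
    [((0, 0, 10, 10), 0.72)],                          # stage 3 output
]
print([len(k) for k in cascade_pseudo_labels(stages)])  # [1, 1, 1]
```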
- A Tri-Layer Plugin to Improve Occluded Detection [100.99802831241583]
We propose a simple 'plugin' module for the detection head of two-stage object detectors to improve the recall of partially occluded objects.
The module predicts a tri-layer of segmentation masks for the target object, the occluder and the occludee, and by doing so is able to better predict the mask of the target object.
We also establish a COCO evaluation dataset to measure the recall performance of partially occluded and separated objects (a sketch of the tri-layer head follows this entry).
arXiv Detail & Related papers (2022-10-18T17:59:51Z)
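A hedged PyTorch sketch of what a tri-layer mask head could look like: a small convolutional head that predicts three mask channels, occluder, target and occludee, from pooled RoI features. The layer sizes and feature shapes are illustrative; the actual plugin may differ.

```python
import torch
import torch.nn as nn

class TriLayerMaskHead(nn.Module):
    """Predict occluder / target / occludee masks from RoI features."""
    def __init__(self, in_channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(256, 256, 2, stride=2), nn.ReLU(),
            nn.Conv2d(256, 3, 1))  # one channel per layer of the tri-layer

    def forward(self, roi_feats):
        # roi_feats: (num_rois, C, 14, 14) pooled features per proposal
        logits = self.net(roi_feats)          # -> (num_rois, 3, 28, 28)
        return {"occluder": logits[:, 0],
                "target":   logits[:, 1],
                "occludee": logits[:, 2]}

out = TriLayerMaskHead()(torch.randn(4, 256, 14, 14))
print(out["target"].shape)   # torch.Size([4, 28, 28])
```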
- MOVE: Unsupervised Movable Object Segmentation and Detection [32.73565093619594]
MOVE is a method to segment objects without any form of supervision.
It exploits the fact that foreground objects can be shifted locally relative to their initial position.
It yields an average CorLoc improvement of 7.2% over the state of the art (SotA); a toy sketch of the shift cue follows this entry.
arXiv Detail & Related papers (2022-10-14T16:05:46Z)
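A toy NumPy sketch of the cue MOVE exploits: copying a foreground segment to a shifted location still yields a plausible image, whereas doing the same with a non-object region does not. The crude copy-paste compositing below only illustrates the shift operation; MOVE itself trains a segmentation network with a realism signal, which is not reproduced here.

```python
import numpy as np

def shift_composite(image, mask, dx, dy):
    """Copy the masked region back onto the image at an (dx, dy) offset."""
    out = image.copy()
    ys, xs = np.nonzero(mask)
    ys2, xs2 = ys + dy, xs + dx
    h, w = mask.shape
    ok = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)
    out[ys2[ok], xs2[ok]] = image[ys[ok], xs[ok]]
    return out

img = np.random.rand(32, 32, 3)
mask = np.zeros((32, 32), dtype=bool)
mask[8:16, 8:16] = True                  # pretend this is a foreground object
shifted = shift_composite(img, mask, dx=5, dy=0)
print(shifted.shape)                     # a realism critic would score this
```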
- ZSD-YOLO: Zero-Shot YOLO Detection using Vision-Language Knowledge Distillation [5.424015823818208]
A dataset such as COCO is extensively annotated across many images but covers only a sparse set of categories, and annotating all object classes across diverse domains is expensive and challenging.
We develop a Vision-Language distillation method that aligns both image and text embeddings from a zero-shot pre-trained model such as CLIP to a modified semantic prediction head from a one-stage detector like YOLOv5.
During inference, our model can be adapted to detect any number of object classes without additional training (see the distillation sketch after this entry).
arXiv Detail & Related papers (2021-09-24T16:46:36Z)
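A minimal PyTorch sketch of the distillation objective summarized above: pull each predicted region embedding toward the matched CLIP embedding during training, then classify regions at inference by similarity to arbitrary class-name text embeddings. Producing the CLIP embeddings and the detector's semantic head outputs is assumed to happen elsewhere, and the L1 distance is an illustrative choice.

```python
import torch
import torch.nn.functional as F

def vl_distill_loss(region_embeds, clip_embeds):
    """region_embeds, clip_embeds: (num_regions, dim), matched one-to-one."""
    return F.l1_loss(F.normalize(region_embeds, dim=-1),
                     F.normalize(clip_embeds, dim=-1))

def zero_shot_classify(region_embeds, text_embeds):
    # At inference, any set of class-name text embeddings can be plugged
    # in, so new classes need no additional detector training.
    sims = (F.normalize(region_embeds, dim=-1)
            @ F.normalize(text_embeds, dim=-1).T)
    return sims.argmax(dim=-1)

regions = torch.randn(5, 512)            # detector semantic-head outputs
targets = torch.randn(5, 512)            # matched CLIP region embeddings
prompts = torch.randn(10, 512)           # CLIP embeddings of 10 class names
print(vl_distill_loss(regions, targets).item())
print(zero_shot_classify(regions, prompts))
```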
- Custom Object Detection via Multi-Camera Self-Supervised Learning [15.286868970188223]
MCSSL is a self-supervised learning approach for building custom object detection models in multi-camera networks.
Our evaluation shows that, compared with legacy self-training methods, MCSSL improves average mAP by 5.44% and 6.76% on the WildTrack and CityFlow datasets, respectively.
arXiv Detail & Related papers (2021-02-05T23:11:14Z)
- Weakly-Supervised Saliency Detection via Salient Object Subitizing [57.17613373230722]
We introduce saliency subitizing as the weak supervision signal, since it is class-agnostic.
This allows the supervision to align with the nature of saliency detection.
We conduct extensive experiments on five benchmark datasets.
arXiv Detail & Related papers (2021-01-04T12:51:45Z)
- Detection in Crowded Scenes: One Proposal, Multiple Predictions [79.28850977968833]
We propose a proposal-based object detector aimed at detecting highly overlapped instances in crowded scenes.
The key of our approach is to let each proposal predict a set of correlated instances, rather than the single instance of previous proposal-based frameworks.
Our detector can obtain 4.9% AP gains on the challenging CrowdHuman dataset and a 1.0% $\text{MR}^{-2}$ improvement on the CityPersons dataset (a sketch of the set-matching loss follows this entry).
arXiv Detail & Related papers (2020-03-20T09:48:53Z)
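A small PyTorch sketch of the "one proposal, multiple predictions" idea for K = 2: each proposal emits two boxes, and the regression loss takes the cheaper of the two possible assignments to the ground-truth set, in the spirit of an EMD-style set loss. Shapes and the smooth-L1 distance are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def set_loss_k2(pred, gt):
    """pred, gt: (num_proposals, 2, 4) boxes; permutation-min matching."""
    l = lambda a, b: F.smooth_l1_loss(a, b, reduction="none").sum(-1)
    direct  = l(pred[:, 0], gt[:, 0]) + l(pred[:, 1], gt[:, 1])
    swapped = l(pred[:, 0], gt[:, 1]) + l(pred[:, 1], gt[:, 0])
    # Each proposal is charged for its cheaper assignment, so the two
    # predictions are free to specialise on different overlapped instances.
    return torch.minimum(direct, swapped).mean()

pred = torch.randn(8, 2, 4)   # 8 proposals, 2 box predictions each
gt = torch.randn(8, 2, 4)     # 2 targets per proposal
print(set_loss_k2(pred, gt).item())
```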
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.