Mixed Supervised Object Detection by Transferring Mask Prior and
Semantic Similarity
- URL: http://arxiv.org/abs/2110.14191v1
- Date: Wed, 27 Oct 2021 05:43:09 GMT
- Authors: Yan Liu, Zhijie Zhang, Li Niu, Junjie Chen, Liqing Zhang
- Abstract summary: We consider object detection with mixed supervision, which learns novel object categories using weak annotations.
We further transfer mask prior and semantic similarity to bridge the gap between novel categories and base categories.
Experimental results on three benchmark datasets demonstrate the effectiveness of our method over existing methods.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Object detection has achieved promising success, but requires large-scale
fully-annotated data, which is time-consuming and labor-intensive. Therefore,
we consider object detection with mixed supervision, which learns novel object
categories using weak annotations with the help of full annotations of existing
base object categories. Previous works using mixed supervision mainly learn the
class-agnostic objectness from fully-annotated categories, which can be
transferred to upgrade the weak annotations to pseudo full annotations for
novel categories. In this paper, we further transfer mask prior and semantic
similarity to bridge the gap between novel categories and base categories.
Specifically, the ability of using mask prior to help detect objects is learned
from base categories and transferred to novel categories. Moreover, the
semantic similarity between objects learned from base categories is transferred
to denoise the pseudo full annotations for novel categories. Experimental
results on three benchmark datasets demonstrate the effectiveness of our method
over existing methods. Codes are available at
https://github.com/bcmi/TraMaS-Weak-Shot-Object-Detection.
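The semantic-similarity denoising idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation (see the linked repository for that); the function name, the cosine-similarity scoring, and the threshold are illustrative assumptions about how pseudo full annotations for a novel category might be filtered against a class embedding.

```python
import numpy as np

def denoise_pseudo_boxes(box_scores, box_embeds, class_embed, sim_thresh=0.5):
    """Keep pseudo-annotation boxes whose region embedding is semantically
    consistent with the novel-class embedding, and reweight their scores.

    box_scores: (N,) detector confidences for the pseudo boxes
    box_embeds: (N, D) L2-normalized region feature embeddings
    class_embed: (D,) L2-normalized embedding of the novel class
    """
    sims = box_embeds @ class_embed          # cosine similarity per box
    keep = sims >= sim_thresh                # drop semantically inconsistent boxes
    return box_scores[keep] * sims[keep], keep
```

In this sketch, a noisy pseudo box whose region embedding points away from the class embedding is rejected, while consistent boxes are kept with similarity-weighted confidences.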
Related papers
- Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization [63.66349334291372]
We propose a framework with Meta prompt and Instance Contrastive learning (MIC) schemes.
Firstly, we simulate a novel-class-emerging scenario to help the learned class and background prompts generalize to novel classes.
Secondly, we design an instance-level contrastive strategy to promote intra-class compactness and inter-class separation, which benefits generalization of the detector to novel class objects.
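The instance-level contrastive strategy described above can be sketched as a standard supervised contrastive loss over instance embeddings; this is a generic illustration, not the MIC paper's exact formulation, and the temperature value is an assumption.

```python
import numpy as np

def instance_contrastive_loss(embeds, labels, temperature=0.1):
    """Supervised contrastive loss over instance embeddings: pulls
    same-class instances together, pushes different classes apart."""
    embeds = embeds / np.linalg.norm(embeds, axis=1, keepdims=True)
    sim = embeds @ embeds.T / temperature          # (N, N) similarities
    n = len(embeds)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -1e9, sim)           # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (labels[None, :] == labels[:, None]) & ~self_mask
    per_instance = -(log_prob * pos).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return per_instance.mean()
```

Under this loss, embeddings that cluster by class (intra-class compactness, inter-class separation) score lower than embeddings whose class assignments are scrambled.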
arXiv Detail & Related papers (2024-03-14T14:25:10Z)
- Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation [13.001629605405954]
Zero-shot instance segmentation aims to detect and precisely segment objects of unseen categories without any training samples.
We propose D$^2$Zero with Semantic-Promoted Debiasing and Background Disambiguation.
Background disambiguation produces image-adaptive background representation to avoid mistaking novel objects for background.
arXiv Detail & Related papers (2023-05-22T16:00:01Z)
- Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations [86.47908754383198]
Open-Vocabulary (OV) methods leverage large-scale image-caption pairs and vision-language models to learn novel categories.
Our method generates pseudo-mask annotations by leveraging the localization ability of a pre-trained vision-language model for objects present in image-caption pairs.
Our method trained with just pseudo-masks significantly improves the mAP scores on the MS-COCO dataset and OpenImages dataset.
arXiv Detail & Related papers (2023-03-29T17:58:39Z)
- Learning Dense Object Descriptors from Multiple Views for Low-shot Category Generalization [27.583517870047487]
We propose Deep Object Patch Encodings (DOPE), which can be trained from multiple views of object instances without any category or semantic object part labels.
To train DOPE, we assume access to sparse depths, foreground masks and known cameras, to obtain pixel-level correspondences between views of an object.
We find that DOPE can directly be used for low-shot classification of novel categories using local-part matching, and is competitive with and outperforms supervised and self-supervised learning baselines.
arXiv Detail & Related papers (2022-11-28T04:31:53Z)
- Exploiting Unlabeled Data with Vision and Language Models for Object Detection [64.94365501586118]
Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets.
We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images.
We demonstrate the value of the generated pseudo labels in two specific tasks, open-vocabulary detection and semi-supervised object detection.
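The pseudo-labeling pipeline summarized above can be sketched as follows. This is a generic illustration of scoring region proposals with a vision-language model and keeping confident predictions as pseudo labels; the function name, threshold, and data layout are assumptions, not the paper's interface.

```python
import numpy as np

def pseudo_label_regions(proposals, vl_scores, class_names, score_thresh=0.8):
    """Turn region proposals on unlabeled images into pseudo box labels,
    keeping only regions the vision-language model scores confidently.

    proposals: list of boxes, e.g. (x1, y1, x2, y2) tuples
    vl_scores: (N, C) per-proposal scores over the open vocabulary
    class_names: C category names matching the score columns
    """
    pseudo = []
    for box, scores in zip(proposals, vl_scores):
        best = int(np.argmax(scores))
        if scores[best] >= score_thresh:   # confidence filter
            pseudo.append((box, class_names[best], float(scores[best])))
    return pseudo
```

Boxes that clear the confidence threshold become pseudo labels usable for open-vocabulary or semi-supervised detector training; the rest are discarded.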
arXiv Detail & Related papers (2022-07-18T21:47:15Z)
- Few-Shot Object Detection: A Survey [4.266990593059534]
Few-shot object detection aims to learn from few object instances of new categories in the target domain.
We categorize approaches according to their training scheme and architectural layout.
We introduce commonly used datasets and their evaluation protocols and analyze reported benchmark results.
arXiv Detail & Related papers (2021-12-22T07:08:53Z)
- Towards A Category-extended Object Detector without Relabeling or Conflicts [40.714221493482974]
In this paper, we aim at learning a strong unified detector that can handle all categories based on the limited datasets without extra manual labor.
We propose a practical framework which focuses on three aspects: better base model, better unlabeled ground-truth mining strategy and better retraining method with pseudo annotations.
arXiv Detail & Related papers (2020-12-28T06:44:53Z)
- Closing the Generalization Gap in One-Shot Object Detection [92.82028853413516]
We show that the key to strong few-shot detection models may not lie in sophisticated metric learning approaches, but instead in scaling the number of categories.
Future data annotation efforts should therefore focus on wider datasets and annotate a larger number of categories.
arXiv Detail & Related papers (2020-11-09T09:31:17Z)
- Dynamic Semantic Matching and Aggregation Network for Few-shot Intent Detection [69.2370349274216]
Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances.
Semantic components are distilled from utterances via multi-head self-attention.
Our method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances.
arXiv Detail & Related papers (2020-10-06T05:16:38Z)
- Cross-Supervised Object Detection [42.783400918552765]
We show how to build better object detectors from weakly labeled images of new categories by leveraging knowledge learned from fully labeled base categories.
We propose a unified framework that combines a detection head trained from instance-level annotations and a recognition head learned from image-level annotations.
arXiv Detail & Related papers (2020-06-26T15:33:48Z)
- StarNet: towards Weakly Supervised Few-Shot Object Detection [87.80771067891418]
We introduce StarNet - a few-shot model featuring an end-to-end differentiable non-parametric star-model detection and classification head.
Through this head, the backbone is meta-trained using only image-level labels to produce good features for jointly localizing and classifying previously unseen categories of few-shot test tasks.
Being a few-shot detector, StarNet does not require any bounding box annotations, either during pre-training or for novel-class adaptation.
arXiv Detail & Related papers (2020-03-15T11:35:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.