Revisiting the Sibling Head in Object Detector
- URL: http://arxiv.org/abs/2003.07540v1
- Date: Tue, 17 Mar 2020 05:21:54 GMT
- Title: Revisiting the Sibling Head in Object Detector
- Authors: Guanglu Song, Yu Liu, Xiaogang Wang
- Abstract summary: This paper provides the observation that the spatial misalignment between the two object functions in the sibling head can considerably hurt the training process.
Considering the classification and regression, TSD decouples them from the spatial dimension by generating two disentangled proposals for them.
Surprisingly, this simple design can boost all backbones and models on both MS COCO and Google OpenImage consistently by ~3% mAP.
- Score: 24.784483589579896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The "shared head for classification and localization" (sibling head),
first named in Fast RCNN (Girshick, 2015), has been leading the
fashion of the object detection community in the past five years. This paper
provides the observation that the spatial misalignment between the two object
functions in the sibling head can considerably hurt the training process, but
this misalignment can be resolved by a very simple operator called task-aware
spatial disentanglement (TSD). Considering the classification and regression,
TSD decouples them from the spatial dimension by generating two disentangled
proposals for them, which are estimated from the shared proposal. This is
inspired by the natural insight that for one instance, the features in some
salient area may have rich information for classification while those around
the boundary may be good at bounding box regression. Surprisingly, this simple
design can boost all backbones and models on both MS COCO and Google OpenImage
consistently by ~3% mAP. Further, we propose a progressive constraint to
enlarge the performance margin between the disentangled and the shared
proposals, and gain ~1% more mAP. We show that TSD breaks through the
upper bound of today's single-model detectors by a large margin (mAP 49.4 with
ResNet-101, 51.2 with SENet154), and is the core model of our 1st place
solution on the Google OpenImage Challenge 2019.
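The idea of deriving two task-specific proposals from one shared proposal can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the offsets are computed, so the sketch assumes a simple translation of the box for both tasks, with offsets supplied by hand instead of being predicted by a network. The names `shift_box`, `disentangle`, and the scaling factor `gamma` are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def shift_box(box: Box, dx: float, dy: float, gamma: float = 0.1) -> Box:
    """Translate a box by offsets given as fractions of its width and height.

    gamma is a hypothetical scaling factor that keeps the shift small
    relative to the box size.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    sx, sy = gamma * dx * w, gamma * dy * h
    return (x1 + sx, y1 + sy, x2 + sx, y2 + sy)


def disentangle(
    shared: Box,
    cls_offset: Tuple[float, float],
    reg_offset: Tuple[float, float],
    gamma: float = 0.1,
) -> Tuple[Box, Box]:
    """Derive one proposal per task from the same shared proposal.

    In a real detector the two offsets would come from learned layers;
    here they are plain inputs (an assumption made for illustration).
    """
    cls_box = shift_box(shared, cls_offset[0], cls_offset[1], gamma)
    reg_box = shift_box(shared, reg_offset[0], reg_offset[1], gamma)
    return cls_box, reg_box


shared_proposal = (10.0, 20.0, 50.0, 40.0)
cls_box, reg_box = disentangle(shared_proposal, (0.5, -1.0), (0.0, 0.0))
```

Each task head would then pool features from its own box, so classification and regression no longer have to share one spatial region.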
Related papers
- Not Just Learning from Others but Relying on Yourself: A New Perspective on Few-Shot Segmentation in Remote Sensing [14.37799301656178]
Few-shot segmentation (FSS) is proposed to segment unknown class targets with just a few annotated samples.
We develop a Dual-Mining network named DMNet for cross-image mining and self-mining.
Our model with a ResNet-50 backbone achieves mIoU of 49.58% and 51.34% on iSAID under the 1-shot and 5-shot settings.
arXiv Detail & Related papers (2023-10-19T04:09:10Z)
- Spatial-Aware Token for Weakly Supervised Object Localization [137.0570026552845]
We propose a task-specific spatial-aware token to condition localization in a weakly supervised manner.
Experiments show that the proposed SAT achieves state-of-the-art performance on both CUB-200 and ImageNet, with 98.45% and 73.13% GT-known Loc.
arXiv Detail & Related papers (2023-03-18T15:38:17Z)
- Boosting Few-shot Fine-grained Recognition with Background Suppression and Foreground Alignment [53.401889855278704]
Few-shot fine-grained recognition (FS-FGR) aims to recognize novel fine-grained categories with the help of limited available samples.
We propose a two-stage background suppression and foreground alignment framework, which is composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local to local (L2L) similarity metric.
Experiments conducted on multiple popular fine-grained benchmarks demonstrate that our method outperforms the existing state-of-the-art by a large margin.
arXiv Detail & Related papers (2022-10-04T07:54:40Z)
- Group R-CNN for Weakly Semi-supervised Object Detection with Points [18.720915213798623]
We propose an effective point-to-box regressor: Group R-CNN.
Group R-CNN first uses instance-level proposal grouping to generate a group of proposals for each point annotation.
We show that Group R-CNN significantly outperforms the prior method Point DETR by 3.9 mAP with 5% well-labeled images.
arXiv Detail & Related papers (2022-05-12T07:17:54Z)
- Anchor Retouching via Model Interaction for Robust Object Detection in Aerial Images [15.404024559652534]
We present an effective Dynamic Enhancement Anchor (DEA) network to construct a novel training sample generator.
Our method achieves state-of-the-art performance in accuracy with moderate inference speed and computational overhead for training.
arXiv Detail & Related papers (2021-12-13T14:37:20Z)
- Denoised Non-Local Neural Network for Semantic Segmentation [18.84185406522064]
We propose a Denoised Non-Local Network (Denoised NL) to eliminate the inter-class and intra-class noises respectively.
Our proposed NL can achieve the state-of-the-art performance of 83.5% and 46.69% mIoU on Cityscapes and ADE20K, respectively.
arXiv Detail & Related papers (2021-10-27T06:16:31Z)
- Efficient Person Search: An Anchor-Free Approach [86.45858994806471]
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images.
To achieve this goal, state-of-the-art models typically add a re-id branch upon two-stage detectors like Faster R-CNN.
In this work, we present an anchor-free approach to efficiently tackling this challenging task, by introducing the following dedicated designs.
arXiv Detail & Related papers (2021-09-01T07:01:33Z)
- Scope Head for Accurate Localization in Object Detection [135.9979405835606]
We propose a novel detector coined as ScopeNet, which models anchors of each location as a mutually dependent relationship.
With our concise and effective design, the proposed ScopeNet achieves state-of-the-art results on COCO.
arXiv Detail & Related papers (2020-05-11T04:00:09Z)
- High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state-of-the-art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)
- 1st Place Solutions for OpenImage2019 -- Object Detection and Instance Segmentation [116.25081559037872]
This article introduces the solutions of the two champion teams, 'MMfruit' for the detection track and 'MMfruitSeg' for the segmentation track, in OpenImage Challenge 2019.
It is commonly known that for an object detector, the shared feature at the end of the backbone is not appropriate for both classification and regression.
We propose the Decoupling Head (DH) to disentangle the object classification and regression via the self-learned optimal feature extraction.
arXiv Detail & Related papers (2020-03-17T06:45:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.