Related papers: Decoupled Self Attention for Accurate One Stage Object Detection

URL: http://arxiv.org/abs/2012.07630v2
Date: Tue, 15 Dec 2020 06:47:27 GMT
Title: Decoupled Self Attention for Accurate One Stage Object Detection
Authors: Kehe WU, Zuge Chen, Qi MA, Xiaoliang Zhang, Wei Li
Abstract summary: A decoupled self-attention (DSA) module is proposed for one-stage object detection models. Although the DSA module is simple, it effectively improves detection performance and can be easily embedded in many detection models.
Score: 4.791635488070342
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Because object detection datasets are smaller than the image recognition dataset ImageNet, transfer learning has become a basic training method for deep learning object detection models: the backbone network is pretrained on ImageNet and then used to extract features for the classification and localization subtasks. However, the classification task focuses on the salient region features of an object, while the localization task focuses on its edge features, so there is a certain deviation between the features extracted by the pretrained backbone and the features needed for localization. To solve this problem, this paper proposes a decoupled self-attention (DSA) module for one-stage object detection models. DSA includes two decoupled self-attention branches, so it can extract appropriate features for each task. It sits between the FPN and the subtask head networks, where it extracts global features from the FPN fused features for each task independently. Although the DSA module is simple, it effectively improves detection performance and can be easily embedded in many detection models. Our experiments are based on the representative one-stage detection model RetinaNet. On the COCO dataset, with ResNet50 and ResNet101 as backbone networks, detection performance increases by 0.4% AP and 0.5% AP respectively. When the DSA module and an object confidence task are applied to RetinaNet together, detection performance with ResNet50 and ResNet101 increases by 1.0% AP and 1.4% AP respectively. The experimental results show the effectiveness of the DSA module. Code is at: https://github.com/chenzuge1/DSANet.git.
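
Sketch (not from the paper): the abstract describes two decoupled self-attention branches placed between the FPN and the task heads. The NumPy code below is only a minimal illustration of that layout. The attention formulation (scaled dot-product with a residual connection), the random untrained weights, the sharing of each branch across pyramid levels, and all class and parameter names are assumptions made for illustration; the authors' actual implementation is in the repository linked above.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class SelfAttention2D:
    """Generic self-attention over a (C, H, W) feature map (assumed form)."""

    def __init__(self, channels, rng):
        self.d = max(channels // 2, 1)  # reduced query/key width (assumption)
        scale = 1.0 / np.sqrt(channels)
        self.wq = rng.normal(0.0, scale, (channels, self.d))
        self.wk = rng.normal(0.0, scale, (channels, self.d))
        self.wv = rng.normal(0.0, scale, (channels, channels))

    def __call__(self, feat):
        c, h, w = feat.shape
        x = feat.reshape(c, h * w).T              # (N, C), one row per position
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        attn = softmax(q @ k.T / np.sqrt(self.d))  # (N, N) global attention
        out = x + attn @ v                         # residual connection
        return out.T.reshape(c, h, w)


class DecoupledSelfAttention:
    """Two independent attention branches: one per subtask."""

    def __init__(self, channels, seed=0):
        rng = np.random.default_rng(seed)
        self.cls_branch = SelfAttention2D(channels, rng)  # classification
        self.loc_branch = SelfAttention2D(channels, rng)  # localization

    def __call__(self, fpn_feats):
        # Each FPN level is processed by both branches independently.
        cls_feats = [self.cls_branch(f) for f in fpn_feats]
        loc_feats = [self.loc_branch(f) for f in fpn_feats]
        return cls_feats, loc_feats


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical FPN outputs: 8 channels at three spatial resolutions.
    fpn_feats = [rng.normal(size=(8, s, s)) for s in (8, 4, 2)]
    cls_feats, loc_feats = DecoupledSelfAttention(channels=8)(fpn_feats)
    for c_f, l_f in zip(cls_feats, loc_feats):
        print(c_f.shape, l_f.shape)
```

In this sketch the classification head would consume `cls_feats` and the box regression head `loc_feats`, so each head sees features refined by its own attention weights rather than one shared feature map.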

Related papers

Scale-Invariant Object Detection by Adaptive Convolution with Unified Global-Local Context [3.061662434597098]
We propose an object detection model using a Switchable (adaptive) Atrous Convolutional Network (SAC-Net) based on the EfficientDet model. The proposed SAC-Net encapsulates the benefits of both low-level and high-level features to achieve improved performance on multi-scale object detection tasks. Our experiments on benchmark datasets demonstrate that the proposed SAC-Net outperforms the state-of-the-art models by a significant margin in terms of accuracy.
arXiv Detail & Related papers (2024-09-17T10:08:37Z)
Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head. The proposed model achieves a mean average precision (mAP) of approximately 45.7%, which is a significant improvement. This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z)
Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner. We design a semantic-guided self-supervised learning model to extract high-level semantic features from images. We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
The Devil is in the Task: Exploiting Reciprocal Appearance-Localization Features for Monocular 3D Object Detection [62.1185839286255]
Low-cost monocular 3D object detection plays a fundamental role in autonomous driving. We introduce a Dynamic Feature Reflecting Network, named DFR-Net. We rank 1st among all the monocular 3D object detectors in the KITTI test set.
arXiv Detail & Related papers (2021-12-28T07:31:18Z)
Anchor Retouching via Model Interaction for Robust Object Detection in Aerial Images [15.404024559652534]
We present an effective Dynamic Enhancement Anchor (DEA) network to construct a novel training sample generator. Our method achieves state-of-the-art performance in accuracy with moderate inference speed and computational overhead for training.
arXiv Detail & Related papers (2021-12-13T14:37:20Z)
Multi-patch Feature Pyramid Network for Weakly Supervised Object Detection in Optical Remote Sensing Images [39.25541709228373]
We propose a new architecture for object detection with a multiple patch feature pyramid network (MPFP-Net). MPFP-Net differs from current models, which during training pursue only the most discriminative patches. We introduce an effective method to regularize the residual values and make the fusion transition layers strictly norm-preserving.
arXiv Detail & Related papers (2021-08-18T09:25:39Z)
FCCDN: Feature Constraint Network for VHR Image Change Detection [12.670734830806591]
We propose a feature constraint change detection network (FCCDN) for change detection. We constrain features both on bi-temporal feature extraction and feature fusion. We achieve state-of-the-art performance on two building change detection datasets.
arXiv Detail & Related papers (2021-05-23T06:13:47Z)
Location-Sensitive Visual Recognition with Cross-IOU Loss [177.86369890708457]
This paper proposes a unified solution named location-sensitive network (LSNet) for object detection, instance segmentation, and pose estimation. Based on a deep neural network as the backbone, LSNet predicts an anchor point and a set of landmarks which together define the shape of the target object.
arXiv Detail & Related papers (2021-04-11T02:17:14Z)
Unsupervised Pretraining for Object Detection by Patch Reidentification [72.75287435882798]
Unsupervised representation learning achieves promising performance in pre-training representations for object detectors. This work proposes a simple yet effective representation learning method for object detection, named patch re-identification (Re-ID). Our method significantly outperforms its counterparts on COCO in all settings, such as different training iterations and data percentages.
arXiv Detail & Related papers (2021-03-08T15:13:59Z)
Few-shot Object Detection on Remote Sensing Images [11.40135025181393]
We introduce a few-shot learning-based method for object detection on remote sensing images. We build our few-shot object detection model upon YOLOv3 architecture and develop a multi-scale object detection framework.
arXiv Detail & Related papers (2020-06-14T07:18:10Z)
One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module. We also propose novel training strategies that effectively improve detection performance. Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.