A More Compact Object Detector Head Network with Feature Enhancement and
Relational Reasoning
- URL: http://arxiv.org/abs/2106.14475v1
- Date: Mon, 28 Jun 2021 08:38:57 GMT
- Authors: Wen chao Zhang, Chong Fu, Xiang shi Chang, Teng fei Zhao, Xiang Li,
Chiu-Wing Sham
- Abstract summary: We propose a more compact object detector head network (CODH), which can preserve global context information and condense the information density.
With our method, the parameters of the head network are 0.6 times smaller than those of the state-of-the-art Cascade R-CNN, yet the performance boost is 1.3% on COCO test-dev.
- Score: 4.171249457570931
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modeling implicit feature interaction patterns is of significant importance
to object detection tasks. However, in the two-stage detectors, due to the
excessive use of hand-crafted components, it is very difficult to reason about
the implicit relationship of the instance features. To tackle this problem, we
analyze three different levels of feature interaction relationships, namely,
the dependency relationship between the cropped local features and global
features, the feature autocorrelation within the instance, and the
cross-correlation relationship between the instances. To this end, we propose a
more compact object detector head network (CODH), which can not only preserve
global context information and condense the information density, but also
allow instance-wise feature enhancement and relational reasoning in a larger
matrix space. Without bells and whistles, our method can effectively improve
the detection performance while significantly reducing the parameters of the
model, e.g., with our method, the parameters of the head network are 0.6 times
smaller than those of the state-of-the-art Cascade R-CNN, yet the performance
boost is 1.3% on COCO test-dev. Without loss of generality, we can also build a
lighter head network for other multi-stage detectors by assembling our method.
Related papers
- Renormalized Connection for Scale-preferred Object Detection in Satellite Imagery [51.83786195178233]
We design a Knowledge Discovery Network (KDN) to implement the renormalization group theory in terms of efficient feature extraction.
Renormalized connection (RC) on the KDN enables "synergistic focusing" of multi-scale features.
RCs extend the multi-level feature's "divide-and-conquer" mechanism of the FPN-based detectors to a wide range of scale-preferred tasks.
arXiv Detail & Related papers (2024-09-09T13:56:22Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and
Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z)
- Rethinking the Detection Head Configuration for Traffic Object Detection [11.526701794026641]
We propose a lightweight traffic object detection network based on matching between detection head and object distribution.
The proposed model achieves more competitive performance than other models on the BDD100K dataset and our proposed ETFOD-v2 dataset.
arXiv Detail & Related papers (2022-10-08T02:23:57Z)
- Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in
Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed Ret3D.
At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules.
With negligible extra overhead, Ret3D achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z)
- DisARM: Displacement Aware Relation Module for 3D Detection [38.4380420322491]
Displacement Aware Relation Module (DisARM) is a novel neural network module for enhancing the performance of 3D object detection in point cloud scenes.
To find the anchors, we first perform a preliminary relation anchor module with an objectness-aware sampling approach.
This lightweight relation module leads to significantly higher accuracy of object instance detection when plugged into state-of-the-art detectors.
arXiv Detail & Related papers (2022-03-02T14:49:55Z)
- EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object
Detection [56.03081616213012]
We propose EPNet++ for multi-modal 3D object detection by introducing a novel Cascade Bi-directional Fusion (CB-Fusion) module.
The proposed CB-Fusion module boosts the plentiful semantic information of point features with the image features in a cascade bi-directional interaction fusion manner.
The experiment results on the KITTI, JRDB and SUN-RGBD datasets demonstrate the superiority of EPNet++ over the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-21T10:48:34Z)
- Anchor Retouching via Model Interaction for Robust Object Detection in
Aerial Images [15.404024559652534]
We present an effective Dynamic Enhancement Anchor (DEA) network to construct a novel training sample generator.
Our method achieves state-of-the-art performance in accuracy with moderate inference speed and computational overhead for training.
arXiv Detail & Related papers (2021-12-13T14:37:20Z)
- LC3Net: Ladder context correlation complementary network for salient
object detection [0.32116198597240836]
We propose a novel ladder context correlation complementary network (LC3Net).
FCB is a filterable convolution block to assist the automatic collection of information on the diversity of initial features.
DCM is a dense cross module to facilitate the intimate aggregation of different levels of features.
BCD is a bidirectional compression decoder to help the progressive shrinkage of multi-scale features.
arXiv Detail & Related papers (2021-10-21T03:12:32Z)
- Attention-based Joint Detection of Object and Semantic Part [4.389917490809522]
Our model is created on top of two Faster-RCNN models that share their features to get enhanced representations of both.
Experiments on the PASCAL-Part 2010 dataset show that joint detection can simultaneously improve both object detection and part detection.
arXiv Detail & Related papers (2020-07-05T18:54:10Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.