Rethinking the Detection Head Configuration for Traffic Object Detection
- URL: http://arxiv.org/abs/2210.03883v1
- Date: Sat, 8 Oct 2022 02:23:57 GMT
- Title: Rethinking the Detection Head Configuration for Traffic Object Detection
- Authors: Yi Shi, Jiang Wu, Shixuan Zhao, Gangyao Gao, Tao Deng and Hongmei Yan
- Abstract summary: We propose a lightweight traffic object detection network based on matching between detection head and object distribution.
The proposed model achieves more competitive performance than other models on the BDD100K dataset and our proposed ETFOD-v2 dataset.
- Score: 11.526701794026641
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-scale detection plays an important role in object detection models.
However, researchers are often unsure how to reasonably configure
detection heads combining multi-scale features at different input resolutions.
We find that there are different matching relationships between the object
distribution and the detection head at different input resolutions. Based on
these findings, we propose a lightweight traffic object detection
network based on matching between detection head and object distribution,
termed MHD-Net. It consists of three main parts. The first is the detection
head and object distribution matching strategy, which guides the rational
configuration of detection head, so as to leverage multi-scale features to
effectively detect objects at vastly different scales. The second is the
cross-scale detection head configuration guideline, which recommends replacing
multiple detection heads with only two detection heads possessing rich
feature representations to achieve an excellent balance between detection
accuracy, model parameters, FLOPs and detection speed. The third is the
receptive field enlargement method, which combines the dilated convolution
module with shallow features of backbone to further improve the detection
accuracy at the cost of a very slight increase in model parameters. The proposed
model achieves more competitive performance than other models on the BDD100K
dataset and our proposed ETFOD-v2 dataset. The code will be available.
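The receptive field enlargement idea mentioned in the abstract relies on a general property of dilated convolutions: dilation widens the span a kernel covers without adding weights to that kernel. The sketch below illustrates only this general property. It is not the authors' module, and the function names and the 64-channel example are assumptions chosen for illustration.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in input pixels) covered by one dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


def conv2d_param_count(in_channels: int, out_channels: int,
                       kernel_size: int, bias: bool = True) -> int:
    """Learnable parameters of a standard 2D convolution.

    The dilation rate does not appear here: it changes where the kernel
    samples the input, not how many weights the kernel has.
    """
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


if __name__ == "__main__":
    # Hypothetical layer: 3x3 kernel, 64 input and 64 output channels.
    for dilation in (1, 2, 4):
        span = effective_kernel_size(3, dilation)
        params = conv2d_param_count(64, 64, 3)
        print(f"dilation={dilation}: covers {span}x{span}, params={params}")
```

Under these assumptions the covered span grows with the dilation rate while the parameter count of the layer stays fixed, so any parameter increase comes from adding the extra layer itself rather than from the dilation.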
Related papers
- Boosting 3D Object Detection with Semantic-Aware Multi-Branch Framework [44.44329455757931]
In autonomous driving, LiDAR sensors are vital for acquiring 3D point clouds, providing reliable geometric information.
To address this, we propose a multi-branch two-stage 3D object detection framework using a Semantic-aware Multi-branch Sampling (SMS) module.
The experimental results on KITTI 3D object detection benchmark dataset show that our method achieves excellent detection performance improvement for a variety of backbones.
arXiv Detail & Related papers (2024-07-08T09:25:45Z)
- Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM-Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z)
- S$^3$-MonoDETR: Supervised Shape&Scale-perceptive Deformable Transformer for Monocular 3D Object Detection [21.96072831561483]
This paper proposes a novel "Supervised Shape&Scale-perceptive Deformable Attention" (S$^3$-DA) module for monocular 3D object detection.
Benefiting from this, S$^3$-DA effectively estimates receptive fields for query points belonging to any category, enabling them to generate robust query features.
Experiments on KITTI and Open datasets demonstrate that S$^3$-DA significantly improves the detection accuracy.
arXiv Detail & Related papers (2023-09-02T12:36:38Z)
- Multi-level and multi-modal feature fusion for accurate 3D object detection in Connected and Automated Vehicles [0.8701566919381223]
This paper presents a Deep Neural Network based 3D object detection model that leverages a three-stage feature extractor.
The proposed feature extractor extracts high-level features from two input sensory modalities and recovers the important features discarded during the convolutional process.
The novel fusion scheme effectively fuses features across sensory modalities and convolutional layers to find the best representative global features.
arXiv Detail & Related papers (2022-12-15T00:25:05Z)
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
- R(Det)^2: Randomized Decision Routing for Object Detection [64.48369663018376]
We propose a novel approach to combine decision trees and deep neural networks in an end-to-end learning manner for object detection.
To facilitate effective learning, we propose randomized decision routing with node selective and associative losses.
We name this approach randomized decision routing for object detection, abbreviated as R(Det)$^2$.
arXiv Detail & Related papers (2022-04-02T07:54:58Z)
- A More Compact Object Detector Head Network with Feature Enhancement and Relational Reasoning [4.171249457570931]
We propose a more compact object detector head network (CODH), which can preserve global context information and condense the information density.
With our method, the parameters of the head network are 0.6 times smaller than those of the state-of-the-art Cascade R-CNN, yet the performance boost is 1.3% on COCO test-dev.
arXiv Detail & Related papers (2021-06-28T08:38:57Z)
- M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z)
- Condensing Two-stage Detection with Automatic Object Key Part Discovery [87.1034745775229]
Two-stage object detectors generally require excessively large models for their detection heads to achieve high accuracy.
We propose that the model parameters of two-stage detection heads can be condensed and reduced by concentrating on object key parts.
Our proposed technique consistently maintains original performance while waiving around 50% of the model parameters of common two-stage detection heads.
arXiv Detail & Related papers (2020-06-10T01:20:47Z)
- FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking [92.48078680697311]
Multi-object tracking (MOT) is an important problem in computer vision.
We present a simple yet effective approach, termed FairMOT, based on the anchor-free object detection architecture CenterNet.
The approach achieves high accuracy for both detection and tracking.
arXiv Detail & Related papers (2020-04-04T08:18:00Z)
- Pixel-Semantic Revise of Position Learning A One-Stage Object Detector with A Shared Encoder-Decoder [5.371825910267909]
We analyze that different methods detect objects adaptively.
Some state-of-the-art detectors combine different feature pyramids with many mechanisms to enhance multi-level semantic information.
This work addresses that by an anchor-free detector with shared encoder-decoder with attention mechanism.
arXiv Detail & Related papers (2020-01-04T08:55:00Z)