Related papers: Rethinking the Detection Head Configuration for Traffic Object Detection

URL: http://arxiv.org/abs/2210.03883v1
Date: Sat, 8 Oct 2022 02:23:57 GMT
Title: Rethinking the Detection Head Configuration for Traffic Object Detection
Authors: Yi Shi, Jiang Wu, Shixuan Zhao, Gangyao Gao, Tao Deng and Hongmei Yan
Abstract summary: We propose a lightweight traffic object detection network based on matching between detection head and object distribution. The proposed model achieves more competitive performance than other models on the BDD100K dataset and our proposed ETFOD-v2 dataset.
Score: 11.526701794026641
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-scale detection plays an important role in object detection models. However, researchers are often unsure how to configure detection heads that combine multi-scale features at different input resolutions. We find that the matching relationship between the object distribution and the detection head differs across input resolutions. Based on these findings, we propose a lightweight traffic object detection network based on matching between detection head and object distribution, termed MHD-Net. It consists of three main parts. The first is the detection head and object distribution matching strategy, which guides the rational configuration of detection heads so as to leverage multi-scale features to effectively detect objects at vastly different scales. The second is the cross-scale detection head configuration guideline, which replaces multiple detection heads with only two detection heads that have rich feature representations, achieving an excellent balance between detection accuracy, model parameters, FLOPs and detection speed. The third is the receptive field enlargement method, which combines the dilated convolution module with shallow features of the backbone to further improve detection accuracy at the cost of a very slight increase in model parameters. The proposed model achieves more competitive performance than other models on the BDD100K dataset and our proposed ETFOD-v2 dataset. The code will be available.
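The abstract's third component relies on dilated convolution to enlarge the receptive field while adding very few parameters. The sketch below is not the authors' code. It only illustrates the standard receptive-field arithmetic for dilated convolutions, and the function names and layer configurations are hypothetical choices made for this example.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels along one axis, covered by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of a stack of convolution layers.

    `layers` is a list of (kernel_size, stride, dilation) tuples ordered
    from input to output (an assumed, illustrative layout).
    """
    rf, jump = 1, 1  # jump: spacing of adjacent outputs measured in input pixels
    for kernel_size, stride, dilation in layers:
        rf += (effective_kernel_size(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


# Two 3x3 layers; only the dilation of the second layer differs.
plain = [(3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 1, 2)]
print(receptive_field(plain), receptive_field(dilated))
```

Both stacks use the same number of weights per layer, since dilation spaces the kernel taps apart without adding any, yet the dilated stack covers a wider input window.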

Related papers

TransBridge: Boost 3D Object Detection by Scene-Level Completion with Transformer Decoder [66.22997415145467]
This paper presents a joint completion and detection framework that improves the detection feature in sparse areas. Specifically, we propose TransBridge, a novel transformer-based up-sampling block that fuses the features from the detection and completion networks. The results show that our framework consistently improves end-to-end 3D object detection, with the mean average precision (mAP) ranging from 0.7 to 1.5 across multiple methods.
arXiv Detail & Related papers (2025-12-12T00:08:03Z)
Source-Free Object Detection with Detection Transformer [59.33653163035064]
Source-Free Object Detection (SFOD) enables knowledge transfer from a source domain to an unsupervised target domain for object detection without access to source data. Most existing SFOD approaches are either confined to conventional object detection (OD) models like Faster R-CNN or designed as general solutions without tailored adaptations for novel OD architectures, especially Detection Transformer (DETR). In this paper, we introduce Feature Reweighting ANd Contrastive Learning NetworK (FRANCK), a novel SFOD framework specifically designed to perform query-centric feature enhancement for DETRs.
arXiv Detail & Related papers (2025-10-13T07:35:04Z)
S-LAM3D: Segmentation-Guided Monocular 3D Object Detection via Feature Space Fusion [0.0]
Monocular 3D Object Detection represents a challenging Computer Vision task due to the nature of the input used. We introduce a decoupled strategy based on injecting precomputed segmentation information priors and fusing them directly into the feature space for guiding the detection. The proposed method is evaluated on the KITTI 3D Object Detection Benchmark, outperforming the equivalent architecture that relies only on RGB image features for small objects in the scene.
arXiv Detail & Related papers (2025-09-07T10:14:56Z)
Using Cross-Domain Detection Loss to Infer Multi-Scale Information for Improved Tiny Head Tracking [2.960887693377022]
We propose a framework to enhance tiny head detection and tracking by optimizing the balance between performance and efficiency. Our framework integrates (1) a cross-domain detection loss, (2) a multi-scale module, and (3) a small receptive field detection mechanism. These innovations enhance detection by bridging the gap between large and small detectors, capturing high-frequency details at multiple scales during training, and using filters with small receptive fields to detect tiny heads.
arXiv Detail & Related papers (2025-05-14T02:29:54Z)
Boosting 3D Object Detection with Semantic-Aware Multi-Branch Framework [44.44329455757931]
In autonomous driving, LiDAR sensors are vital for acquiring 3D point clouds, providing reliable geometric information. Traditional sampling methods of preprocessing often ignore semantic features, leading to detail loss and ground point interference. We propose a multi-branch two-stage 3D object detection framework using a Semantic-aware Multi-branch Sampling (SMS) module and multi-view constraints.
arXiv Detail & Related papers (2024-07-08T09:25:45Z)
Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM-Omni3D and extend the aforementioned monocular detector to its multi-modal version. We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z)
S$^3$-MonoDETR: Supervised Shape&Scale-perceptive Deformable Transformer for Monocular 3D Object Detection [21.96072831561483]
This paper proposes a novel "Supervised Shape&Scale-perceptive Deformable Attention" (S$^3$-DA) module for monocular 3D object detection. Benefiting from this, S$^3$-DA effectively estimates receptive fields for query points belonging to any category, enabling them to generate robust query features. Experiments on KITTI and Open datasets demonstrate that S$^3$-DA significantly improves the detection accuracy.
arXiv Detail & Related papers (2023-09-02T12:36:38Z)
Multi-level and multi-modal feature fusion for accurate 3D object detection in Connected and Automated Vehicles [0.8701566919381223]
This paper presents a Deep Neural Network based 3D object detection model that leverages a three-stage feature extractor. The proposed feature extractor extracts high-level features from two input sensory modalities and recovers the important features discarded during the convolutional process. The novel fusion scheme effectively fuses features across sensory modalities and convolutional layers to find the best representative global features.
arXiv Detail & Related papers (2022-12-15T00:25:05Z)
AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation. We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
R(Det)^2: Randomized Decision Routing for Object Detection [64.48369663018376]
We propose a novel approach to combine decision trees and deep neural networks in an end-to-end learning manner for object detection. To facilitate effective learning, we propose randomized decision routing with node selective and associative losses. We name this approach randomized decision routing for object detection, abbreviated as R(Det)$^2$.
arXiv Detail & Related papers (2022-04-02T07:54:58Z)
A More Compact Object Detector Head Network with Feature Enhancement and Relational Reasoning [4.171249457570931]
We propose a more compact object detector head network (CODH), which can preserve global context information and condense the information density. With our method, the parameters of the head network are 0.6 times smaller than the state-of-the-art Cascade R-CNN, yet the performance boost is 1.3% on COCO test-dev.
arXiv Detail & Related papers (2021-06-28T08:38:57Z)
M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention. The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z)
Condensing Two-stage Detection with Automatic Object Key Part Discovery [87.1034745775229]
Two-stage object detectors generally require excessively large models for their detection heads to achieve high accuracy. We propose that the model parameters of two-stage detection heads can be condensed and reduced by concentrating on object key parts. Our proposed technique consistently maintains original performance while waiving around 50% of the model parameters of common two-stage detection heads.
arXiv Detail & Related papers (2020-06-10T01:20:47Z)
FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking [92.48078680697311]
Multi-object tracking (MOT) is an important problem in computer vision. We present a simple yet effective approach termed as FairMOT based on the anchor-free object detection architecture CenterNet. The approach achieves high accuracy for both detection and tracking.
arXiv Detail & Related papers (2020-04-04T08:18:00Z)
Pixel-Semantic Revise of Position Learning A One-Stage Object Detector with A Shared Encoder-Decoder [5.371825910267909]
We analyze that different methods detect objects adaptively. Some state-of-the-art detectors combine different feature pyramids with many mechanisms to enhance multi-level semantic information. This work addresses that by an anchor-free detector with shared encoder-decoder with attention mechanism.
arXiv Detail & Related papers (2020-01-04T08:55:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.