Large receptive field strategy and important feature extraction strategy
in 3D object detection
- URL: http://arxiv.org/abs/2401.11913v2
- Date: Sun, 10 Mar 2024 10:37:21 GMT
- Title: Large receptive field strategy and important feature extraction strategy
in 3D object detection
- Authors: Leichao Cui, Xiuxian Li, Min Meng and Guangyu Jia
- Abstract summary: This study focuses on key challenges in 3D object detection.
To tackle the challenge of expanding the receptive field of a 3D convolutional kernel, we introduce the Dynamic Feature Fusion Module.
This module achieves adaptive expansion of the 3D convolutional kernel's receptive field, balancing the expansion with acceptable computational loads.
- Score: 6.3948571459793975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The enhancement of 3D object detection is pivotal for precise environmental
perception and improved task execution capabilities in autonomous driving.
LiDAR point clouds, offering accurate depth information, serve as a crucial
source of information for this purpose. Our study focuses on key challenges in 3D object
detection. To tackle the challenge of expanding the receptive field of a 3D
convolutional kernel, we introduce the Dynamic Feature Fusion Module (DFFM).
This module achieves adaptive expansion of the 3D convolutional kernel's
receptive field, balancing the expansion with acceptable computational loads.
This innovation reduces operations, expands the receptive field, and allows the
model to dynamically adjust to different object requirements. Simultaneously,
we identify redundant information in 3D features. The Feature Selection
Module (FSM) quantitatively evaluates and eliminates non-important
features, separating output-box fitting from feature extraction. This
enables the detector to focus on critical features, resulting in model
compression, a reduced computational burden, and less interference from
candidate boxes. Extensive experiments confirm that both DFFM and FSM
not only improve on current benchmarks, particularly for small-object
detection, but also speed up the network. Importantly, the two modules
are effectively complementary.
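The abstract describes the DFFM only at a high level, so the following is a minimal sketch of the general idea, adaptively fusing a small-kernel branch with a cheap large-receptive-field branch, rather than the authors' implementation. The dilated-convolution branch, the gating head, and the name `DynamicFeatureFusion3d` are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class DynamicFeatureFusion3d(nn.Module):
    """Fuses a local 3x3x3 branch with a dilated large-receptive-field
    branch using input-dependent weights, so the effective receptive
    field can grow without the cost of a dense large kernel."""
    def __init__(self, channels: int):
        super().__init__()
        self.local = nn.Conv3d(channels, channels, 3, padding=1)
        # A dilated 3x3x3 kernel covers a 7x7x7 neighborhood at 3x3x3
        # cost -- one cheap way to "expand" the kernel (assumed here).
        self.context = nn.Conv3d(channels, channels, 3, padding=3, dilation=3)
        self.gate = nn.Sequential(      # per-sample fusion weights
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, 2, 1),
            nn.Softmax(dim=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.gate(x)                              # (N, 2, 1, 1, 1)
        # Convex combination: the network decides, per input, how much
        # large-receptive-field context the output should contain.
        return w[:, 0:1] * self.local(x) + w[:, 1:2] * self.context(x)
```

Calling `DynamicFeatureFusion3d(64)` on a `(2, 64, 32, 32, 32)` tensor returns a tensor of the same shape, with the local/context mix chosen per input, which is the "dynamic adjustment to different object requirements" the abstract alludes to.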
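Likewise, the FSM's stated behavior, quantitatively scoring features and discarding the non-important ones, can be mocked up as a channel-selection layer. This is again a hypothetical sketch: the scoring function, the top-k rule, the `keep_ratio` parameter, and the class name are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class FeatureSelection(nn.Module):
    """Scores channels by learned importance and zeroes out the
    low-scoring ones, so the box-fitting head only sees features
    judged important."""
    def __init__(self, channels: int, keep_ratio: float = 0.5):
        super().__init__()
        self.score = nn.Sequential(     # per-channel importance in [0, 1]
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, channels, 1),
            nn.Sigmoid(),
        )
        self.keep = max(1, int(channels * keep_ratio))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.score(x)                             # (N, C, 1, 1, 1)
        # Keep the k highest-scoring channels per sample; the threshold
        # is each sample's k-th largest score.
        kth = s.flatten(1).topk(self.keep, dim=1).values[:, -1]
        mask = (s >= kth.view(-1, 1, 1, 1, 1)).float()
        # Scores stay in the product so importance remains learnable;
        # the hard mask prunes the non-important channels.
        return x * s * mask
```

Dropping half the channels this way is one concrete route to the model compression and reduced computation the abstract claims for the FSM.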
Related papers
- Efficient Feature Aggregation and Scale-Aware Regression for Monocular 3D Object Detection [40.14197775884804]
MonoASRH is a novel monocular 3D detection framework composed of an Efficient Hybrid Feature Aggregation Module (EH-FAM) and an Adaptive Scale-Aware 3D Regression Head (ASRH).
EH-FAM employs multi-head attention with a global receptive field to extract semantic features for small-scale objects.
ASRH encodes 2D bounding box dimensions and then fuses scale features with the semantic features aggregated by EH-FAM.
arXiv Detail & Related papers (2024-11-05T02:33:25Z) - PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called the Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - Cross-Cluster Shifting for Efficient and Effective 3D Object Detection
in Autonomous Driving [69.20604395205248]
We present a new 3D point-based detector model, named Shift-SSD, for precise 3D object detection in autonomous driving.
We introduce an intriguing Cross-Cluster Shifting operation to unleash the representation capacity of the point-based detector.
We conduct extensive experiments on the KITTI and nuScenes datasets, and the results demonstrate the state-of-the-art performance of Shift-SSD in both detection accuracy and runtime.
arXiv Detail & Related papers (2024-03-10T10:36:32Z) - AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z) - The Devil is in the Task: Exploiting Reciprocal Appearance-Localization
Features for Monocular 3D Object Detection [62.1185839286255]
Low-cost monocular 3D object detection plays a fundamental role in autonomous driving.
We introduce a Dynamic Feature Reflecting Network, named DFR-Net.
We rank 1st among all monocular 3D object detectors on the KITTI test set.
arXiv Detail & Related papers (2021-12-28T07:31:18Z) - Multi-View Adaptive Fusion Network for 3D Object Detection [14.506796247331584]
3D object detection based on LiDAR-camera fusion is an emerging research theme for autonomous driving.
We propose a single-stage multi-view fusion framework that takes LiDAR bird's-eye view, LiDAR range view and camera view images as inputs for 3D object detection.
We design an end-to-end learnable network, named MVAF-Net, to integrate these components.
arXiv Detail & Related papers (2020-11-02T00:06:01Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step (a toy version of this loop is sketched after this list).
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z) - InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic
Information Modeling [65.47126868838836]
We propose a novel 3D object detection framework with dynamic information modeling.
Coarse predictions are generated in the first stage via a voxel-based region proposal network.
Experiments are conducted on the large-scale nuScenes 3D detection benchmark.
arXiv Detail & Related papers (2020-07-16T18:27:08Z)
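The one-parameter-per-step refinement loop described for the Reinforced Axial Refinement Network above lends itself to a small illustration. The sketch below is a toy rendering of that loop, not the paper's method: the policy network, the fixed step size, the 7-parameter box encoding, and the names `RefinePolicy` and `refine` are all assumptions, and RL training is only noted in comments.

```python
import torch
import torch.nn as nn

PARAMS = 7   # (x, y, z, w, h, l, yaw) parameters of a 3D box (assumed)
STEP = 0.1   # fixed adjustment magnitude per step (assumed)

class RefinePolicy(nn.Module):
    """Scores 2 * PARAMS actions: increase or decrease one box parameter."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + PARAMS, 128), nn.ReLU(),
            nn.Linear(128, 2 * PARAMS),
        )

    def forward(self, feat: torch.Tensor, box: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([feat, box], dim=-1))

def refine(policy: RefinePolicy, feat: torch.Tensor, box: torch.Tensor,
           steps: int = 10) -> torch.Tensor:
    # Greedy rollout: exactly one parameter changes per step, as in the
    # paper's description. Training would instead sample actions and
    # optimize the policy with reinforcement learning on a delayed
    # reward (e.g. the IoU gain after several steps).
    for _ in range(steps):
        action = policy(feat, box).argmax(dim=-1)      # (N,) action ids
        idx = action % PARAMS                          # which parameter
        sign = (action < PARAMS).float() * 2 - 1       # +1 or -1 direction
        delta = torch.zeros_like(box)
        delta.scatter_(1, idx.unsqueeze(1), (sign * STEP).unsqueeze(1))
        box = box + delta
    return box
```

For example, `refine(RefinePolicy(), torch.randn(4, 128), torch.zeros(4, 7))` nudges four initial boxes through ten single-parameter adjustments.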
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences.