AdaptiveShape: Solving Shape Variability for 3D Object Detection with Geometry Aware Anchor Distributions
- URL: http://arxiv.org/abs/2302.14522v1
- Date: Tue, 28 Feb 2023 12:31:31 GMT
- Title: AdaptiveShape: Solving Shape Variability for 3D Object Detection with Geometry Aware Anchor Distributions
- Authors: Benjamin Sick, Michael Walter, Jochen Abhau
- Abstract summary: 3D object detection with point clouds and images plays an important role in perception tasks such as autonomous driving.
Current methods show great performance on detection and pose estimation of standard-shaped vehicles but lag behind on more complex shapes.
This work introduces several new methods to improve and measure the performance for such classes.
- Score: 1.3807918535446089
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection with point clouds and images plays an important role in perception tasks such as autonomous driving. Current methods show great performance on detection and pose estimation of standard-shaped vehicles but lag behind on more complex shapes such as semi-trailer truck combinations. Determining the shape and motion of those special vehicles accurately is crucial in yard operation, maneuvering, and industrial automation applications. This work introduces several new methods to improve and measure the performance for such classes. State-of-the-art methods are based on predefined anchor grids or heatmaps for ground truth targets. However, the underlying representations do not take the shape of differently sized objects into account. Our main contribution, AdaptiveShape, uses shape aware anchor distributions and heatmaps to improve the detection capabilities. For large vehicles we achieve +10.9% AP in comparison to current shape agnostic methods.
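The paper ships no code; the following is a minimal sketch of how shape aware anchor shapes could be derived by clustering per-class ground truth dimensions instead of using one fixed box per class. Function and parameter names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def shape_aware_anchors(gt_dims, n_clusters=3):
    """Cluster ground-truth box dimensions (length, width, height) of one
    class into a small set of representative anchor shapes, so long
    articulated vehicles are not forced onto one average-sized anchor."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(gt_dims)
    anchors = km.cluster_centers_
    return anchors[np.argsort(anchors[:, 0])]  # sort by length

# Hypothetical truck dimensions: rigid trucks vs. semi-trailer combinations.
truck_dims = np.array([[ 7.5, 2.4, 3.0], [ 8.1, 2.5, 3.2],
                       [16.5, 2.5, 3.8], [16.8, 2.5, 3.9], [18.2, 2.6, 4.0]])
print(shape_aware_anchors(truck_dims, n_clusters=2))
```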
Furthermore, we introduce a new fast LiDAR-camera fusion based on 2D bounding box camera detections, which are available in many processing pipelines. This fusion method does not rely on perfectly calibrated or temporally synchronized systems and is therefore applicable to a broad range of robotic applications.
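As an illustration only (the paper does not specify this code), a 2D-box-driven fusion can be sketched as tagging LiDAR points that project into inflated camera detection boxes; the inflation margin hedges against the imperfect calibration and synchronization the method tolerates. All names are assumptions.

```python
import numpy as np

def tag_points_in_2d_boxes(points_cam, proj, boxes_2d, margin_px=10.0):
    """Mark LiDAR points whose image projection falls inside any 2D camera
    detection box (x1, y1, x2, y2). The returned mask can serve as an
    extra per-point feature for the 3D detector.

    points_cam: (N, 3) points already in the camera coordinate frame.
    proj:       (3, 4) projection matrix, only roughly calibrated.
    margin_px:  box inflation tolerating calibration/sync errors."""
    pts_h = np.hstack([points_cam, np.ones((len(points_cam), 1))])
    uvw = pts_h @ proj.T
    in_front = uvw[:, 2] > 0                      # keep points ahead of camera
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)
    mask = np.zeros(len(points_cam), dtype=bool)
    for x1, y1, x2, y2 in boxes_2d:
        mask |= ((uv[:, 0] >= x1 - margin_px) & (uv[:, 0] <= x2 + margin_px) &
                 (uv[:, 1] >= y1 - margin_px) & (uv[:, 1] <= y2 + margin_px))
    return mask & in_front
```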
We extend a standard point pillar network to account for temporal data and to improve the learning of complex object movements.
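A common way to feed temporal data to a pillar network (a sketch under assumptions, not necessarily the paper's variant) is to align past sweeps into the current frame and append a relative-time channel per point:

```python
import numpy as np

def stack_sweeps(sweeps, poses, timestamps):
    """Merge consecutive LiDAR sweeps (newest last) into one cloud.

    sweeps:     list of (N_i, 3) point arrays.
    poses:      list of (4, 4) transforms from each sweep into the
                current ego frame.
    timestamps: per-sweep capture times in seconds.
    Returns (sum N_i, 4) points: x, y, z plus the time offset to the
    newest sweep, consumed by the pillar encoder as an extra channel."""
    out = []
    for pts, pose, t in zip(sweeps, poses, timestamps):
        pts_h = np.hstack([pts, np.ones((len(pts), 1))])
        aligned = (pts_h @ pose.T)[:, :3]
        dt = np.full((len(pts), 1), timestamps[-1] - t)
        out.append(np.hstack([aligned, dt]))
    return np.vstack(out)
```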
In addition, we extend ground truth augmentation to use grouped object pairs, which further improves truck AP by +2.2% compared to conventional augmentation.
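Conventional ground truth augmentation pastes individual object crops into training scenes; for articulated vehicles, pasting grouped pairs keeps the tractor-trailer geometry intact. A minimal sketch, with all names assumed for illustration:

```python
import random

def sample_grouped_objects(gt_database, groups, n_groups=2):
    """Ground truth sampling that pastes whole object groups.

    gt_database: dict mapping object id -> (box, point crop).
    groups:      list of id tuples, e.g. (tractor_id, trailer_id),
                 annotated as one physically connected combination.
    Sampling per group rather than per box keeps the relative pose of
    connected objects consistent in the augmented scene."""
    picked = random.sample(groups, min(n_groups, len(groups)))
    return [gt_database[obj_id] for group in picked for obj_id in group]
```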
Related papers
- An Efficient Wide-Range Pseudo-3D Vehicle Detection Using A Single Camera [10.573423265001706]
This paper proposes a novel wide-range Pseudo-3D Vehicle Detection method based on images from a single camera.
To detect pseudo-3D objects, our model adopts specifically designed detection heads.
A joint constraint loss combining the object box and SPL is designed for model training, improving the efficiency, stability, and prediction accuracy of the model.
arXiv Detail & Related papers (2023-09-15T12:50:09Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection [17.526914782562528]
We propose AutoAlignV2, a faster and stronger multi-modal 3D detection framework, built on top of AutoAlign.
Our best model reaches 72.4 NDS on nuScenes test leaderboard, achieving new state-of-the-art results.
arXiv Detail & Related papers (2022-07-21T06:17:23Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- PillarGrid: Deep Learning-based Cooperative Perception for 3D Object Detection from Onboard-Roadside LiDAR [15.195933965761645]
We propose PillarGrid, a novel cooperative perception method fusing information from multiple 3D LiDARs.
PillarGrid consists of four main phases: 1) cooperative preprocessing of point clouds, 2) pillar-wise voxelization and feature extraction, 3) grid-wise deep fusion of features from multiple sensors, and 4) convolutional neural network (CNN)-based augmented 3D object detection (a minimal sketch of the pillar-wise voxelization idea appears after this list).
Extensive experimentation shows that PillarGrid outperforms the SOTA single-LiDAR-based 3D object detection methods with respect to both accuracy and range by a large margin.
arXiv Detail & Related papers (2022-03-12T02:28:41Z)
- CFTrack: Center-based Radar and Camera Fusion for 3D Multi-Object Tracking [9.62721286522053]
We propose an end-to-end network for joint object detection and tracking based on radar and camera sensor fusion.
Our proposed method uses a center-based radar-camera fusion algorithm for object detection and utilizes a greedy algorithm for object association.
We evaluate our method on the challenging nuScenes dataset, where it achieves 20.0 AMOTA and outperforms all vision-based 3D tracking methods in the benchmark.
arXiv Detail & Related papers (2021-07-11T23:56:53Z)
- High-level camera-LiDAR fusion for 3D object detection with machine learning [0.0]
This paper tackles the 3D object detection problem, which is of vital importance for applications such as autonomous driving.
It uses a Machine Learning pipeline on a combination of monocular camera and LiDAR data to detect vehicles in the surrounding 3D space of a moving platform.
Our results demonstrate efficient and accurate inference on a validation set, achieving an overall accuracy of 87.1%.
arXiv Detail & Related papers (2021-05-24T01:57:34Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first, depth estimation is performed and a pseudo-LiDAR point cloud representation is computed from the depth estimates; then object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
- PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
- Road Curb Detection and Localization with Monocular Forward-view Vehicle Camera [74.45649274085447]
We propose a robust method for estimating road curb 3D parameters using a calibrated monocular camera equipped with a fisheye lens.
Our approach is able to estimate the vehicle-to-curb distance in real time with a mean accuracy of more than 90%.
arXiv Detail & Related papers (2020-02-28T00:24:18Z)
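For the PillarGrid entry above: pillar-wise voxelization, as a minimal illustrative sketch (grid extents and resolution are assumed values, not taken from that paper):

```python
import numpy as np

def pillarize(points, x_range=(0.0, 69.12), y_range=(-39.68, 39.68), res=0.16):
    """Assign points to vertical pillars on a 2D bird's-eye-view grid.

    Returns a dict mapping (ix, iy) grid cells to the indices of the
    points falling into them; a small feature network then encodes each
    pillar before the BEV backbone."""
    keep = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    idx = np.where(keep)[0]
    ix = ((points[idx, 0] - x_range[0]) / res).astype(int)
    iy = ((points[idx, 1] - y_range[0]) / res).astype(int)
    pillars = {}
    for i, cx, cy in zip(idx, ix, iy):
        pillars.setdefault((int(cx), int(cy)), []).append(int(i))
    return pillars
```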