Aerial Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2208.03974v1
- Date: Mon, 8 Aug 2022 08:32:56 GMT
- Title: Aerial Monocular 3D Object Detection
- Authors: Yue Hu, Shaoheng Fang, Weidi Xie and Siheng Chen
- Abstract summary: This work proposes a dual-view detection system named DVDET to achieve aerial monocular object detection in both the 2D image space and the 3D physical space.
To address the dataset challenge, we propose a new large-scale simulation dataset named AM3D-Sim, generated by the co-simulation of AirSim and CARLA, and a new real-world aerial dataset named AM3D-Real, collected with a DJI Matrice 300 RTK.
- Score: 46.26215100532241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Drones equipped with cameras can significantly enhance human ability to
perceive the world because of their remarkable maneuverability in 3D space.
Ironically, object detection for drones has always been conducted in the 2D
image space, which fundamentally limits their ability to understand 3D scenes.
Furthermore, existing 3D object detection methods developed for autonomous
driving cannot be directly applied to drones because they lack deformation
modeling, which is essential for the distant aerial perspective, where
distortion is severe and objects appear small. To fill this gap, this work proposes a dual-view
detection system named DVDET to achieve aerial monocular object detection in
both the 2D image space and the 3D physical space. To address the severe view
deformation issue, we propose a novel trainable geo-deformable transformation
module that can properly warp information from the drone's perspective to the
BEV. Compared with monocular methods designed for cars, our transformation includes a
learnable deformable network that explicitly corrects the severe deviation. To
address the dataset challenge, we propose a new large-scale simulation dataset
named AM3D-Sim, generated by the co-simulation of AirSim and CARLA, and a new
real-world aerial dataset named AM3D-Real, collected with a DJI Matrice 300 RTK.
Both datasets provide high-quality annotations for 3D object detection.
Extensive experiments show that i) aerial monocular 3D object detection is
feasible; ii) the model pre-trained on the simulation dataset benefits
real-world performance, and iii) DVDET also benefits monocular 3D object
detection for cars. To encourage more researchers to investigate this area, we
will release the dataset and related code at
https://sjtu-magic.github.io/dataset/AM3D/.
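The abstract describes warping features from the drone's perspective into the BEV, with a learnable deformable network layered on top of the geometric warp. As a rough illustrative sketch of the fixed geometric part only, inverse-perspective mapping projects each BEV grid cell on the ground plane into image pixel coordinates, where features can then be sampled. The function name, camera parameters, and flat-ground assumption below are ours for illustration, not from the paper:

```python
import numpy as np

def ground_to_pixel(K, R, t, xy_world):
    """Project ground-plane points (z = 0) into image pixel coordinates.

    K: 3x3 camera intrinsics; R, t: world-to-camera rotation and translation.
    xy_world: (N, 2) array of ground coordinates in metres.
    """
    n = xy_world.shape[0]
    pts = np.hstack([xy_world, np.zeros((n, 1))])  # lift (x, y) to (x, y, 0)
    cam = R @ pts.T + t[:, None]                   # world -> camera frame
    uvw = K @ cam                                  # camera -> image plane
    return (uvw[:2] / uvw[2]).T                    # perspective divide -> (u, v)

# Illustrative setup: a camera 50 m above the origin, looking straight down.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.array([[1.0,  0.0,  0.0],
              [0.0, -1.0,  0.0],
              [0.0,  0.0, -1.0]])  # camera z-axis points toward the ground
t = np.array([0.0, 0.0, 50.0])

# Each BEV grid cell is filled by sampling image features at its projected pixel.
bev_cells = np.array([[0.0, 0.0], [5.0, 0.0]])
pixels = ground_to_pixel(K, R, t, bev_cells)
print(pixels)  # [[320. 240.] [400. 240.]]
```

A deformable network such as the one the paper describes would then predict per-location offsets added to these sampled pixel positions, compensating for distortions that a fixed flat-ground projection cannot model.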
Related papers
- HeightFormer: A Semantic Alignment Monocular 3D Object Detection Method from Roadside Perspective [11.841338298700421]
We propose a novel 3D object detection framework integrating Spatial Former and Voxel Pooling Former to enhance 2D-to-3D projection based on height estimation.
Experiments on the Rope3D and DAIR-V2X-I datasets demonstrate that the proposed algorithm outperforms existing methods in detecting both vehicles and cyclists.
arXiv Detail & Related papers (2024-10-10T09:37:33Z)
- VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
Monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z)
- 3D Data Augmentation for Driving Scenes on Camera [50.41413053812315]
We propose a 3D data augmentation approach termed Drive-3DAug, aiming at augmenting the driving scenes on camera in the 3D space.
We first utilize Neural Radiance Field (NeRF) to reconstruct the 3D models of background and foreground objects.
Then, augmented driving scenes are obtained by placing the 3D objects, with adapted location and orientation, in pre-defined valid regions of the backgrounds.
arXiv Detail & Related papers (2023-03-18T05:51:05Z)
- MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices [78.20154723650333]
High-quality 3D ground-truth shapes are critical for 3D object reconstruction evaluation.
We introduce a novel multi-view RGBD dataset captured using a mobile device.
We obtain precise 3D ground-truth shape without relying on high-end 3D scanners.
arXiv Detail & Related papers (2023-03-03T14:02:50Z)
- PC-DAN: Point Cloud based Deep Affinity Network for 3D Multi-Object Tracking (Accepted as an extended abstract in JRDB-ACT Workshop at CVPR21) [68.12101204123422]
A point cloud is a dense compilation of spatial data in 3D coordinates.
We propose a PointNet-based approach for 3D Multi-Object Tracking (MOT).
arXiv Detail & Related papers (2021-06-03T05:36:39Z)
- Kinematic 3D Object Detection in Monocular Video [123.7119180923524]
We propose a novel method for monocular video-based 3D object detection which carefully leverages kinematic motion to improve precision of 3D localization.
We achieve state-of-the-art performance on monocular 3D object detection and the Bird's Eye View tasks within the KITTI self-driving dataset.
arXiv Detail & Related papers (2020-07-19T01:15:12Z)
- SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.