Collaboration Helps Camera Overtake LiDAR in 3D Detection
- URL: http://arxiv.org/abs/2303.13560v1
- Date: Thu, 23 Mar 2023 03:50:41 GMT
- Title: Collaboration Helps Camera Overtake LiDAR in 3D Detection
- Authors: Yue Hu, Yifan Lu, Runsheng Xu, Weidi Xie, Siheng Chen, Yanfeng Wang
- Abstract summary: Camera-only 3D detection provides a simple solution for localizing objects in 3D space compared to LiDAR-based detection systems.
Our proposed collaborative camera-only 3D detection (CoCa3D) enables agents to share complementary information with each other through communication.
Results show that CoCa3D improves previous SOTA performances by 44.21% on DAIR-V2X, 30.60% on OPV2V+, 12.59% on CoPerception-UAVs+ for AP@70.
- Score: 49.58433319402405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camera-only 3D detection provides an economical solution with a simple
configuration for localizing objects in 3D space compared to LiDAR-based
detection systems. However, a major challenge lies in precise depth estimation
due to the lack of direct 3D measurements in the input. Many previous methods
attempt to improve depth estimation through network designs, e.g., deformable
layers and larger receptive fields. This work proposes an orthogonal direction,
improving the camera-only 3D detection by introducing multi-agent
collaborations. Our proposed collaborative camera-only 3D detection (CoCa3D)
enables agents to share complementary information with each other through
communication. Meanwhile, we optimize communication efficiency by selecting the
most informative cues. The shared messages from multiple viewpoints
disambiguate the single-agent estimated depth and complement the occluded and
long-range regions in the single-agent view. We evaluate CoCa3D on one
real-world dataset and two new simulation datasets. Results show that CoCa3D
improves previous SOTA performances by 44.21% on DAIR-V2X, 30.60% on OPV2V+,
12.59% on CoPerception-UAVs+ for AP@70. Our preliminary results suggest that,
with sufficient collaboration, the camera might overtake LiDAR in some
practical scenarios. We released the dataset and code at
https://siheng-chen.github.io/dataset/CoPerception+ and
https://github.com/MediaBrain-SJTU/CoCa3D.
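The abstract says CoCa3D improves communication efficiency by sharing only the most informative cues, which then complement occluded and long-range regions in each agent's view. The sketch below is a toy illustration of that general idea, not the CoCa3D implementation. It assumes each agent holds a bird's-eye-view (BEV) grid of per-cell detection confidences, transmits only the cells above a hypothetical `threshold`, and that the receiver merges them with an element-wise max. The function names, the threshold rule, and the max fusion are all assumptions made for illustration.

```python
from typing import List, Tuple

# Toy BEV grid: confidence[row][col] in [0, 1] (assumed representation).
Grid = List[List[float]]
# A sparse message: (row, col, confidence) for each transmitted cell.
Message = List[Tuple[int, int, float]]


def select_informative_cells(confidence: Grid, threshold: float) -> Message:
    """Keep only cells whose confidence exceeds a hypothetical threshold."""
    return [
        (r, c, value)
        for r, row in enumerate(confidence)
        for c, value in enumerate(row)
        if value > threshold
    ]


def fuse_messages(ego: Grid, messages: List[Message]) -> Grid:
    """Merge received sparse messages into a copy of the ego grid.

    Element-wise max is an assumed, deliberately simple fusion rule.
    """
    fused = [row[:] for row in ego]
    for message in messages:
        for r, c, value in message:
            fused[r][c] = max(fused[r][c], value)
    return fused


if __name__ == "__main__":
    ego = [[0.9, 0.1],
           [0.0, 0.0]]    # bottom row occluded from the ego view
    other = [[0.2, 0.0],
             [0.8, 0.05]]  # collaborator sees an object at (1, 0)
    msg = select_informative_cells(other, threshold=0.5)
    print(msg)                        # [(1, 0, 0.8)]: 1 of 4 cells sent
    print(fuse_messages(ego, [msg]))  # [[0.9, 0.1], [0.8, 0.0]]
```

In this toy case only one of four cells crosses the link, yet the ego agent recovers the object it could not see; the threshold trades bandwidth against how much complementary evidence is shared.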
Related papers
- Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data [68.18735997052265]
We propose a balanced approach that combines the advantages of monocular and point cloud-based 3D detection.
Our method requires only a small number of 3D points, which can be obtained from a low-cost, low-resolution sensor.
The accuracy of 3D detection improves by 20% compared to the state-of-the-art monocular detection methods.
Date: 2024-04-10T03:54:53Z
- Coordinate-Aligned Multi-Camera Collaboration for Active Multi-Object Tracking [114.16306938870055]
We propose a coordinate-aligned multi-camera collaboration system for AMOT.
In our approach, we regard each camera as an agent and address AMOT with a multi-agent reinforcement learning solution.
Our system achieves a coverage of 71.88%, outperforming the baseline method by 8.9%.
Date: 2022-02-22T13:28:40Z
- Multi-modal 3D Human Pose Estimation with 2D Weak Supervision in Autonomous Driving [74.74519047735916]
3D human pose estimation (HPE) in autonomous vehicles (AV) differs from other use cases in many respects.
Data collected for other use cases (such as virtual reality, gaming, and animation) may not be usable for AV applications.
We propose one of the first approaches to alleviate this problem in the AV setting.
Date: 2021-12-22T18:57:16Z
- Is Pseudo-Lidar needed for Monocular 3D Object detection? [32.772699246216774]
We propose an end-to-end, single stage, monocular 3D object detector, DD3D, that can benefit from depth pre-training like pseudo-lidar methods, but without their limitations.
Our architecture is designed for effective information transfer between depth estimation and 3D detection, allowing us to scale with the amount of unlabeled pre-training data.
Date: 2021-08-13T22:22:51Z
- Categorical Depth Distribution Network for Monocular 3D Object Detection [7.0405916639906785]
A key challenge in monocular 3D detection is accurately predicting object depth.
Many methods attempt to directly estimate depth to assist in 3D detection, but show limited performance as a result of depth inaccuracy.
We propose Categorical Depth Distribution Network (CaDDN) to project rich contextual feature information to the appropriate depth interval in 3D space.
We validate our approach on the KITTI 3D object detection benchmark, where we rank 1st among published monocular methods.
Date: 2021-03-01T16:08:29Z
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first depth estimation is performed, a pseudo LiDAR point cloud representation is computed from the depth estimates, and then object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
Date: 2021-01-17T05:11:38Z
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module: adaptive zooming.
Date: 2020-03-01T17:18:08Z
- SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
Date: 2020-02-24T08:15:36Z
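The CaDDN summary in this list describes projecting contextual feature information to the appropriate depth interval in 3D space. A minimal sketch of that idea for a single pixel, under assumed inputs: per-bin depth logits are turned into a categorical distribution with a softmax, and its outer product with the pixel's feature vector gives one weighted copy of the features per depth bin. The function names and toy sizes are hypothetical; this is not CaDDN's released code.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over depth-bin logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frustum_features(depth_logits: List[float], features: List[float]) -> List[List[float]]:
    """Outer product of a pixel's depth distribution (D bins) and feature vector (C channels).

    Returns a D x C list: row d holds the features scaled by P(depth bin d).
    """
    probs = softmax(depth_logits)
    return [[p * f for f in features] for p in probs]


if __name__ == "__main__":
    # Hypothetical pixel: 3 depth bins, 2 feature channels.
    grid = frustum_features([0.5, 3.0, 0.5], [1.0, 2.0])
    for d, row in enumerate(grid):
        print(d, [round(x, 3) for x in row])
```

Because the probabilities sum to 1, summing the rows recovers the original feature vector; a sharply peaked distribution places almost all of the feature mass in one depth bin, while an uncertain one spreads it along the ray.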
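The PLUME summary in this list describes the two-step pseudo-LiDAR pipeline: estimate depth, compute a point cloud from it, then detect objects in 3D. A minimal sketch of the conversion step, assuming a rectified stereo pair and a pinhole camera with known intrinsics `fx`, `fy`, `cx`, `cy`: disparity becomes depth via z = f * B / d, and each valid pixel (u, v) is back-projected into camera coordinates. The function names and the skip-invalid-pixels convention are assumptions for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def disparity_to_depth(disparity: float, focal_px: float, baseline_m: float) -> float:
    """Rectified stereo: depth z = f * B / d (f in pixels, B in metres, d in pixels)."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity


def depth_map_to_points(depth: List[List[float]], fx: float, fy: float,
                        cx: float, cy: float) -> List[Point]:
    """Back-project every pixel with positive depth into camera coordinates.

    Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy, with x to the
    right, y down and z forward. Pixels with depth <= 0 are treated as invalid
    and skipped (an assumed convention).
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


if __name__ == "__main__":
    print(disparity_to_depth(10.0, focal_px=700.0, baseline_m=0.5))  # 35.0
    cloud = depth_map_to_points([[2.0, 0.0], [4.0, 4.0]], fx=2.0, fy=2.0, cx=0.0, cy=0.0)
    print(cloud)  # [(0.0, 0.0, 2.0), (0.0, 2.0, 4.0), (2.0, 2.0, 4.0)]
```

Since depth is inversely proportional to disparity, a one-pixel disparity error costs far more depth accuracy at long range than at short range, which is one reason depth estimation dominates the error budget of camera-only pipelines.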