BEVHeight++: Toward Robust Visual Centric 3D Object Detection
- URL: http://arxiv.org/abs/2309.16179v1
- Date: Thu, 28 Sep 2023 05:38:32 GMT
- Title: BEVHeight++: Toward Robust Visual Centric 3D Object Detection
- Authors: Lei Yang, Tao Tang, Jun Li, Peng Chen, Kun Yuan, Li Wang, Yi Huang,
Xinyu Zhang, Kaicheng Yu
- Abstract summary: Vision-centric bird's eye view detection methods have inferior performance on roadside cameras.
We propose a simple yet effective approach, dubbed BEVHeight++, to address this issue.
By incorporating both height and depth encoding techniques, we achieve a more accurate and robust projection from 2D to BEV spaces.
- Score: 32.08994153441449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While most recent autonomous driving systems focus on developing perception
methods for ego-vehicle sensors, people tend to overlook an alternative approach
of leveraging intelligent roadside cameras to extend the perception ability
beyond the visual range. We discover that the state-of-the-art vision-centric
bird's eye view detection methods have inferior performance on roadside
cameras. This is because these methods mainly focus on recovering the depth
relative to the camera center, where the depth difference between the car and the
ground quickly shrinks as the distance increases. In this paper, we propose
a simple yet effective approach, dubbed BEVHeight++, to address this issue. In
essence, we regress the height to the ground to achieve a distance-agnostic
formulation to ease the optimization process of camera-only perception methods.
By incorporating both height and depth encoding techniques, we achieve a more
accurate and robust projection from 2D to BEV spaces. On popular 3D detection
benchmarks of roadside cameras, our method surpasses all previous
vision-centric methods by a significant margin. In the ego-vehicle
scenario, our BEVHeight++ remains superior to depth-only methods.
Specifically, it yields a notable improvement of +1.9% NDS and +1.1% mAP over
BEVDepth when evaluated on the nuScenes validation set. Moreover, on the
nuScenes test set, our method achieves substantial gains of +2.8% NDS and
+1.7% mAP.
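To make the height-based formulation concrete, the sketch below lifts a single pixel to BEV by intersecting its viewing ray with a horizontal plane at the predicted height above the ground. This is a minimal illustration, not the authors' implementation: the function name, the ground-aligned frame convention (z up, z = 0 on the road surface), and the toy intrinsics and extrinsics are assumptions made for the example.

```python
import numpy as np

def lift_pixel_with_height(u, v, height, K, R_cam2ground, cam_center):
    """Hypothetical helper: lift pixel (u, v) to BEV (x, y) given its
    predicted height above the ground.

    K            : 3x3 camera intrinsics
    R_cam2ground : 3x3 rotation from the camera frame to a ground-aligned
                   frame (z-axis up, z = 0 on the road surface)
    cam_center   : camera centre expressed in the ground-aligned frame
    """
    # Back-project the pixel into a viewing ray in the camera frame.
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Express the ray in the ground-aligned frame.
    ray_g = R_cam2ground @ ray_cam
    # Intersect the ray with the plane z = height:
    # cam_center[2] + lam * ray_g[2] = height.
    lam = (height - cam_center[2]) / ray_g[2]
    point_g = cam_center + lam * ray_g
    return point_g[:2]  # BEV (x, y); z equals the predicted height

# Toy roadside setup: camera 5 m above the road, optical axis horizontal.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
# Camera x -> -y, y -> -z, z -> +x of the ground frame (x forward, y left, z up).
R_cam2ground = np.array([[0.0, 0.0, 1.0],
                         [-1.0, 0.0, 0.0],
                         [0.0, -1.0, 0.0]])
cam_center = np.array([0.0, 0.0, 5.0])

# A pixel 200 px below the principal point, predicted 0.5 m above the ground,
# lands roughly 22.5 m in front of the camera in BEV.
print(lift_pixel_with_height(960.0, 740.0, 0.5, K, R_cam2ground, cam_center))
```

Unlike a depth target, the height target in this construction stays roughly constant for objects of a given class regardless of their distance from the camera, which is the distance-agnostic property the abstract refers to.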
Related papers
- SDGE: Stereo Guided Depth Estimation for 360$^\circ$ Camera Sets [65.64958606221069]
Multi-camera systems are often used in autonomous driving to achieve a 360$^\circ$ perception.
These 360$^\circ$ camera sets often have limited or low-quality overlap regions, making multi-view stereo methods infeasible for the entire image.
We propose the Stereo Guided Depth Estimation (SGDE) method, which enhances depth estimation of the full image by explicitly utilizing multi-view stereo results on the overlap.
arXiv Detail & Related papers (2024-02-19T02:41:37Z)
- SGV3D: Towards Scenario Generalization for Vision-based Roadside 3D Object Detection [27.991404725024953]
Current vision-based roadside detection methods possess high accuracy on labeled scenes but have inferior performance on new scenes.
This is because roadside cameras remain stationary after installation, resulting in the algorithm overfitting these roadside backgrounds and camera poses.
We propose an innovative Scenario Generalization Framework for Vision-based Roadside 3D Object Detection, dubbed SGV3D.
arXiv Detail & Related papers (2024-01-29T12:31:13Z)
- Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning [93.71280187657831]
The camera-based bird's-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z)
- CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity [34.025530326420146]
We develop Complementary-BEV, a novel end-to-end monocular 3D object detection framework.
We conduct extensive experiments on the public roadside 3D detection benchmarks DAIR-V2X-I and Rope3D.
For the first time, the vehicle AP of a camera-based model reaches 80% on DAIR-V2X-I under the easy mode.
arXiv Detail & Related papers (2023-10-04T13:38:53Z)
- MonoGAE: Roadside Monocular 3D Object Detection with Ground-Aware Embeddings [29.050983641961658]
We introduce a novel framework for Roadside Monocular 3D object detection with ground-aware embeddings, named MonoGAE.
Our approach demonstrates a substantial performance advantage over all previous monocular 3D object detectors on widely recognized 3D detection benchmarks for roadside cameras.
arXiv Detail & Related papers (2023-09-30T14:52:26Z)
- Multi-camera Bird's Eye View Perception for Autonomous Driving [17.834495597639805]
It is essential to produce perception outputs in 3D to enable spatial reasoning about other agents and structures.
The most basic approach to obtaining the desired BEV representation from a camera image is inverse perspective mapping (IPM), which assumes a flat ground surface; a minimal IPM sketch follows this list.
More recent approaches use deep neural networks to output directly in BEV space.
arXiv Detail & Related papers (2023-09-16T19:12:05Z)
- BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection [27.921256216924384]
Vision-centric bird's eye view detection methods have inferior performance on roadside cameras.
We propose a simple yet effective approach, dubbed BEVHeight, to address this issue.
Our method surpasses all previous vision-centric methods by a significant margin.
arXiv Detail & Related papers (2023-03-15T10:18:53Z)
- BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation [105.96557764248846]
We introduce BEVFusion, a generic multi-task multi-sensor fusion framework.
It unifies multi-modal features in the shared bird's-eye view representation space.
It achieves 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower cost.
arXiv Detail & Related papers (2022-05-26T17:59:35Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves the state-of-the-art performance on the challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task [48.555440807415664]
We present the first high-diversity, challenging roadside perception 3D dataset, Rope3D, from a novel view.
The dataset consists of 50k images and over 1.5M 3D objects in various scenes.
We propose to leverage the geometry constraint to solve the inherent ambiguities caused by various sensors and viewpoints.
arXiv Detail & Related papers (2022-03-25T12:13:23Z)
- Defocus Blur Detection via Depth Distillation [64.78779830554731]
We introduce depth information into defocus blur detection (DBD) for the first time.
In detail, we learn the defocus blur from ground truth and the depth distilled from a well-trained depth estimation network.
Our approach outperforms 11 other state-of-the-art methods on two popular datasets.
arXiv Detail & Related papers (2020-07-16T04:58:09Z)
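As referenced in the multi-camera BEV survey entry above, inverse perspective mapping (IPM) is the simplest way to obtain BEV coordinates from a single camera image. The sketch below is a hedged illustration under the flat-ground assumption, not code from any of the cited papers: it builds the ground-plane-to-image homography from assumed intrinsics and extrinsics and inverts it to map pixels to BEV coordinates; all names and numbers are made up for the example.

```python
import numpy as np

def ipm_homography(K, R_cam2ground, cam_center):
    """Homography mapping ground-plane coordinates (X, Y, 1) to image pixels,
    assuming every observed point lies on the flat plane z = 0."""
    R_ground2cam = R_cam2ground.T
    # Columns: the two ground-plane basis vectors and the (negated) camera
    # centre, so that world-to-camera is R^T (X - c) restricted to z = 0.
    plane_basis = np.column_stack([np.eye(3)[:, 0],
                                   np.eye(3)[:, 1],
                                   -cam_center])
    return K @ R_ground2cam @ plane_basis

def pixel_to_bev(u, v, H):
    """Map an image pixel to ground-plane (X, Y) by inverting the homography."""
    g = np.linalg.inv(H) @ np.array([u, v, 1.0])
    return g[:2] / g[2]

# Same toy roadside camera as in the earlier sketch: 5 m high, horizontal axis.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
R_cam2ground = np.array([[0.0, 0.0, 1.0],
                         [-1.0, 0.0, 0.0],
                         [0.0, -1.0, 0.0]])
cam_center = np.array([0.0, 0.0, 5.0])

H = ipm_homography(K, R_cam2ground, cam_center)
# The pixel from the previous example maps to ~25 m ahead when its true
# height above the ground (0.5 m there) is ignored -- the flat-ground error
# that height- and depth-aware lifting methods try to avoid.
print(pixel_to_bev(960.0, 740.0, H))
```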
This list is automatically generated from the titles and abstracts of the papers on this site.