BEVHeight: A Robust Framework for Vision-based Roadside 3D Object
Detection
- URL: http://arxiv.org/abs/2303.08498v2
- Date: Tue, 11 Apr 2023 07:19:13 GMT
- Title: BEVHeight: A Robust Framework for Vision-based Roadside 3D Object
Detection
- Authors: Lei Yang, Kaicheng Yu, Tao Tang, Jun Li, Kun Yuan, Li Wang, Xinyu
Zhang, Peng Chen
- Abstract summary: Vision-centric bird's eye view detection methods have inferior performance on roadside cameras.
We propose a simple yet effective approach, dubbed BEVHeight, to address this issue.
Our method surpasses all previous vision-centric methods by a significant margin.
- Score: 27.921256216924384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While most recent autonomous driving systems focus on developing
perception methods for ego-vehicle sensors, people tend to overlook the
alternative of leveraging intelligent roadside cameras to extend perception
beyond the visual range. We discover that state-of-the-art vision-centric
bird's eye view detection methods have inferior performance on roadside
cameras. This is because these methods mainly focus on recovering the depth
with respect to the camera center, where the depth difference between a car
and the ground shrinks quickly as the distance increases. In this paper, we
propose a simple yet effective approach, dubbed BEVHeight, to address this
issue. In essence, instead of predicting the pixel-wise depth, we regress the
height to the ground to achieve a distance-agnostic formulation that eases
the optimization of camera-only perception methods. On popular 3D detection
benchmarks of roadside cameras, our method surpasses all previous vision-centric methods
by a significant margin. The code is available at
https://github.com/ADLab-AutoDrive/BEVHeight.
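To see why this formulation helps, consider the geometry: for a roadside camera mounted several meters above the road, the depth gap between a car and the ground beneath it vanishes with distance, while the height gap stays fixed at the car's physical height. A minimal numeric sketch (assumed example values for the camera and car heights, not the authors' code) makes this concrete:

```python
import numpy as np

CAM_HEIGHT = 6.0   # assumed roadside camera mounting height (m)
CAR_HEIGHT = 1.5   # assumed car roof height above ground (m)

for dist in [20.0, 50.0, 100.0, 200.0]:
    # Depth (distance from the camera center) of the ground point below
    # the car versus the car's roof at the same horizontal position.
    d_ground = np.hypot(dist, CAM_HEIGHT)
    d_roof = np.hypot(dist, CAM_HEIGHT - CAR_HEIGHT)
    # The depth gap a depth-based network must resolve shrinks with
    # distance, while the height gap stays constant at CAR_HEIGHT.
    print(f"{dist:5.0f} m: depth gap {d_ground - d_roof:.3f} m, "
          f"height gap {CAR_HEIGHT:.1f} m")
```

At 20 m the depth gap is already below 0.4 m and it keeps shrinking with range, whereas the height target stays at 1.5 m regardless of distance, which is what makes the regression distance-agnostic.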
Related papers
- SDGE: Stereo Guided Depth Estimation for 360$^\circ$ Camera Sets [65.64958606221069]
Multi-camera systems are often used in autonomous driving to achieve 360$^\circ$ perception.
These 360$^\circ$ camera sets often have limited or low-quality overlap regions, making multi-view stereo methods infeasible for the entire image.
We propose the Stereo Guided Depth Estimation (SGDE) method, which enhances depth estimation of the full image by explicitly utilizing multi-view stereo results on the overlap.
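One common way to realize this kind of guidance (an illustration of the general idea under assumed inputs, not necessarily SGDE's exact algorithm) is to use the reliable stereo depth on the overlap to fix the scale of the full-image monocular prediction:

```python
import numpy as np

def scale_align(mono_depth: np.ndarray, stereo_depth: np.ndarray,
                overlap_mask: np.ndarray) -> np.ndarray:
    """Rescale a monocular depth map so its median agrees with the stereo
    depth on the overlap region, a standard way to inject metric scale.
    All inputs are (H, W) arrays; overlap_mask is boolean."""
    scale = (np.median(stereo_depth[overlap_mask])
             / np.median(mono_depth[overlap_mask]))
    return mono_depth * scale
```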
arXiv Detail & Related papers (2024-02-19T02:41:37Z)
- SGV3D: Towards Scenario Generalization for Vision-based Roadside 3D Object Detection [27.991404725024953]
Current vision-based roadside detection methods achieve high accuracy on labeled scenes but perform poorly on new scenes.
This is because roadside cameras remain stationary after installation, so the algorithm overfits to these roadside backgrounds and camera poses.
We propose an innovative Scenario Generalization Framework for Vision-based Roadside 3D Object Detection, dubbed SGV3D.
arXiv Detail & Related papers (2024-01-29T12:31:13Z)
- SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction [77.15924044466976]
We propose SelfOcc to explore a self-supervised way to learn 3D occupancy using only video sequences.
We first transform the images into 3D space (e.g., bird's eye view) to obtain a 3D representation of the scene.
We can then render 2D images of previous and future frames as self-supervision signals to learn the 3D representations.
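The self-supervision signal in this family of methods is typically a photometric reprojection loss between a view synthesized from the predicted geometry and the actual frame. A generic sketch of such a loss (an assumed form of the technique, not SelfOcc's exact pipeline):

```python
import torch
import torch.nn.functional as F

def reprojection_loss(src_img, tgt_img, depth, K, T_src_from_tgt):
    """Warp the source frame into the target view using predicted depth
    and relative pose, then take an L1 photometric loss. Shapes: images
    (1, 3, H, W), depth (1, 1, H, W), K (3, 3), T_src_from_tgt (4, 4)."""
    _, _, H, W = tgt_img.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32),
                            indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).reshape(3, -1)
    # Back-project target pixels to 3D camera points with predicted depth.
    pts = torch.linalg.inv(K) @ pix * depth.reshape(1, -1)   # (3, H*W)
    pts = torch.cat([pts, torch.ones(1, H * W)])             # (4, H*W)
    # Transform into the source camera and project back to pixels.
    proj = K @ (T_src_from_tgt @ pts)[:3]
    uv = proj[:2] / proj[2].clamp(min=1e-6)
    # Normalize to [-1, 1] and sample the source image at those locations.
    grid = torch.stack([uv[0] / (W - 1) * 2 - 1,
                        uv[1] / (H - 1) * 2 - 1], dim=-1).view(1, H, W, 2)
    warped = F.grid_sample(src_img, grid, align_corners=True)
    return (warped - tgt_img).abs().mean()
```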
arXiv Detail & Related papers (2023-11-21T17:59:14Z)
- MonoGAE: Roadside Monocular 3D Object Detection with Ground-Aware Embeddings [29.050983641961658]
We introduce MonoGAE, a novel framework for roadside monocular 3D object detection with ground-aware embeddings.
Our approach demonstrates a substantial performance advantage over all previous monocular 3D object detectors on widely recognized 3D detection benchmarks for roadside cameras.
arXiv Detail & Related papers (2023-09-30T14:52:26Z)
- BEVHeight++: Toward Robust Visual Centric 3D Object Detection [32.08994153441449]
Vision-centric bird's eye view detection methods have inferior performance on roadside cameras.
We propose a simple yet effective approach, dubbed BEVHeight++, to address this issue.
By incorporating both height and depth encoding techniques, we achieve a more accurate and robust projection from 2D to BEV spaces.
arXiv Detail & Related papers (2023-09-28T05:38:32Z)
- Multi-camera Bird's Eye View Perception for Autonomous Driving [17.834495597639805]
It is essential to produce perception outputs in 3D to enable spatial reasoning about other agents and structures.
The most basic approach to achieving the desired BEV representation from a camera image is inverse perspective mapping (IPM), which assumes a flat ground surface.
More recent approaches use deep neural networks to output directly in BEV space.
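Under the flat-ground assumption, IPM reduces to a single plane-to-image homography H = K [r1 r2 t], whose inverse warps pixels onto the ground plane. A minimal sketch with assumed example intrinsics and mounting geometry:

```python
import numpy as np

# Assumed example camera: 1080p intrinsics, mounted 6 m high, pitched 15
# degrees down over a flat ground plane z = 0 (world: X forward, Y left,
# Z up; camera: x right, y down, z forward).
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
pitch, h = np.deg2rad(15.0), 6.0
s, c = np.sin(pitch), np.cos(pitch)
# World-to-camera rotation (rows are the camera axes in world coordinates).
R = np.array([[0.0, -1.0, 0.0],
              [-s, 0.0, -c],
              [c, 0.0, -s]])
t = -R @ np.array([0.0, 0.0, h])  # camera center sits at (0, 0, h)

# Homography mapping ground points (X, Y, 1) on z = 0 to image pixels.
H = K @ np.column_stack([R[:, 0], R[:, 1], t])

def ground_to_pixel(X, Y):
    """Project a ground point (meters) to pixel coordinates (u, v)."""
    p = H @ np.array([X, Y, 1.0])
    return p[:2] / p[2]

print(ground_to_pixel(30.0, 0.0))  # a ground point 30 m straight ahead
```

Warping the image with the inverse of H yields the BEV view; anything with real height violates the flat-ground assumption and gets smeared, which is the classic IPM failure mode that learned BEV methods aim to avoid.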
arXiv Detail & Related papers (2023-09-16T19:12:05Z)
- Surround-View Vision-based 3D Detection for Autonomous Driving: A Survey [0.6091702876917281]
We provide a literature survey of existing vision-based 3D detection methods, focusing on autonomous driving.
We highlight how literature and industry trends have moved towards surround-view image-based methods and note which special cases these methods address.
arXiv Detail & Related papers (2023-02-13T19:30:17Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets.
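Cross-view fusion of this kind is often implemented as attention in which every camera's feature tokens attend to the tokens of all views. A generic sketch of the idea (not SurroundDepth's actual architecture):

```python
import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    """Fuse per-camera feature maps by letting all views attend to each
    other; a generic illustration, not SurroundDepth's exact module."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, V, N, C) = batch, views, tokens per view, channels.
        B, V, N, C = feats.shape
        tokens = feats.reshape(B, V * N, C)           # pool tokens of all views
        fused, _ = self.attn(tokens, tokens, tokens)  # each token sees every view
        return self.norm(tokens + fused).reshape(B, V, N, C)

# Usage: 6 cameras, 24x24 feature maps flattened to 576 tokens of width 256.
x = torch.randn(2, 6, 576, 256)
print(CrossViewFusion()(x).shape)  # torch.Size([2, 6, 576, 256])
```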
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task [48.555440807415664]
We present Rope3D, the first high-diversity, challenging roadside perception 3D dataset, captured from a novel viewpoint.
The dataset consists of 50k images and over 1.5M 3D objects in various scenes.
We propose to leverage geometry constraints to resolve the inherent ambiguities caused by varying sensors and viewpoints.
arXiv Detail & Related papers (2022-03-25T12:13:23Z)
- Depth Sensing Beyond LiDAR Range [84.19507822574568]
We propose a novel three-camera system that utilizes small field-of-view cameras.
Our system, along with our novel algorithm for computing metric depth, does not require full pre-calibration.
It can output dense depth maps with practically acceptable accuracy for scenes and objects at long distances.
arXiv Detail & Related papers (2020-04-07T00:09:51Z)
- Road Curb Detection and Localization with Monocular Forward-view Vehicle Camera [74.45649274085447]
We propose a robust method for estimating road curb 3D parameters using a calibrated monocular camera equipped with a fisheye lens.
Our approach estimates the vehicle-to-curb distance in real time with a mean accuracy above 90%.
arXiv Detail & Related papers (2020-02-28T00:24:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.