MonoGAE: Roadside Monocular 3D Object Detection with Ground-Aware
Embeddings
- URL: http://arxiv.org/abs/2310.00400v1
- Date: Sat, 30 Sep 2023 14:52:26 GMT
- Title: MonoGAE: Roadside Monocular 3D Object Detection with Ground-Aware
Embeddings
- Authors: Lei Yang, Jiaxin Yu, Xinyu Zhang, Jun Li, Li Wang, Yi Huang, Chuang
Zhang, Hong Wang, Yiming Li
- Abstract summary: We introduce a novel framework for Roadside Monocular 3D object detection with ground-aware embeddings, named MonoGAE.
Our approach demonstrates a substantial performance advantage over all previous monocular 3D object detectors on widely recognized 3D detection benchmarks for roadside cameras.
- Score: 29.050983641961658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although the majority of recent autonomous driving systems concentrate on
developing perception methods based on ego-vehicle sensors, there is an
overlooked alternative approach that involves leveraging intelligent roadside
cameras to help extend the ego-vehicle perception ability beyond the visual
range. We discover that most existing monocular 3D object detectors rely on the
ego-vehicle prior assumption that the optical axis of the camera is parallel to
the ground. However, the roadside camera is installed on a pole with a pitched
angle, which makes the existing methods not optimal for roadside scenes. In
this paper, we introduce a novel framework for Roadside Monocular 3D object
detection with ground-aware embeddings, named MonoGAE. Specifically, the ground
plane provides stable and strong prior knowledge, owing to the fixed
installation of cameras in roadside scenarios. To reduce the domain gap between the
ground geometry information and high-dimensional image features, we employ a
supervised training paradigm with a ground plane to predict high-dimensional
ground-aware embeddings. These embeddings are subsequently integrated with
image features through cross-attention mechanisms. Furthermore, to improve the
detector's robustness to variations in camera installation poses, we
replace the ground plane depth map with a novel pixel-level refined ground
plane equation map. Our approach demonstrates a substantial performance
advantage over all previous monocular 3D object detectors on widely recognized
3D detection benchmarks for roadside cameras. The code and pre-trained models
will be released soon.
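The abstract names two concrete components: ground-aware embeddings supervised by a pixel-level ground-plane equation map, and cross-attention fusion with image features. The sketch below is a minimal PyTorch illustration of that pipeline; the module name, layer choices, and dimensions are my assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class GroundAwareFusion(nn.Module):
    """Minimal sketch of MonoGAE-style ground-aware fusion. Layer design
    and dimensions are illustrative assumptions, not the paper's exact
    architecture."""

    def __init__(self, feat_dim=256, num_heads=8):
        super().__init__()
        # Head predicting high-dimensional ground-aware embeddings,
        # trained with supervision from the ground plane (see plane_head).
        self.ground_encoder = nn.Sequential(
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, 1),
        )
        # Auxiliary head: a pixel-level ground-plane equation (a, b, c, d)
        # per pixel, the supervision signal described in the abstract.
        self.plane_head = nn.Conv2d(feat_dim, 4, 1)
        # Cross-attention: image features attend to ground-aware embeddings.
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)

    def forward(self, img_feat):
        # img_feat: (B, C, H, W) backbone features
        b, c, h, w = img_feat.shape
        ground_emb = self.ground_encoder(img_feat)      # (B, C, H, W)
        plane_map = self.plane_head(ground_emb)         # (B, 4, H, W), supervised
        q = img_feat.flatten(2).transpose(1, 2)         # (B, HW, C) queries
        kv = ground_emb.flatten(2).transpose(1, 2)      # (B, HW, C) keys/values
        fused, _ = self.cross_attn(q, kv, kv)
        fused = fused.transpose(1, 2).reshape(b, c, h, w)
        return fused, plane_map

# Usage: plane_map would be regressed against the pixel-level refined
# ground-plane equation map; fused features feed the 3D detection head.
feat = torch.randn(2, 256, 32, 88)
fused, plane_map = GroundAwareFusion()(feat)
```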
Related papers
- MOSE: Boosting Vision-based Roadside 3D Object Detection with Scene Cues [12.508548561872553]
We propose a novel framework, namely MOSE, for MOnocular 3D object detection with Scene cuEs.
A scene cue bank is designed to aggregate scene cues from multiple frames of the same scene.
A transformer-based decoder lifts the aggregated scene cues as well as the 3D position embeddings for 3D object location.
arXiv Detail & Related papers (2024-04-08T08:11:56Z)
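A toy sketch of what a scene cue bank like MOSE's could look like: per-scene cues aggregated across frames of the same fixed camera and queried at inference. The class name, EMA aggregation rule, and feature shapes are illustrative assumptions.

```python
import torch

class SceneCueBank:
    """Toy sketch of a scene cue bank: features from frames of the same
    (fixed-camera) scene are aggregated under one scene id. The EMA rule
    is an assumption for illustration."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.bank = {}  # scene_id -> aggregated cue tensor

    def update(self, scene_id, cue):
        # cue: (N, C) per-frame scene cues, e.g. features at object locations
        summary = cue.mean(dim=0)
        if scene_id in self.bank:
            self.bank[scene_id] = (
                self.momentum * self.bank[scene_id] + (1 - self.momentum) * summary
            )
        else:
            self.bank[scene_id] = summary

    def query(self, scene_id):
        # Retrieved cues would feed the transformer-based decoder.
        return self.bank[scene_id]

bank = SceneCueBank()
bank.update("cam_042", torch.randn(10, 256))
print(bank.query("cam_042").shape)  # torch.Size([256])
```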
- LATR: 3D Lane Detection from Monocular Images with Transformer [42.34193673590758]
3D lane detection from monocular images is a fundamental yet challenging task in autonomous driving.
Recent advances rely on structural 3D surrogates built from front-view image features and camera parameters.
We present a novel LATR model, an end-to-end 3D lane detector that uses 3D-aware front-view features without transformed view representation.
arXiv Detail & Related papers (2023-08-08T21:08:42Z)
- CAPE: Camera View Position Embedding for Multi-View 3D Object Detection [100.02565745233247]
Current query-based methods rely on global 3D position embeddings to learn the geometric correspondence between images and 3D space.
We propose a novel method based on CAmera view Position Embedding, called CAPE.
CAPE achieves state-of-the-art performance (61.0% NDS and 52.5% mAP) among all LiDAR-free methods on nuScenes dataset.
arXiv Detail & Related papers (2023-03-17T18:59:54Z)
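One way to read CAPE's camera-view position embedding: form per-pixel viewing rays in each camera's local frame from the intrinsics alone, then lift them with a small MLP, avoiding a shared global-frame embedding. The sketch below is that reading, not CAPE's actual implementation; names and dimensions are assumed.

```python
import torch
import torch.nn as nn

def camera_view_pos_embedding(K, h, w, mlp):
    """Camera-local position embedding in the spirit of CAPE: per-pixel
    viewing rays in the camera's own frame (no global extrinsics), lifted
    to an embedding by an MLP."""
    v, u = torch.meshgrid(
        torch.arange(h, dtype=torch.float32),
        torch.arange(w, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1)  # (H, W, 3) homogeneous pixels
    rays = pix @ torch.inverse(K).T                        # rays in the camera frame
    rays = rays / rays.norm(dim=-1, keepdim=True)          # unit directions
    return mlp(rays)                                       # (H, W, C) embedding

mlp = nn.Sequential(nn.Linear(3, 256), nn.ReLU(), nn.Linear(256, 256))
K = torch.tensor([[1000.0, 0.0, 480.0], [0.0, 1000.0, 270.0], [0.0, 0.0, 1.0]])
pe = camera_view_pos_embedding(K, 32, 88, mlp)
print(pe.shape)  # torch.Size([32, 88, 256])
```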
- BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection [27.921256216924384]
Vision-centric bird's eye view detection methods have inferior performances on roadside cameras.
We propose a simple yet effective approach, dubbed BEVHeight, to address this issue.
Our method surpasses all previous vision-centric methods by a significant margin.
arXiv Detail & Related papers (2023-03-15T10:18:53Z)
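The summary above does not spell out BEVHeight's mechanism; as I read the paper, its core move is to regress each object's height above the ground rather than its depth, since height stays stable as roadside camera poses vary. A minimal sketch of that lifting step, with the function name and toy calibration being my own:

```python
import numpy as np

def lift_pixel_with_height(u, v, height, K, R, t):
    """Intersect the pixel's viewing ray with the horizontal plane
    z = height in ground coordinates. K: 3x3 intrinsics; R, t:
    camera-to-ground rotation and translation (t is the camera center
    expressed in the ground frame)."""
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray in camera frame
    ray_g = R @ ray_cam                                  # ray in ground frame
    s = (height - t[2]) / ray_g[2]                       # ray parameter at z = height
    return t + s * ray_g                                 # 3D point on the object

# Toy camera mounted 5 m above the ground, looking along ground-frame y:
K = np.array([[1000.0, 0.0, 480.0], [0.0, 1000.0, 270.0], [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, -1.0, 0.0]])
t = np.array([0.0, 0.0, 5.0])
print(lift_pixel_with_height(480, 400, 1.5, K, R, t))  # ~[0, 26.9, 1.5]
```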
- Satellite Image Based Cross-view Localization for Autonomous Vehicle [59.72040418584396]
This paper shows that by using an off-the-shelf high-definition satellite image as a ready-to-use map, we are able to achieve cross-view vehicle localization with satisfactory accuracy.
Our method is validated on KITTI and Ford Multi-AV Seasonal datasets as ground view and Google Maps as the satellite view.
arXiv Detail & Related papers (2022-07-27T13:16:39Z)
- PolarFormer: Multi-camera 3D Object Detection with Polar Transformers [93.49713023975727]
3D object detection in autonomous driving aims to reason about "what" and "where" the objects of interest are in a 3D world.
Existing methods often adopt the canonical Cartesian coordinate system with perpendicular axes.
We propose a new Polar Transformer (PolarFormer) for more accurate 3D object detection in the bird's-eye-view (BEV) taking as input only multi-camera 2D images.
arXiv Detail & Related papers (2022-06-30T16:32:48Z)
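A minimal sketch of the polar BEV rasterization that PolarFormer's title implies: features scattered into (range, azimuth) bins rather than a Cartesian (x, y) grid. Bin counts, range, and the scatter rule are illustrative assumptions.

```python
import math
import torch

def cartesian_to_polar_bev(points, feats, num_r=64, num_theta=128, r_max=60.0):
    # points: (N, 2) ego-frame x/y positions; feats: (N, C) features.
    r = points.norm(dim=1)                           # range from ego
    theta = torch.atan2(points[:, 1], points[:, 0])  # azimuth in [-pi, pi]
    r_idx = (r / r_max * num_r).long().clamp(0, num_r - 1)
    t_idx = ((theta + math.pi) / (2 * math.pi) * num_theta).long().clamp(0, num_theta - 1)
    grid = torch.zeros(num_r, num_theta, feats.shape[1])
    grid.index_put_((r_idx, t_idx), feats, accumulate=True)  # sum features per bin
    return grid

grid = cartesian_to_polar_bev(torch.randn(1000, 2) * 20, torch.randn(1000, 64))
print(grid.shape)  # torch.Size([64, 128, 64])
```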
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves the state-of-the-art performance on the challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task [48.555440807415664]
We present the first high-diversity challenging Roadside Perception 3D dataset- Rope3D from a novel view.
The dataset consists of 50k images and over 1.5M 3D objects in various scenes.
We propose to leverage the geometry constraint to resolve the inherent ambiguities caused by various sensors and viewpoints.
arXiv Detail & Related papers (2022-03-25T12:13:23Z)
- Monocular 3D Vehicle Detection Using Uncalibrated Traffic Cameras through Homography [12.062095895630563]
This paper proposes a method to extract the position and pose of vehicles in the 3D world from a single traffic camera.
We observe that the homography between the road plane and the image plane is essential to 3D vehicle detection.
We propose a new regression target called tailed r-box and a dual-view network architecture, which boosts detection accuracy on warped BEV images.
arXiv Detail & Related papers (2021-03-29T02:57:37Z)
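A hedged sketch of the road-plane homography step this paper builds on, using OpenCV: warp a traffic-camera frame into a bird's-eye view via a plane-to-plane homography H. The H values and image sizes here are placeholders, not calibration from the paper.

```python
import cv2
import numpy as np

# H maps image-plane pixels to road-plane (BEV) coordinates. These values
# are made-up placeholders; in the paper's setting H is recovered for an
# uncalibrated traffic camera (e.g., from road landmarks).
H = np.array([[0.8, -0.4, 120.0],
              [0.05, 0.2, 300.0],
              [0.0, -0.001, 1.0]])

frame = np.zeros((540, 960, 3), dtype=np.uint8)  # stand-in for a camera frame
bev = cv2.warpPerspective(frame, H, (640, 640))  # warp to a bird's-eye view
# Detection (e.g., the paper's tailed r-box regression) then runs on `bev`.
```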
- Road Curb Detection and Localization with Monocular Forward-view Vehicle Camera [74.45649274085447]
We propose a robust method for estimating road curb 3D parameters using a calibrated monocular camera equipped with a fisheye lens.
Our approach is able to estimate the vehicle to curb distance in real time with mean accuracy of more than 90%.
arXiv Detail & Related papers (2020-02-28T00:24:18Z)