MonoGAE: Roadside Monocular 3D Object Detection with Ground-Aware
Embeddings
- URL: http://arxiv.org/abs/2310.00400v1
- Date: Sat, 30 Sep 2023 14:52:26 GMT
- Title: MonoGAE: Roadside Monocular 3D Object Detection with Ground-Aware
Embeddings
- Authors: Lei Yang, Jiaxin Yu, Xinyu Zhang, Jun Li, Li Wang, Yi Huang, Chuang
Zhang, Hong Wang, Yiming Li
- Abstract summary: We introduce a novel framework for Roadside Monocular 3D object detection with ground-aware embeddings, named MonoGAE.
Our approach demonstrates a substantial performance advantage over all previous monocular 3D object detectors on widely recognized 3D detection benchmarks for roadside cameras.
- Score: 29.050983641961658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although the majority of recent autonomous driving systems concentrate on
developing perception methods based on ego-vehicle sensors, there is an
overlooked alternative approach that involves leveraging intelligent roadside
cameras to help extend the ego-vehicle perception ability beyond the visual
range. We discover that most existing monocular 3D object detectors rely on the
ego-vehicle prior assumption that the optical axis of the camera is parallel to
the ground. However, the roadside camera is installed on a pole with a pitched
angle, which makes the existing methods not optimal for roadside scenes. In
this paper, we introduce a novel framework for Roadside Monocular 3D object
detection with ground-aware embeddings, named MonoGAE. Specifically, the ground
plane provides stable and strong prior knowledge, owing to the fixed
installation of cameras in roadside scenarios. To reduce the domain gap between the
ground geometry information and high-dimensional image features, we employ a
supervised training paradigm with a ground plane to predict high-dimensional
ground-aware embeddings. These embeddings are subsequently integrated with
image features through cross-attention mechanisms. Furthermore, to improve the
detector's robustness to variations in camera installation poses, we
replace the ground plane depth map with a novel pixel-level refined ground
plane equation map. Our approach demonstrates a substantial performance
advantage over all previous monocular 3D object detectors on widely recognized
3D detection benchmarks for roadside cameras. The code and pre-trained models
will be released soon.
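The abstract names two concrete components: ground-aware embeddings supervised by a pixel-level ground-plane equation map, and cross-attention fusion with image features. The sketch below is a minimal PyTorch illustration of that pipeline; the module name, layer choices, and dimensions are my assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class GroundAwareFusion(nn.Module):
    """Minimal sketch of MonoGAE-style ground-aware fusion. Layer design
    and dimensions are illustrative assumptions, not the paper's exact
    architecture."""

    def __init__(self, feat_dim=256, num_heads=8):
        super().__init__()
        # Head predicting high-dimensional ground-aware embeddings,
        # trained with supervision from the ground plane (see plane_head).
        self.ground_encoder = nn.Sequential(
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, 1),
        )
        # Auxiliary head: a pixel-level ground-plane equation (a, b, c, d)
        # per pixel, the supervision signal described in the abstract.
        self.plane_head = nn.Conv2d(feat_dim, 4, 1)
        # Cross-attention: image features attend to ground-aware embeddings.
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)

    def forward(self, img_feat):
        # img_feat: (B, C, H, W) backbone features
        b, c, h, w = img_feat.shape
        ground_emb = self.ground_encoder(img_feat)      # (B, C, H, W)
        plane_map = self.plane_head(ground_emb)         # (B, 4, H, W), supervised
        q = img_feat.flatten(2).transpose(1, 2)         # (B, HW, C) queries
        kv = ground_emb.flatten(2).transpose(1, 2)      # (B, HW, C) keys/values
        fused, _ = self.cross_attn(q, kv, kv)
        fused = fused.transpose(1, 2).reshape(b, c, h, w)
        return fused, plane_map

# Usage: plane_map would be regressed against the pixel-level refined
# ground-plane equation map; fused features feed the 3D detection head.
feat = torch.randn(2, 256, 32, 88)
fused, plane_map = GroundAwareFusion()(feat)
```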
Related papers
- MOSE: Boosting Vision-based Roadside 3D Object Detection with Scene Cues [12.508548561872553]
We propose a novel framework, namely MOSE, for MOnocular 3D object detection with Scene cuEs.
A scene cue bank is designed to aggregate scene cues from multiple frames of the same scene.
A transformer-based decoder lifts the aggregated scene cues as well as the 3D position embeddings for 3D object location.
arXiv Detail & Related papers (2024-04-08T08:11:56Z)
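A toy sketch of what a scene cue bank like MOSE's could look like: per-scene cues aggregated across frames of the same fixed camera and queried at inference. The class name, EMA aggregation rule, and feature shapes are illustrative assumptions.

```python
import torch

class SceneCueBank:
    """Toy sketch of a scene cue bank: features from frames of the same
    (fixed-camera) scene are aggregated under one scene id. The EMA rule
    is an assumption for illustration."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.bank = {}  # scene_id -> aggregated cue tensor

    def update(self, scene_id, cue):
        # cue: (N, C) per-frame scene cues, e.g. features at object locations
        summary = cue.mean(dim=0)
        if scene_id in self.bank:
            self.bank[scene_id] = (
                self.momentum * self.bank[scene_id] + (1 - self.momentum) * summary
            )
        else:
            self.bank[scene_id] = summary

    def query(self, scene_id):
        # Retrieved cues would feed the transformer-based decoder.
        return self.bank[scene_id]

bank = SceneCueBank()
bank.update("cam_042", torch.randn(10, 256))
print(bank.query("cam_042").shape)  # torch.Size([256])
```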
- LATR: 3D Lane Detection from Monocular Images with Transformer [42.34193673590758]
3D lane detection from monocular images is a fundamental yet challenging task in autonomous driving.
Recent advances rely on structural 3D surrogates built from front-view image features and camera parameters.
We present a novel LATR model, an end-to-end 3D lane detector that uses 3D-aware front-view features without transformed view representation.
arXiv Detail & Related papers (2023-08-08T21:08:42Z)
- CAPE: Camera View Position Embedding for Multi-View 3D Object Detection [100.02565745233247]
Current query-based methods rely on global 3D position embeddings to learn the geometric correspondence between images and 3D space.
We propose a novel method based on CAmera view Position Embedding, called CAPE.
CAPE achieves state-of-the-art performance (61.0% NDS and 52.5% mAP) among all LiDAR-free methods on nuScenes dataset.
arXiv Detail & Related papers (2023-03-17T18:59:54Z)
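One way to read CAPE's camera-view position embedding: form per-pixel viewing rays in each camera's local frame from the intrinsics alone, then lift them with a small MLP, avoiding a shared global-frame embedding. The sketch below is that reading, not CAPE's actual implementation; names and dimensions are assumed.

```python
import torch
import torch.nn as nn

def camera_view_pos_embedding(K, h, w, mlp):
    """Camera-local position embedding in the spirit of CAPE: per-pixel
    viewing rays in the camera's own frame (no global extrinsics), lifted
    to an embedding by an MLP."""
    v, u = torch.meshgrid(
        torch.arange(h, dtype=torch.float32),
        torch.arange(w, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1)  # (H, W, 3) homogeneous pixels
    rays = pix @ torch.inverse(K).T                        # rays in the camera frame
    rays = rays / rays.norm(dim=-1, keepdim=True)          # unit directions
    return mlp(rays)                                       # (H, W, C) embedding

mlp = nn.Sequential(nn.Linear(3, 256), nn.ReLU(), nn.Linear(256, 256))
K = torch.tensor([[1000.0, 0.0, 480.0], [0.0, 1000.0, 270.0], [0.0, 0.0, 1.0]])
pe = camera_view_pos_embedding(K, 32, 88, mlp)
print(pe.shape)  # torch.Size([32, 88, 256])
```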
- BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection [27.921256216924384]
Vision-centric bird's eye view detection methods have inferior performances on roadside cameras.
We propose a simple yet effective approach, dubbed BEVHeight, to address this issue.
Our method surpasses all previous vision-centric methods by a significant margin.
arXiv Detail & Related papers (2023-03-15T10:18:53Z)
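The summary above does not spell out BEVHeight's mechanism; as I read the paper, its core move is to regress each object's height above the ground rather than its depth, since height stays stable as roadside camera poses vary. A minimal sketch of that lifting step, with the function name and toy calibration being my own:

```python
import numpy as np

def lift_pixel_with_height(u, v, height, K, R, t):
    """Intersect the pixel's viewing ray with the horizontal plane
    z = height in ground coordinates. K: 3x3 intrinsics; R, t:
    camera-to-ground rotation and translation (t is the camera center
    expressed in the ground frame)."""
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray in camera frame
    ray_g = R @ ray_cam                                  # ray in ground frame
    s = (height - t[2]) / ray_g[2]                       # ray parameter at z = height
    return t + s * ray_g                                 # 3D point on the object

# Toy camera mounted 5 m above the ground, looking along ground-frame y:
K = np.array([[1000.0, 0.0, 480.0], [0.0, 1000.0, 270.0], [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, -1.0, 0.0]])
t = np.array([0.0, 0.0, 5.0])
print(lift_pixel_with_height(480, 400, 1.5, K, R, t))  # ~[0, 26.9, 1.5]
```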
- Satellite Image Based Cross-view Localization for Autonomous Vehicle [59.72040418584396]
This paper shows that by using an off-the-shelf high-definition satellite image as a ready-to-use map, we are able to achieve cross-view vehicle localization with satisfactory accuracy.
Our method is validated on KITTI and Ford Multi-AV Seasonal datasets as ground view and Google Maps as the satellite view.
arXiv Detail & Related papers (2022-07-27T13:16:39Z)
- PolarFormer: Multi-camera 3D Object Detection with Polar Transformers [93.49713023975727]
3D object detection in autonomous driving aims to reason about "what" and "where" the objects of interest are in a 3D world.
Existing methods often adopt the canonical Cartesian coordinate system with perpendicular axes.
We propose a new Polar Transformer (PolarFormer) for more accurate 3D object detection in the bird's-eye-view (BEV) taking as input only multi-camera 2D images.
arXiv Detail & Related papers (2022-06-30T16:32:48Z)
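A minimal sketch of the polar BEV rasterization that PolarFormer's title implies: features scattered into (range, azimuth) bins rather than a Cartesian (x, y) grid. Bin counts, range, and the scatter rule are illustrative assumptions.

```python
import math
import torch

def cartesian_to_polar_bev(points, feats, num_r=64, num_theta=128, r_max=60.0):
    # points: (N, 2) ego-frame x/y positions; feats: (N, C) features.
    r = points.norm(dim=1)                           # range from ego
    theta = torch.atan2(points[:, 1], points[:, 0])  # azimuth in [-pi, pi]
    r_idx = (r / r_max * num_r).long().clamp(0, num_r - 1)
    t_idx = ((theta + math.pi) / (2 * math.pi) * num_theta).long().clamp(0, num_theta - 1)
    grid = torch.zeros(num_r, num_theta, feats.shape[1])
    grid.index_put_((r_idx, t_idx), feats, accumulate=True)  # sum features per bin
    return grid

grid = cartesian_to_polar_bev(torch.randn(1000, 2) * 20, torch.randn(1000, 64))
print(grid.shape)  # torch.Size([64, 128, 64])
```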
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves the state-of-the-art performance on the challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task [48.555440807415664]
We present the first high-diversity challenging Roadside Perception 3D dataset- Rope3D from a novel view.
The dataset consists of 50k images and over 1.5M 3D objects in various scenes.
We propose to leverage the geometry constraint to resolve the inherent ambiguities caused by various sensors and viewpoints.
arXiv Detail & Related papers (2022-03-25T12:13:23Z)
- Monocular 3D Vehicle Detection Using Uncalibrated Traffic Cameras through Homography [12.062095895630563]
This paper proposes a method to extract the position and pose of vehicles in the 3D world from a single traffic camera.
We observe that the homography between the road plane and the image plane is essential to 3D vehicle detection.
We propose a new regression target called tailed r-box and a dual-view network architecture, which boosts detection accuracy on warped BEV images.
arXiv Detail & Related papers (2021-03-29T02:57:37Z)
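A hedged sketch of the road-plane homography step this paper builds on, using OpenCV: warp a traffic-camera frame into a bird's-eye view via a plane-to-plane homography H. The H values and image sizes here are placeholders, not calibration from the paper.

```python
import cv2
import numpy as np

# H maps image-plane pixels to road-plane (BEV) coordinates. These values
# are made-up placeholders; in the paper's setting H is recovered for an
# uncalibrated traffic camera (e.g., from road landmarks).
H = np.array([[0.8, -0.4, 120.0],
              [0.05, 0.2, 300.0],
              [0.0, -0.001, 1.0]])

frame = np.zeros((540, 960, 3), dtype=np.uint8)  # stand-in for a camera frame
bev = cv2.warpPerspective(frame, H, (640, 640))  # warp to a bird's-eye view
# Detection (e.g., the paper's tailed r-box regression) then runs on `bev`.
```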
- Road Curb Detection and Localization with Monocular Forward-view Vehicle Camera [74.45649274085447]
We propose a robust method for estimating road curb 3D parameters using a calibrated monocular camera equipped with a fisheye lens.
Our approach is able to estimate the vehicle to curb distance in real time with mean accuracy of more than 90%.
arXiv Detail & Related papers (2020-02-28T00:24:18Z)