2.5D Object Detection for Intelligent Roadside Infrastructure
- URL: http://arxiv.org/abs/2507.03564v2
- Date: Wed, 16 Jul 2025 12:36:59 GMT
- Title: 2.5D Object Detection for Intelligent Roadside Infrastructure
- Authors: Nikolai Polley, Yacin Boualili, Ferdinand Mütsch, Maximilian Zipfl, Tobias Fleck, J. Marius Zöllner
- Abstract summary: We introduce a 2.5D object detection framework for infrastructure roadside-mounted cameras. We predict the ground planes of vehicles as parallelograms in the image frame. Our results show high detection accuracy, strong cross-viewpoint generalization, and robustness to diverse lighting and weather conditions.
- Score: 37.07785188366053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: On-board sensors of autonomous vehicles can be obstructed, occluded, or limited by restricted fields of view, complicating downstream driving decisions. Intelligent roadside infrastructure perception systems, installed at elevated vantage points, can provide wide, unobstructed intersection coverage, supplying a complementary information stream to autonomous vehicles via vehicle-to-everything (V2X) communication. However, conventional 3D object-detection algorithms struggle to generalize under the domain shift introduced by top-down perspectives and steep camera angles. We introduce a 2.5D object detection framework, tailored specifically for infrastructure roadside-mounted cameras. Unlike conventional 2D or 3D object detection, we employ a prediction approach to detect ground planes of vehicles as parallelograms in the image frame. The parallelogram preserves the planar position, size, and orientation of objects while omitting their height, which is unnecessary for most downstream applications. For training, a mix of real-world and synthetically generated scenes is leveraged. We evaluate generalizability on a held-out camera viewpoint and in adverse-weather scenarios absent from the training set. Our results show high detection accuracy, strong cross-viewpoint generalization, and robustness to diverse lighting and weather conditions. Model weights and inference code are provided at: https://gitlab.kit.edu/kit/aifb/ATKS/public/digit4taf/2.5d-object-detection
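The parallelogram ground-plane representation described in the abstract can be sketched minimally. The class and field names below are illustrative assumptions for exposition, not the authors' released code; they only demonstrate that three image-space corners suffice, since the fourth follows from the parallelogram constraint:

```python
from dataclasses import dataclass
import math


@dataclass
class GroundParallelogram:
    """A vehicle's ground plane as a parallelogram in image coordinates.

    Only three corners are stored; the fourth is implied because
    opposite sides of a parallelogram are parallel and equal: d = a + c - b.
    """
    a: tuple[float, float]  # e.g. rear-left corner (hypothetical labeling)
    b: tuple[float, float]  # e.g. rear-right corner
    c: tuple[float, float]  # e.g. front-right corner

    @property
    def d(self) -> tuple[float, float]:
        # Fourth corner recovered from the parallelogram constraint.
        return (self.a[0] + self.c[0] - self.b[0],
                self.a[1] + self.c[1] - self.b[1])

    @property
    def center(self) -> tuple[float, float]:
        # The diagonals bisect each other, so the center is the
        # midpoint of the a-c diagonal.
        return ((self.a[0] + self.c[0]) / 2.0,
                (self.a[1] + self.c[1]) / 2.0)

    @property
    def orientation(self) -> float:
        # Heading angle (radians) of the a->b side in the image plane.
        return math.atan2(self.b[1] - self.a[1], self.b[0] - self.a[0])
```

This compactly preserves planar position, size, and orientation while omitting object height, matching the motivation stated in the abstract.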
Related papers
- Vision-based Lifting of 2D Object Detections for Automated Driving [8.321333802704446]
We propose a pipeline that lifts the results of existing vision-based 2D algorithms to 3D detections using only cameras. To the best of our knowledge, we are the first to use a 2D CNN to process the point cloud for each 2D detection, keeping the computational effort as low as possible.
arXiv Detail & Related papers (2025-06-13T14:40:12Z) - DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation [49.32104127246474]
DriveGEN is a training-free, controllable text-to-image diffusion generation method. It consistently preserves objects with precise 3D geometry across diverse out-of-distribution generations.
arXiv Detail & Related papers (2025-03-14T06:35:38Z) - HeightFormer: A Semantic Alignment Monocular 3D Object Detection Method from Roadside Perspective [11.841338298700421]
We propose a novel 3D object detection framework integrating Spatial Former and Voxel Pooling Former to enhance 2D-to-3D projection based on height estimation.
Experiments were conducted on the Rope3D and DAIR-V2X-I datasets, and the results demonstrate that the proposed algorithm outperforms existing methods in detecting both vehicles and cyclists.
arXiv Detail & Related papers (2024-10-10T09:37:33Z) - MonoGAE: Roadside Monocular 3D Object Detection with Ground-Aware Embeddings [29.050983641961658]
We introduce a novel framework for Roadside Monocular 3D object detection with ground-aware embeddings, named MonoGAE.
Our approach demonstrates a substantial performance advantage over all previous monocular 3D object detectors on widely recognized 3D detection benchmarks for roadside cameras.
arXiv Detail & Related papers (2023-09-30T14:52:26Z) - AdaptiveShape: Solving Shape Variability for 3D Object Detection with Geometry Aware Anchor Distributions [1.3807918535446089]
3D object detection with point clouds and images plays an important role in perception tasks such as autonomous driving.
Current methods show great performance on detection and pose estimation of standard-shaped vehicles but lag behind on more complex shapes.
This work introduces several new methods to improve and measure the performance for such classes.
arXiv Detail & Related papers (2023-02-28T12:31:31Z) - Aerial Monocular 3D Object Detection [67.20369963664314]
DVDET is proposed to achieve aerial monocular 3D object detection in both the 2D image space and the 3D physical space. To address the severe view deformation issue, we propose a novel trainable geo-deformable transformation module. To encourage more researchers to investigate this area, we will release the dataset and related code.
arXiv Detail & Related papers (2022-08-08T08:32:56Z) - Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z) - Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task [48.555440807415664]
We present the first high-diversity, challenging roadside perception 3D dataset, Rope3D, captured from a novel view.
The dataset consists of 50k images and over 1.5M 3D objects in various scenes.
We propose to leverage the geometry constraint to solve the inherent ambiguities caused by varying sensors and viewpoints.
arXiv Detail & Related papers (2022-03-25T12:13:23Z) - Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data [19.63193201107591]
7DoF prediction of vehicles at an intersection is an important task for assessing potential conflicts between road users.
We develop an approach that uses weak supervision to fine-tune 3D object detectors for traffic observation cameras.
Our method achieves vehicle 7DoF pose prediction accuracy on our dataset comparable to the top performing monocular 3D object detectors on autonomous vehicle datasets.
arXiv Detail & Related papers (2021-10-21T08:26:48Z) - Train in Germany, Test in The USA: Making 3D Object Detectors Generalize [59.455225176042404]
Deep learning has substantially improved 3D object detection accuracy for LiDAR and stereo camera data alike.
Most datasets for autonomous driving are collected within a narrow subset of cities in a single country.
In this paper we consider the task of adapting 3D object detectors from one dataset to another.
arXiv Detail & Related papers (2020-05-17T00:56:18Z) - Road Curb Detection and Localization with Monocular Forward-view Vehicle Camera [74.45649274085447]
We propose a robust method for estimating road curb 3D parameters using a calibrated monocular camera equipped with a fisheye lens.
Our approach is able to estimate the vehicle-to-curb distance in real time with a mean accuracy of more than 90%.
arXiv Detail & Related papers (2020-02-28T00:24:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.