Real-time Full-stack Traffic Scene Perception for Autonomous Driving
with Roadside Cameras
- URL: http://arxiv.org/abs/2206.09770v1
- Date: Mon, 20 Jun 2022 13:33:52 GMT
- Title: Real-time Full-stack Traffic Scene Perception for Autonomous Driving
with Roadside Cameras
- Authors: Zhengxia Zou, Rusheng Zhang, Shengyin Shen, Gaurav Pandey, Punarjay
Chakravarty, Armin Parchami, Henry X. Liu
- Abstract summary: We propose a novel framework for traffic scene perception with roadside cameras.
The proposed framework covers the full stack of roadside perception, including object detection, object localization, object tracking, and multi-camera information fusion.
Our framework is deployed at a two-lane roundabout located at Ellsworth Rd. and State St., Ann Arbor, MI, USA, providing 7x24 real-time traffic flow monitoring and high-precision vehicle trajectory extraction.
- Score: 20.527834125706526
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel and pragmatic framework for traffic scene perception with
roadside cameras. The proposed framework covers the full stack of the roadside
perception pipeline for infrastructure-assisted autonomous driving, including
object detection, object localization, object tracking, and multi-camera
information fusion. Unlike previous vision-based perception frameworks that rely
on depth offsets or 3D annotations at training time, we adopt a modular
decoupling design and introduce a landmark-based 3D localization method, in
which detection and localization are decoupled so that the model can be trained
from 2D annotations alone. The proposed framework applies to
either optical or thermal cameras with pinhole or fish-eye lenses. Our
framework is deployed at a two-lane roundabout located at Ellsworth Rd. and
State St., Ann Arbor, MI, USA, providing 7x24 real-time traffic flow monitoring
and high-precision vehicle trajectory extraction. The whole system runs
efficiently on a low-power edge computing device, with an end-to-end delay
across all components of less than 20 ms.
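The landmark-based 3D localization described in the abstract implies that a 2D detector's output is mapped onto the road surface using known landmarks, with no per-object depth prediction. The paper's code is not reproduced here; the sketch below shows one common way such a mapping can be realized, fitting an image-to-ground-plane homography from surveyed landmarks with OpenCV and projecting each detection's bottom-center point. All coordinate values and names are illustrative assumptions, not the paper's calibration.

```python
import numpy as np
import cv2

# Surveyed landmarks: pixel coordinates in the roadside image and their
# ground-plane positions (e.g., local east/north in meters). Values are
# placeholders; more landmarks improve robustness.
image_pts = np.array([[412, 655], [1230, 640], [300, 880], [1500, 900]], dtype=np.float64)
world_pts = np.array([[0.0, 0.0], [12.5, 0.0], [0.0, 8.0], [14.0, 9.5]], dtype=np.float64)

# Homography from the image plane to the ground plane (needs 4+ landmarks).
H, _ = cv2.findHomography(image_pts, world_pts)

def localize(bbox_xyxy):
    """Map a 2D detection to ground-plane coordinates.

    Uses the bottom-center of the box as the vehicle's ground contact point,
    the usual choice when only 2D boxes are available.
    """
    x1, y1, x2, y2 = bbox_xyxy
    foot = np.array([[[(x1 + x2) / 2.0, y2]]], dtype=np.float64)  # shape (1, 1, 2)
    ground = cv2.perspectiveTransform(foot, H)
    return ground[0, 0]  # (east, north) in meters

print(localize((600, 500, 700, 620)))
```

Because the homography only needs landmark correspondences, the detector can be trained purely on 2D boxes, which is consistent with the decoupling the abstract describes; the flat-ground assumption behind the bottom-center projection is an assumption of this sketch.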
Related papers
- MOSE: Boosting Vision-based Roadside 3D Object Detection with Scene Cues [12.508548561872553]
We propose a novel framework, namely MOSE, for MOnocular 3D object detection with Scene cuEs.
A scene cue bank is designed to aggregate scene cues from multiple frames of the same scene.
A transformer-based decoder lifts the aggregated scene cues as well as the 3D position embeddings for 3D object location.
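The scene cue bank is only sketched in the summary above; as a rough illustration of what aggregating cues from multiple frames of the same scene could look like, the toy snippet below pools per-frame feature vectors by scene id. The mean pooling, feature dimension, and names are assumptions for illustration, not MOSE's actual design.

```python
from collections import defaultdict
import numpy as np

class SceneCueBank:
    """Toy per-scene bank: collects frame-level cue vectors and returns an
    aggregated cue per scene (mean pooling, chosen here for simplicity)."""

    def __init__(self):
        self._bank = defaultdict(list)

    def update(self, scene_id: str, frame_cues: np.ndarray) -> None:
        # frame_cues: (C,) cue vector extracted from one frame of the scene
        self._bank[scene_id].append(frame_cues)

    def aggregate(self, scene_id: str) -> np.ndarray:
        return np.mean(np.stack(self._bank[scene_id], axis=0), axis=0)

bank = SceneCueBank()
bank.update("roundabout_cam3", np.random.rand(256))
bank.update("roundabout_cam3", np.random.rand(256))
scene_cue = bank.aggregate("roundabout_cam3")  # would feed a decoder alongside 3D position embeddings
```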
arXiv Detail & Related papers (2024-04-08T08:11:56Z) - MSight: An Edge-Cloud Infrastructure-based Perception System for
Connected Automated Vehicles [58.461077944514564]
This paper presents MSight, a cutting-edge roadside perception system specifically designed for automated vehicles.
MSight offers real-time vehicle detection, localization, tracking, and short-term trajectory prediction.
Evaluations underscore the system's ability to maintain lane-level accuracy with minimal latency.
arXiv Detail & Related papers (2023-10-08T21:32:30Z) - The Interstate-24 3D Dataset: a new benchmark for 3D multi-camera
vehicle tracking [4.799822253865053]
This work presents a novel video dataset recorded from overlapping highway traffic cameras along an urban interstate, enabling multi-camera 3D object tracking in a traffic monitoring context.
Data is released from 3 scenes containing video from at least 16 cameras each, totaling 57 minutes in length.
877,000 3D bounding boxes and corresponding object tracklets are fully and accurately annotated for each camera field of view and are combined into a spatially and temporally continuous set of vehicle trajectories for each scene.
arXiv Detail & Related papers (2023-08-28T18:43:33Z) - Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z) - Rope3D: TheRoadside Perception Dataset for Autonomous Driving and
Monocular 3D Object Detection Task [48.555440807415664]
We present the first high-diversity challenging Roadside Perception 3D dataset- Rope3D from a novel view.
The dataset consists of 50k images and over 1.5M 3D objects in various scenes.
We propose to leverage geometry constraints to resolve the inherent ambiguities caused by varying sensors and viewpoints.
arXiv Detail & Related papers (2022-03-25T12:13:23Z) - CFTrack: Center-based Radar and Camera Fusion for 3D Multi-Object
Tracking [9.62721286522053]
We propose an end-to-end network for joint object detection and tracking based on radar and camera sensor fusion.
Our proposed method uses a center-based radar-camera fusion algorithm for object detection and utilizes a greedy algorithm for object association.
We evaluate our method on the challenging nuScenes dataset, where it achieves 20.0 AMOTA and outperforms all vision-based 3D tracking methods in the benchmark.
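CFTrack's greedy association step is not spelled out in the summary; a minimal sketch of greedy nearest-center matching between tracks and detections, under an assumed Euclidean distance and an illustrative gating threshold, could look like:

```python
import numpy as np

def greedy_associate(track_centers, det_centers, max_dist=4.0):
    """Greedily match detections to tracks by smallest center distance.

    track_centers, det_centers: (T, 2) and (D, 2) arrays of x/y centers.
    Returns (track_idx, det_idx) pairs; unmatched items are simply left out.
    """
    if len(track_centers) == 0 or len(det_centers) == 0:
        return []
    dist = np.linalg.norm(track_centers[:, None, :] - det_centers[None, :, :], axis=-1)
    pairs, used_t, used_d = [], set(), set()
    # Visit candidate pairs from closest to farthest.
    for flat in np.argsort(dist, axis=None):
        t, d = np.unravel_index(flat, dist.shape)
        if t in used_t or d in used_d or dist[t, d] > max_dist:
            continue
        pairs.append((int(t), int(d)))
        used_t.add(t)
        used_d.add(d)
    return pairs

tracks = np.array([[1.0, 2.0], [10.0, 5.0]])
dets = np.array([[1.2, 2.1], [10.5, 4.8], [30.0, 1.0]])
print(greedy_associate(tracks, dets))  # [(0, 0), (1, 1)]; the far detection is gated out
```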
arXiv Detail & Related papers (2021-07-11T23:56:53Z) - Towards Autonomous Driving: a Multi-Modal 360$^{\circ}$ Perception
Proposal [87.11988786121447]
This paper presents a framework for 3D object detection and tracking for autonomous vehicles.
The solution, based on a novel sensor fusion configuration, provides accurate and reliable road environment detection.
A variety of tests of the system, deployed in an autonomous vehicle, confirm the suitability of the proposed perception stack.
arXiv Detail & Related papers (2020-08-21T20:36:21Z) - Monocular Vision based Crowdsourced 3D Traffic Sign Positioning with
Unknown Camera Intrinsics and Distortion Coefficients [11.38332845467423]
We demonstrate an approach to computing 3D traffic sign positions without knowing the camera focal lengths, principal point, and distortion coefficients a priori.
We achieve an average single journey relative and absolute positioning accuracy of 0.26 m and 1.38 m, respectively.
arXiv Detail & Related papers (2020-07-09T07:03:17Z) - End-to-end Learning for Inter-Vehicle Distance and Relative Velocity
Estimation in ADAS with a Monocular Camera [81.66569124029313]
We propose a camera-based inter-vehicle distance and relative velocity estimation method based on end-to-end training of a deep neural network.
The key novelty of our method is the integration of multiple visual clues provided by any two time-consecutive monocular frames.
We also propose a vehicle-centric sampling mechanism to alleviate the effect of perspective distortion in the motion field.
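The method above learns distance and relative velocity end-to-end; for contrast, a classical pinhole-geometry baseline (not the paper's method) reads distance from apparent bounding-box height and relative velocity from its change across two consecutive frames. The focal length, assumed vehicle height, and frame rate below are illustrative assumptions.

```python
def distance_from_box(box_height_px, focal_px=1400.0, vehicle_height_m=1.5):
    """Pinhole relation Z = f * H / h, with H an assumed real vehicle height."""
    return focal_px * vehicle_height_m / box_height_px

def relative_velocity(h_prev_px, h_curr_px, dt_s=1.0 / 30.0):
    """Relative velocity from the distance change between two consecutive frames."""
    z_prev = distance_from_box(h_prev_px)
    z_curr = distance_from_box(h_curr_px)
    return (z_curr - z_prev) / dt_s  # negative when the gap is shrinking

print(distance_from_box(60))       # 35.0 m under the assumed values
print(relative_velocity(60, 62))   # negative: the lead vehicle is getting closer
```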
arXiv Detail & Related papers (2020-06-07T08:18:31Z) - Ego-motion and Surrounding Vehicle State Estimation Using a Monocular
Camera [11.29865843123467]
We propose a novel machine learning method to estimate ego-motion and surrounding vehicle state using a single monocular camera.
Our approach is based on a combination of three deep neural networks to estimate the 3D vehicle bounding box, depth, and optical flow from a sequence of images.
arXiv Detail & Related papers (2020-05-04T16:41:38Z) - Road Curb Detection and Localization with Monocular Forward-view Vehicle
Camera [74.45649274085447]
We propose a robust method for estimating road curb 3D parameters using a calibrated monocular camera equipped with a fisheye lens.
Our approach is able to estimate the vehicle-to-curb distance in real time with a mean accuracy of more than 90%.
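Like the roadside framework at the top of this page, the curb method works with a calibrated fisheye lens; a typical first step before any ground-plane geometry is undistorting detected pixels with the fisheye model. The sketch below uses OpenCV's fisheye module with placeholder calibration values (not taken from either paper).

```python
import numpy as np
import cv2

# Placeholder fisheye calibration: K is the intrinsic matrix, D holds the
# four equidistant distortion coefficients k1..k4.
K = np.array([[900.0, 0.0, 960.0],
              [0.0, 900.0, 540.0],
              [0.0, 0.0, 1.0]])
D = np.array([[-0.05], [0.01], [-0.002], [0.0003]])

def undistort_pixels(pixels_xy):
    """Map distorted fisheye pixel coordinates to undistorted pixel coordinates."""
    pts = np.asarray(pixels_xy, dtype=np.float64).reshape(-1, 1, 2)
    # Passing P=K keeps the output in pixel coordinates instead of normalized ones.
    return cv2.fisheye.undistortPoints(pts, K, D, P=K).reshape(-1, 2)

print(undistort_pixels([[1200.0, 700.0]]))
```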
arXiv Detail & Related papers (2020-02-28T00:24:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.