Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images
- URL: http://arxiv.org/abs/2110.01997v1
- Date: Tue, 5 Oct 2021 12:40:33 GMT
- Title: Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images
- Authors: Yigit Baran Can, Alexander Liniger, Danda Pani Paudel, Luc Van Gool
- Abstract summary: We study the problem of extracting a directed graph representing the local road network in BEV coordinates, from a single onboard camera image.
We show that the method can be extended to detect dynamic objects on the BEV plane.
We validate our approach against powerful baselines and show that our network achieves superior performance.
- Score: 128.881857704338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous navigation requires a structured representation of the road network
and instance-wise identification of the other traffic agents. Since the traffic
scene is defined on the ground plane, this corresponds to scene understanding
in the bird's-eye-view (BEV). However, the onboard cameras of autonomous cars
are customarily mounted horizontally for a better view of the surroundings,
making this task very challenging. In this work, we study the problem of
extracting a directed graph representing the local road network in BEV
coordinates, from a single onboard camera image. Moreover, we show that the
method can be extended to detect dynamic objects on the BEV plane. The
semantics, locations, and orientations of the detected objects together with
the road graph facilitate a comprehensive understanding of the scene. Such
understanding becomes fundamental for downstream tasks such as path
planning and navigation. We validate our approach against powerful baselines
and show that our network achieves superior performance. We also demonstrate
the effects of various design choices through ablation studies. Code:
https://github.com/ybarancan/STSU
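
For concreteness, the structured output the abstract describes (a directed lane graph in BEV coordinates plus oriented dynamic objects) can be pictured as a small data structure. The sketch below is illustrative only; the field names and the Bezier-control-point parameterization are assumptions made for exposition and are not the exact interface of the released STSU code.

# Minimal sketch of a structured BEV scene: a directed graph of lane-centerline
# segments plus oriented dynamic objects. Illustrative assumptions only, not the
# released STSU interface.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

import numpy as np


@dataclass
class Centerline:
    """One lane-centerline segment in BEV (ego) coordinates."""
    control_points: np.ndarray                            # (K, 2) Bezier control points, metres
    successors: List[int] = field(default_factory=list)   # ids of outgoing segments


@dataclass
class DynamicObject:
    """One detected traffic agent on the BEV plane."""
    category: str                    # e.g. "car" or "pedestrian"
    position: Tuple[float, float]    # (x, y) in the ego frame, metres
    yaw: float                       # heading in radians


@dataclass
class BEVScene:
    centerlines: Dict[int, Centerline]
    objects: List[DynamicObject]

    def adjacency(self) -> np.ndarray:
        """Boolean incidence matrix of the directed road graph."""
        ids = {k: i for i, k in enumerate(sorted(self.centerlines))}
        adj = np.zeros((len(ids), len(ids)), dtype=bool)
        for k, seg in self.centerlines.items():
            for succ in seg.successors:
                adj[ids[k], ids[succ]] = True
        return adj

A downstream planner can traverse adjacency() to enumerate legal lane-following routes while checking the listed objects for conflicts, which is the sense in which such a representation supports path planning and navigation.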
Related papers
- Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)
- Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient enough for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
- Estimation of Appearance and Occupancy Information in Birds Eye View from Surround Monocular Images [2.69840007334476]
Bird's-eye View (BEV) expresses the location of different traffic participants in the ego vehicle frame from a top-down view.
We propose a novel representation that captures the appearance and occupancy information of various traffic participants from an array of monocular cameras covering a 360 deg field of view (FOV).
We use a learned image embedding of all camera images to generate a BEV of the scene at any instant that captures both appearance and occupancy of the scene.
arXiv Detail & Related papers (2022-11-08T20:57:56Z)
- JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes [75.20435924081585]
JPerceiver can simultaneously estimate scale-aware depth, visual odometry (VO), and BEV layout from a monocular video sequence.
It exploits the cross-view geometric transformation (CGT) to propagate the absolute scale from the road layout to depth and VO.
Experiments on Argoverse, nuScenes and KITTI show the superiority of JPerceiver over existing methods on all three tasks.
arXiv Detail & Related papers (2022-07-16T10:33:59Z)
- BEV-MODNet: Monocular Camera based Bird's Eye View Moving Object Detection for Autonomous Driving [2.9769485817170387]
CNNs can leverage the global context in the scene to project the scene to bird's-eye view more accurately.
We create an extended KITTI-raw dataset consisting of 12.9k images with annotations of moving object masks in BEV space for five classes.
We observe a significant improvement of 13% in mIoU using the simple baseline implementation.
arXiv Detail & Related papers (2021-07-11T01:11:58Z)
- Understanding Bird's-Eye View Semantic HD-Maps Using an Onboard Monocular Camera [110.83289076967895]
We study scene understanding in the form of online estimation of semantic bird's-eye-view HD-maps using the video input from a single onboard camera.
In our experiments, we demonstrate that the considered aspects are complementary to each other for HD-map understanding.
arXiv Detail & Related papers (2020-12-05T14:39:14Z)
- Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D [100.93808824091258]
We propose a new end-to-end architecture that directly extracts a bird's-eye-view representation of a scene given image data from an arbitrary number of cameras.
Our approach is to "lift" each image individually into a frustum of features for each camera, then "splat" all frustums into a bird's-eye-view grid.
We show that the representations inferred by our model enable interpretable end-to-end motion planning by "shooting" template trajectories into a bird's-eye-view cost map output by our network.
arXiv Detail & Related papers (2020-08-13T06:29:01Z)
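
The "lift" and "splat" steps summarized in the Lift, Splat, Shoot entry above can be illustrated with a short numpy sketch. This is a toy rendering of the idea under assumed shapes (a per-pixel categorical depth distribution, a known inverse intrinsic matrix, and a camera-to-ego transform); it is not the authors' implementation.

# Toy sketch of lift-splat: spread each image feature along its viewing ray
# according to a predicted depth distribution ("lift"), then sum-pool the
# lifted features from all cameras into a common BEV grid ("splat").
import numpy as np

H, W, C, D = 8, 16, 32, 24               # feature-map height/width, channels, depth bins
depth_bins = np.linspace(4.0, 45.0, D)   # assumed metric range along each ray

def lift(feats, depth_logits, intrinsics_inv, cam_to_ego):
    """feats (H, W, C), depth_logits (H, W, D) -> ego-frame points (N, 3), features (N, C)."""
    depth_prob = np.exp(depth_logits) / np.exp(depth_logits).sum(-1, keepdims=True)
    us, vs = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1).astype(float)   # homogeneous pixels
    rays = pix @ intrinsics_inv.T                                       # camera-frame rays
    pts = rays[:, :, None, :] * depth_bins[None, None, :, None]         # (H, W, D, 3)
    pts = pts @ cam_to_ego[:3, :3].T + cam_to_ego[:3, 3]                # into the ego frame
    ctx = feats[:, :, None, :] * depth_prob[..., None]                  # depth-weighted features
    return pts.reshape(-1, 3), ctx.reshape(-1, C)

def splat(points, ctx, grid_res=0.5, grid_size=100):
    """Sum-pool lifted features into a (grid_size, grid_size, C) BEV grid."""
    bev = np.zeros((grid_size, grid_size, ctx.shape[-1]))
    ij = np.floor(points[:, :2] / grid_res).astype(int) + grid_size // 2
    ok = (ij >= 0).all(axis=1) & (ij < grid_size).all(axis=1)
    np.add.at(bev, (ij[ok, 0], ij[ok, 1]), ctx[ok])
    return bev

# Example: accumulate two cameras (random features, identity extrinsics) into one grid.
rng = np.random.default_rng(0)
bev = np.zeros((100, 100, C))
for cam_to_ego in (np.eye(4), np.eye(4)):
    pts, ctx = lift(rng.normal(size=(H, W, C)), rng.normal(size=(H, W, D)),
                    np.linalg.inv(np.diag([200.0, 200.0, 1.0])), cam_to_ego)
    bev += splat(pts, ctx)

Because the splat is a permutation-invariant sum over all cameras' lifted points, the same code handles an arbitrary camera rig, which is the property the paper's title emphasizes.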