Understanding Bird's-Eye View Semantic HD-Maps Using an Onboard
Monocular Camera
- URL: http://arxiv.org/abs/2012.03040v1
- Date: Sat, 5 Dec 2020 14:39:14 GMT
- Title: Understanding Bird's-Eye View Semantic HD-Maps Using an Onboard
Monocular Camera
- Authors: Yigit Baran Can, Alexander Liniger, Ozan Unal, Danda Paudel, Luc Van
Gool
- Abstract summary: We study scene understanding in the form of online estimation of semantic bird's-eye-view HD-maps using the video input from a single onboard camera.
In our experiments, we demonstrate that the considered aspects are complementary to each other for HD-map understanding.
- Score: 110.83289076967895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous navigation requires scene understanding of the action-space to
move or anticipate events. For planner agents moving on the ground plane, such
as autonomous vehicles, this translates to scene understanding in the
bird's-eye view. However, the onboard cameras of autonomous cars are
customarily mounted horizontally for a better view of the surroundings. In this
work, we study scene understanding in the form of online estimation of semantic
bird's-eye-view HD-maps using the video input from a single onboard camera. We
study three key aspects of this task: image-level understanding, BEV-level
understanding, and the aggregation of temporal information. Based on these
three pillars we propose a novel architecture that combines all three.
In our extensive experiments, we demonstrate that the considered
aspects are complementary to each other for HD-map understanding. Furthermore,
the proposed architecture significantly surpasses the current state-of-the-art.
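The three pillars named in the abstract (image-level understanding, BEV-level understanding, and temporal aggregation) can be pictured as three stacked stages. Below is a minimal, hypothetical PyTorch sketch of such a pipeline; the module names, feature sizes, and the simple resize-based view transformation are illustrative assumptions, not the architecture proposed in the paper.

```python
# Hypothetical three-stage BEV HD-map pipeline: image-level encoding,
# image-to-BEV view transformation, temporal aggregation over a video clip.
# All shapes and design choices are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ImageEncoder(nn.Module):
    """Image-level understanding: a small convolutional backbone."""
    def __init__(self, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, img):                  # (B, 3, H, W)
        return self.net(img)                 # (B, C, H/4, W/4)


class ImageToBEV(nn.Module):
    """BEV-level understanding: resample image features onto a ground-plane grid.
    A learned 1x1 projection plus bilinear resize stands in for a real geometric
    view transformation (e.g. a homography or learned attention)."""
    def __init__(self, ch=64, bev_size=(100, 100)):
        super().__init__()
        self.bev_size = bev_size
        self.proj = nn.Conv2d(ch, ch, 1)

    def forward(self, feat):                 # (B, C, h, w)
        return F.interpolate(self.proj(feat), size=self.bev_size,
                             mode="bilinear", align_corners=False)


class TemporalAggregator(nn.Module):
    """Temporal aggregation: a recurrent convolutional state over BEV frames."""
    def __init__(self, ch=64):
        super().__init__()
        self.update = nn.Conv2d(2 * ch, ch, 3, padding=1)

    def forward(self, bev_seq):              # list of (B, C, Hb, Wb)
        state = torch.zeros_like(bev_seq[0])
        for bev in bev_seq:
            state = torch.tanh(self.update(torch.cat([state, bev], dim=1)))
        return state


class BEVMapNet(nn.Module):
    """Compose the three stages and predict per-cell semantic logits."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.encoder = ImageEncoder()
        self.to_bev = ImageToBEV()
        self.temporal = TemporalAggregator()
        self.head = nn.Conv2d(64, num_classes, 1)

    def forward(self, frames):               # (B, T, 3, H, W) monocular clip
        bevs = [self.to_bev(self.encoder(frames[:, t]))
                for t in range(frames.shape[1])]
        return self.head(self.temporal(bevs))  # (B, num_classes, Hb, Wb)


if __name__ == "__main__":
    clip = torch.randn(1, 4, 3, 256, 512)     # a 4-frame monocular clip
    print(BEVMapNet()(clip).shape)            # torch.Size([1, 5, 100, 100])
```

The sketch only fixes the interfaces between the three stages; how each stage is actually realized (the view transformation and the temporal aggregation in particular) is exactly what the paper studies.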
Related papers
- Urban Scene Diffusion through Semantic Occupancy Map [49.20779809250597]
UrbanDiffusion is a 3D diffusion model conditioned on a Bird's-Eye View (BEV) map.
Our model learns the data distribution of scene-level structures within a latent space.
After training on real-world driving datasets, our model can generate a wide range of diverse urban scenes.
arXiv Detail & Related papers (2024-03-18T11:54:35Z) - Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction [84.94140661523956]
We propose a tri-perspective view (TPV) representation which accompanies BEV with two additional perpendicular planes.
We model each point in 3D space by summing its projected features on the three planes (a minimal sketch of this lookup appears after this list).
Experiments show that our model trained with sparse supervision effectively predicts the semantic occupancy for all voxels.
arXiv Detail & Related papers (2023-02-15T17:58:10Z) - Estimation of Appearance and Occupancy Information in Birds Eye View
from Surround Monocular Images [2.69840007334476]
Bird's-eye View (BEV) expresses the location of different traffic participants in the ego vehicle frame from a top-down view.
We propose a novel representation that captures the appearance and occupancy information of various traffic participants from an array of monocular cameras covering a 360 deg field of view (FOV).
We use a learned image embedding of all camera images to generate a BEV of the scene at any instant that captures both appearance and occupancy of the scene.
arXiv Detail & Related papers (2022-11-08T20:57:56Z) - JPerceiver: Joint Perception Network for Depth, Pose and Layout
Estimation in Driving Scenes [75.20435924081585]
JPerceiver can simultaneously estimate scale-aware depth and VO as well as BEV layout from a monocular video sequence.
It exploits the cross-view geometric transformation (CGT) to propagate the absolute scale from the road layout to depth and VO.
Experiments on Argoverse, nuScenes and KITTI show the superiority of JPerceiver over existing methods on all three tasks.
arXiv Detail & Related papers (2022-07-16T10:33:59Z) - Structured Bird's-Eye-View Traffic Scene Understanding from Onboard
Images [128.881857704338]
We study the problem of extracting a directed graph representing the local road network in BEV coordinates, from a single onboard camera image.
We show that the method can be extended to detect dynamic objects on the BEV plane.
We validate our approach against powerful baselines and show that our network achieves superior performance.
arXiv Detail & Related papers (2021-10-05T12:40:33Z) - Driving among Flatmobiles: Bird-Eye-View occupancy grids from a
monocular camera for holistic trajectory planning [11.686108908830805]
Camera-based end-to-end driving neural networks bring the promise of a low-cost system that maps camera images to driving control commands.
Recent works have shown the importance of an explicit intermediate representation, which increases both the interpretability and the accuracy of the network's decisions.
We introduce a novel monocular camera-only holistic end-to-end trajectory planning network with a Bird-Eye-View intermediate representation.
arXiv Detail & Related papers (2020-08-10T12:16:44Z) - Distilled Semantics for Comprehensive Scene Understanding from Videos [53.49501208503774]
In this paper, we take an additional step toward holistic scene understanding with monocular cameras by learning depth and motion alongside semantics.
We address the three tasks jointly by a novel training protocol based on knowledge distillation and self-supervision.
We show that it yields state-of-the-art results for monocular depth estimation, optical flow and motion segmentation.
arXiv Detail & Related papers (2020-03-31T08:52:13Z)
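As referenced in the Tri-Perspective View entry above, a point feature under TPV is the sum of its projections onto three perpendicular feature planes. The following is a minimal PyTorch sketch of that lookup; plane shapes, coordinate normalization, and axis ordering are assumptions for illustration, not the paper's implementation.

```python
# Sketch of a tri-perspective view (TPV) feature lookup: project a 3D point
# onto three perpendicular feature planes and sum the bilinearly sampled
# features. Shapes and coordinate conventions are illustrative assumptions.
import torch
import torch.nn.functional as F


def tpv_feature(points, plane_hw, plane_dh, plane_wd):
    """points: (N, 3) normalized (h, w, d) coordinates in [-1, 1].
    plane_*: (C, A, B) feature planes. Returns (N, C) per-point features."""
    def sample(plane, xy):
        # grid_sample expects input (1, C, A, B) and a (1, 1, N, 2) grid of (x, y)
        grid = xy.view(1, 1, -1, 2)
        out = F.grid_sample(plane.unsqueeze(0), grid,
                            mode="bilinear", align_corners=False)
        return out.view(plane.shape[0], -1).t()            # (N, C)

    h, w, d = points[:, 0], points[:, 1], points[:, 2]
    f_hw = sample(plane_hw, torch.stack([w, h], dim=-1))   # top-down plane
    f_dh = sample(plane_dh, torch.stack([h, d], dim=-1))   # side plane
    f_wd = sample(plane_wd, torch.stack([d, w], dim=-1))   # front plane
    return f_hw + f_dh + f_wd                               # summed projections


if __name__ == "__main__":
    C, H, W, D = 16, 64, 64, 32
    pts = torch.rand(1000, 3) * 2 - 1                       # points in [-1, 1]^3
    feats = tpv_feature(pts,
                        torch.randn(C, H, W),
                        torch.randn(C, D, H),
                        torch.randn(C, W, D))
    print(feats.shape)                                      # torch.Size([1000, 16])
```

Each point costs only three bilinear samples, which is what allows a dense 3D volume to be described with 2D plane storage.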
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.