Driving among Flatmobiles: Bird-Eye-View occupancy grids from a
monocular camera for holistic trajectory planning
- URL: http://arxiv.org/abs/2008.04047v1
- Date: Mon, 10 Aug 2020 12:16:44 GMT
- Title: Driving among Flatmobiles: Bird-Eye-View occupancy grids from a
monocular camera for holistic trajectory planning
- Authors: Abdelhak Loukkal (UTC), Yves Grandvalet (Heudiasyc), Tom Drummond, You
Li (NRCIEA)
- Abstract summary: Camera-based end-to-end driving neural networks bring the promise of a low-cost system that maps camera images to driving control commands.
Recent works have shown the importance of using an explicit intermediate representation, which increases both the interpretability and the accuracy of the network's decisions.
We introduce a novel monocular camera-only holistic end-to-end trajectory planning network with a Bird-Eye-View intermediate representation.
- Score: 11.686108908830805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camera-based end-to-end driving neural networks bring the promise of a
low-cost system that maps camera images to driving control commands. These
networks are appealing because they replace laborious hand-engineered building blocks, but their black-box nature makes them difficult to diagnose in case of failure. Recent works have shown the importance of using an explicit intermediate representation, which increases both the interpretability and the accuracy of the network's decisions. Nonetheless, these camera-based networks reason in the camera view, where scale is not homogeneous, making them not directly suitable for motion forecasting. In this paper, we introduce
a novel monocular camera-only holistic end-to-end trajectory planning network
with a Bird-Eye-View (BEV) intermediate representation that comes in the form
of binary Occupancy Grid Maps (OGMs). To ease the prediction of OGMs in BEV
from camera images, we introduce a novel scheme where the OGMs are first
predicted as semantic masks in camera view and then warped into BEV using the homography between the two planes. The key element that allows this transformation to be applied to 3D objects such as vehicles is to predict solely their footprint in camera view, thus respecting the flat-world hypothesis implied by the homography.
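To make the mask-warping scheme concrete, here is a minimal sketch (not the authors' code) of how a binary footprint mask predicted in camera view could be warped into a BEV occupancy grid with OpenCV. The homography matrix and grid size below are illustrative assumptions; in practice the homography would be derived from the camera calibration.

```python
import cv2
import numpy as np

# Hypothetical 3x3 homography mapping image-plane pixels to BEV grid
# cells; in practice it is derived from the camera intrinsics and its
# pose relative to the ground plane.
H_GROUND = np.array([[0.05, 0.00, -10.0],
                     [0.00, 0.05,  -5.0],
                     [0.00, 0.00,   1.0]])

def footprint_mask_to_bev_ogm(mask: np.ndarray,
                              bev_size: tuple = (200, 200)) -> np.ndarray:
    """Warp a binary footprint mask (camera view) into a binary BEV OGM.

    Predicting only the ground-contact footprint of vehicles keeps the
    flat-world hypothesis of the homography valid for 3D objects.
    """
    warped = cv2.warpPerspective(mask.astype(np.float32), H_GROUND,
                                 bev_size, flags=cv2.INTER_NEAREST)
    return (warped > 0.5).astype(np.uint8)  # nearest keeps the grid binary

# Usage with a fake 480x640 mask containing one "vehicle footprint" blob.
mask = np.zeros((480, 640), dtype=np.uint8)
mask[400:440, 300:360] = 1
ogm = footprint_mask_to_bev_ogm(mask)
print(ogm.shape, int(ogm.sum()))
```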
Related papers
- BEVSeg2TP: Surround View Camera Bird's-Eye-View Based Joint Vehicle Segmentation and Ego Vehicle Trajectory Prediction [4.328789276903559]
Trajectory prediction is a key task for vehicle autonomy.
There is a growing interest in learning-based trajectory prediction.
We show that there is potential to improve perception performance.
arXiv Detail & Related papers (2023-12-20T15:02:37Z)
- Multi-camera Bird's Eye View Perception for Autonomous Driving [17.834495597639805]
It is essential to produce perception outputs in 3D to enable spatial reasoning about other agents and structures.
The most basic approach to obtaining the desired BEV representation from a camera image is Inverse Perspective Mapping (IPM), which assumes a flat ground surface; a minimal calibration-based sketch follows this entry.
More recent approaches use deep neural networks to output directly in BEV space.
arXiv Detail & Related papers (2023-09-16T19:12:05Z)
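For context, the IPM mentioned above reduces to a single 3x3 homography once the camera calibration is known. Below is a minimal sketch under that flat-ground assumption; the intrinsics, camera pitch, and height are made-up example values.

```python
import numpy as np

# Hypothetical calibration of a forward-facing camera: intrinsics K, plus
# rotation R and translation t of the ground (Z=0) frame in the camera
# frame. All numbers are illustrative.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
pitch = np.deg2rad(10.0)                      # camera tilted toward the road
R = np.array([[1.0, 0.0,            0.0],
              [0.0, np.cos(pitch), -np.sin(pitch)],
              [0.0, np.sin(pitch),  np.cos(pitch)]])
t = np.array([0.0, 1.5, 0.0])                 # camera ~1.5 m above ground

def ipm_homography(K: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Homography mapping image pixels to ground-plane coordinates.

    A ground point [X, Y, 0, 1] projects as K @ [r1 | r2 | t] @ [X, Y, 1],
    so the image-to-ground map is the inverse of that 3x3 matrix.
    """
    ground_to_image = K @ np.column_stack((R[:, 0], R[:, 1], t))
    return np.linalg.inv(ground_to_image)

H = ipm_homography(K, R, t)
u, v = 320.0, 400.0                            # a pixel low in the image
X, Y, w = H @ np.array([u, v, 1.0])
print("ground point (m):", X / w, Y / w)       # valid only on flat ground
```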
- Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)
- LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation [43.12994451281451]
We present 'LaRa', an efficient encoder-decoder, transformer-based model for vehicle semantic segmentation from multiple cameras.
Our approach uses a system of cross-attention to aggregate information over multiple sensors into a compact, yet rich, collection of latent representations; a toy sketch of this aggregation pattern follows this entry.
arXiv Detail & Related papers (2022-06-27T13:37:50Z)
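As a toy illustration of that cross-attention aggregation pattern (a sketch of the general idea, not LaRa's actual architecture), a fixed set of learned latent queries can attend over the feature tokens of all cameras at once; all dimensions below are assumptions.

```python
import torch
import torch.nn as nn

class LatentCrossAttentionPool(nn.Module):
    """Toy sketch: learned latent queries cross-attend to the concatenated
    feature tokens of all cameras, producing a compact scene encoding.
    Dimensions are illustrative, not taken from the LaRa paper."""

    def __init__(self, dim: int = 256, num_latents: int = 128,
                 num_heads: int = 8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, cam_feats: torch.Tensor) -> torch.Tensor:
        # cam_feats: (batch, n_cameras * tokens_per_camera, dim)
        queries = self.latents.unsqueeze(0).expand(cam_feats.size(0), -1, -1)
        pooled, _ = self.attn(queries, cam_feats, cam_feats)
        return pooled  # (batch, num_latents, dim)

# Usage: 6 cameras, 300 feature tokens each, batch of 2.
feats = torch.randn(2, 6 * 300, 256)
print(LatentCrossAttentionPool()(feats).shape)  # torch.Size([2, 128, 256])
```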
- Neural Scene Representation for Locomotion on Structured Terrain [56.48607865960868]
We propose a learning-based method to reconstruct the local terrain for a mobile robot traversing urban environments.
Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the method estimates the topography in the robot's vicinity.
We propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement.
arXiv Detail & Related papers (2022-06-16T10:45:17Z)
- GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation [105.19949897812494]
Birds-eye-view (BEV) semantic segmentation is critical for autonomous driving.
We present a novel two-stage Geometry Prior-based Transformation framework named GitNet.
arXiv Detail & Related papers (2022-04-16T06:46:45Z)
- Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images [128.881857704338]
We study the problem of extracting a directed graph representing the local road network in BEV coordinates, from a single onboard camera image.
We show that the method can be extended to detect dynamic objects on the BEV plane.
We validate our approach against powerful baselines and show that our network achieves superior performance.
arXiv Detail & Related papers (2021-10-05T12:40:33Z)
- Monocular 3D Vehicle Detection Using Uncalibrated Traffic Cameras through Homography [12.062095895630563]
This paper proposes a method to extract the position and pose of vehicles in the 3D world from a single traffic camera.
We observe that the homography between the road plane and the image plane is essential to 3D vehicle detection.
We propose a new regression target called tailed r-box and a dual-view network architecture, which boosts the detection accuracy on warped BEV images.
arXiv Detail & Related papers (2021-03-29T02:57:37Z)
- Understanding Bird's-Eye View Semantic HD-Maps Using an Onboard Monocular Camera [110.83289076967895]
We study scene understanding in the form of online estimation of semantic bird's-eye-view HD-maps using the video input from a single onboard camera.
In our experiments, we demonstrate that the considered aspects are complementary to each other for HD-map understanding.
arXiv Detail & Related papers (2020-12-05T14:39:14Z)
- Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D [100.93808824091258]
We propose a new end-to-end architecture that directly extracts a bird's-eye-view representation of a scene given image data from an arbitrary number of cameras.
Our approach is to "lift" each image individually into a frustum of features for each camera, then "splat" all frustums into a bird's-eye-view grid; a toy sketch of these two steps follows this entry.
We show that the representations inferred by our model enable interpretable end-to-end motion planning by "shooting" template trajectories into a bird's-eye-view cost map output by our network.
arXiv Detail & Related papers (2020-08-13T06:29:01Z)
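To make the "lift" and "splat" steps tangible, here is a heavily simplified single-camera toy version (the real method unprojects each pixel through the full camera geometry): pixel features are spread along a predicted depth distribution, the resulting points are binned into BEV cells, and features landing in the same cell are sum-pooled. All shapes and the toy geometry are assumptions.

```python
import torch

def lift_splat(feats, depth_logits, depths, cell=0.5, grid=(100, 100)):
    """Toy lift-splat for ONE camera looking along +x.

    feats:        (C, H, W) image features
    depth_logits: (D, H, W) per-pixel depth distribution logits
    depths:       (D,) candidate depths in metres
    """
    C, H, W = feats.shape
    D = depths.shape[0]
    probs = depth_logits.softmax(dim=0)                 # lift: soft depth
    # Outer product: every pixel contributes a feature at every depth bin.
    lifted = probs.unsqueeze(1) * feats.unsqueeze(0)    # (D, C, H, W)

    # Toy geometry: x equals depth, y grows linearly with image column.
    xs = depths.view(D, 1, 1).expand(D, H, W)
    ys = torch.linspace(-25.0, 25.0, W).view(1, 1, W).expand(D, H, W)
    xi = (xs / cell).long().clamp(0, grid[0] - 1)
    yi = ((ys + 25.0) / cell).long().clamp(0, grid[1] - 1)

    # Splat: sum-pool everything that lands in the same BEV cell.
    bev = feats.new_zeros(C, grid[0] * grid[1])
    idx = (xi * grid[1] + yi).reshape(-1)               # (D*H*W,)
    bev.index_add_(1, idx, lifted.permute(1, 0, 2, 3).reshape(C, -1))
    return bev.view(C, *grid)

bev = lift_splat(torch.randn(64, 32, 64),
                 torch.randn(41, 32, 64),
                 torch.linspace(4.0, 44.0, 41))
print(bev.shape)  # torch.Size([64, 100, 100])
```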
This list is automatically generated from the titles and abstracts of the papers on this site.