Estimation of Appearance and Occupancy Information in Birds Eye View
from Surround Monocular Images
- URL: http://arxiv.org/abs/2211.04557v1
- Date: Tue, 8 Nov 2022 20:57:56 GMT
- Title: Estimation of Appearance and Occupancy Information in Birds Eye View
from Surround Monocular Images
- Authors: Sarthak Sharma, Unnikrishnan R. Nair, Udit Singh Parihar, Midhun Menon
S and Srikanth Vidapanakal
- Abstract summary: Birds-eye View (BEV) expresses the location of different traffic participants in the ego vehicle frame from a top-down view.
We propose a novel representation that captures various traffic participants' appearance and occupancy information from an array of monocular cameras covering a 360 deg field of view (FOV).
We use a learned image embedding of all camera images to generate a BEV of the scene at any instant that captures both appearance and occupancy of the scene.
- Score: 2.69840007334476
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous driving requires efficient reasoning about the location and
appearance of the different agents in the scene, which aids in downstream tasks
such as object detection, object tracking, and path planning. The past few
years have witnessed a surge in approaches that combine the different task-based
modules of the classic self-driving stack into an End-to-End (E2E) trainable
learning system. These approaches replace perception, prediction, and sensor
fusion modules with a single contiguous module with shared latent space
embedding, from which one extracts a human-interpretable representation of the
scene. One of the most popular representations is the Birds-eye View (BEV),
which expresses the location of different traffic participants in the ego
vehicle frame from a top-down view. However, a BEV does not capture the
chromatic appearance information of the participants. To overcome this
limitation, we propose a novel representation that captures various traffic
participants' appearance and occupancy information from an array of monocular
cameras covering a 360 deg field of view (FOV). We use a learned image embedding
of all camera images to generate a BEV of the scene at any instant that
captures both appearance and occupancy of the scene, which can aid in
downstream tasks such as object tracking and executing language-based commands.
We test the efficacy of our approach on a synthetic dataset generated from CARLA.
The code, data set, and results can be found at https://rebrand.ly/APPOCC-results.
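
As a rough illustration of the kind of pipeline the abstract describes (per-camera image embeddings fused into a single top-down grid holding both appearance and occupancy), the sketch below is a toy, hypothetical PyTorch module and not the paper's architecture: the encoder, the learned flat projection onto the BEV plane, the grid size, and the four output channels (three appearance, one occupancy) are all assumptions chosen only to make the data flow concrete.

```python
# Toy sketch (assumed names/shapes): fuse surround-camera embeddings into a BEV
# map with appearance (pseudo-RGB) and occupancy channels. Not the authors' model.
import torch
import torch.nn as nn

class ToyBEVFuser(nn.Module):
    def __init__(self, num_cams=6, feat_dim=64, bev_size=32):
        super().__init__()
        self.feat_dim, self.bev_size = feat_dim, bev_size
        # Shared CNN encoder: one embedding vector per camera image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Learned stand-in for geometric unprojection: map the concatenated camera
        # embeddings onto a flattened BEV feature grid.
        self.to_bev = nn.Linear(feat_dim * num_cams, feat_dim * bev_size * bev_size)
        # Decode fused BEV features into 3 appearance + 1 occupancy channels.
        self.head = nn.Conv2d(feat_dim, 4, 1)

    def forward(self, images):                      # images: (B, num_cams, 3, H, W)
        b = images.shape[0]
        feats = self.encoder(images.flatten(0, 1)).view(b, -1)
        bev = self.to_bev(feats).view(b, self.feat_dim, self.bev_size, self.bev_size)
        out = self.head(bev)
        return torch.sigmoid(out[:, :3]), torch.sigmoid(out[:, 3:])  # appearance, occupancy

cams = torch.rand(2, 6, 3, 128, 128)                # batch of 2, six surround cameras
appearance, occupancy = ToyBEVFuser()(cams)
print(appearance.shape, occupancy.shape)            # (2, 3, 32, 32) and (2, 1, 32, 32)
```

In the real setting the flat learned projection would be replaced by a geometry-aware lifting of image features into the ego frame, but the overall shape of the problem (N camera views in, one appearance-plus-occupancy BEV grid out) is the same.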
Related papers
- BEVSeg2TP: Surround View Camera Bird's-Eye-View Based Joint Vehicle
Segmentation and Ego Vehicle Trajectory Prediction [4.328789276903559]
Trajectory prediction is a key task for vehicle autonomy.
There is a growing interest in learning-based trajectory prediction.
We show that there is potential to improve perception performance.
arXiv Detail & Related papers (2023-12-20T15:02:37Z) - Towards Viewpoint Robustness in Bird's Eye View Segmentation [85.99907496019972]
We study how AV perception models are affected by changes in camera viewpoint.
Small changes to pitch, yaw, depth, or height of the camera at inference time lead to large drops in performance.
We introduce a technique for novel view synthesis and use it to transform collected data to the viewpoint of target rigs.
arXiv Detail & Related papers (2023-09-11T02:10:07Z) - Linking vision and motion for self-supervised object-centric perception [16.821130222597155]
Object-centric representations enable autonomous driving algorithms to reason about interactions between many independent agents and scene features.
Traditionally these representations have been obtained via supervised learning, but this decouples perception from the downstream driving task and could harm generalization.
We adapt a self-supervised object-centric vision model to perform object decomposition using only RGB video and the pose of the vehicle as inputs.
arXiv Detail & Related papers (2023-07-14T04:21:05Z) - Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z) - LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic
Segmentation [43.12994451281451]
We present 'LaRa', an efficient encoder-decoder, transformer-based model for vehicle semantic segmentation from multiple cameras.
Our approach uses a system of cross-attention to aggregate information over multiple sensors into a compact, yet rich, collection of latent representations.
arXiv Detail & Related papers (2022-06-27T13:37:50Z) - BEVerse: Unified Perception and Prediction in Birds-Eye-View for
Vision-Centric Autonomous Driving [92.05963633802979]
We present BEVerse, a unified framework for 3D perception and prediction based on multi-camera systems.
We show that the multi-task BEVerse outperforms single-task methods on 3D object detection, semantic map construction, and motion prediction.
arXiv Detail & Related papers (2022-05-19T17:55:35Z) - Structured Bird's-Eye-View Traffic Scene Understanding from Onboard
Images [128.881857704338]
We study the problem of extracting a directed graph representing the local road network in BEV coordinates, from a single onboard camera image.
We show that the method can be extended to detect dynamic objects on the BEV plane.
We validate our approach against powerful baselines and show that our network achieves superior performance.
arXiv Detail & Related papers (2021-10-05T12:40:33Z) - OmniDet: Surround View Cameras based Multi-task Visual Perception
Network for Autonomous Driving [10.3540046389057]
This work presents a multi-task visual perception network on unrectified fisheye images.
It consists of six primary tasks necessary for an autonomous driving system.
We demonstrate that the jointly trained model performs better than the respective single task versions.
arXiv Detail & Related papers (2021-02-15T10:46:24Z) - Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by
Implicitly Unprojecting to 3D [100.93808824091258]
We propose a new end-to-end architecture that directly extracts a bird's-eye-view representation of a scene given image data from an arbitrary number of cameras.
Our approach is to "lift" each image individually into a frustum of features for each camera, then "splat" all frustums into a bird's-eye-view grid; a minimal splat-pooling sketch appears after this list.
We show that the representations inferred by our model enable interpretable end-to-end motion planning by "shooting" template trajectories into a bird's-eye-view cost map output by our network.
arXiv Detail & Related papers (2020-08-13T06:29:01Z) - MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous
Driving Using Multiple Views [60.538802124885414]
We present Multi-View LidarNet (MVLidarNet), a two-stage deep neural network for multi-class object detection and drivable space segmentation.
MVLidarNet is able to detect and classify objects while simultaneously determining the drivable space using a single LiDAR scan as input.
We show results on both KITTI and a much larger internal dataset, thus demonstrating the method's ability to scale by an order of magnitude.
arXiv Detail & Related papers (2020-06-09T21:28:17Z)
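
As a rough illustration of the "splat" step described in the Lift, Splat, Shoot entry above, the following is a minimal, hypothetical PyTorch sketch of sum-pooling lifted feature points into a BEV grid. It assumes the ego-frame (x, y) coordinates of the frustum features are already available; the function name, grid resolution, and cell size are illustrative choices, not the paper's implementation.

```python
# Hypothetical splat-style pooling: scatter per-point features into BEV cells.
import torch

def splat_to_bev(points_xy, feats, grid=(200, 200), cell=0.5):
    """points_xy: (N, 2) ego-frame coords in metres; feats: (N, C) lifted features."""
    h, w = grid
    # Discretise continuous coordinates into integer BEV cell indices (ego at centre).
    ix = (points_xy[:, 0] / cell + w // 2).long().clamp(0, w - 1)
    iy = (points_xy[:, 1] / cell + h // 2).long().clamp(0, h - 1)
    flat = iy * w + ix                              # (N,) flattened cell index
    bev = feats.new_zeros(h * w, feats.shape[1])
    bev.index_add_(0, flat, feats)                  # sum-pool all features per cell
    return bev.view(h, w, -1).permute(2, 0, 1)      # (C, H, W) BEV feature map

pts = torch.randn(10_000, 2) * 20                   # fake lifted feature locations
f = torch.randn(10_000, 64)                         # fake per-point features
print(splat_to_bev(pts, f).shape)                   # torch.Size([64, 200, 200])
```

A downstream BEV decoder (for example a small CNN) would then operate on the pooled (C, H, W) map; sum pooling is used here only because it is the simplest permutation-invariant reduction over points falling in the same cell.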