AutoLay: Benchmarking amodal layout estimation for autonomous driving
- URL: http://arxiv.org/abs/2108.09047v1
- Date: Fri, 20 Aug 2021 08:21:11 GMT
- Title: AutoLay: Benchmarking amodal layout estimation for autonomous driving
- Authors: Kaustubh Mani, N. Sai Shankar, Krishna Murthy Jatavallabhula and K.
Madhava Krishna
- Abstract summary: AutoLay is a dataset and benchmark for amodal layout estimation from monocular images.
In addition to fine-grained attributes such as lanes, sidewalks, and vehicles, we also provide semantically annotated 3D point clouds.
- Score: 18.152206533685412
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given an image or a video captured from a monocular camera, amodal layout
estimation is the task of predicting semantics and occupancy in bird's eye
view. The term amodal implies we also reason about entities in the scene that
are occluded or truncated in image space. While several recent efforts have
tackled this problem, there is a lack of standardization in task specification,
datasets, and evaluation protocols. We address these gaps with AutoLay, a
dataset and benchmark for amodal layout estimation from monocular images.
AutoLay encompasses driving imagery from two popular datasets: KITTI and
Argoverse. In addition to fine-grained attributes such as lanes, sidewalks, and
vehicles, we also provide semantically annotated 3D point clouds. We implement
several baselines and bleeding edge approaches, and release our data and code.
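To make the task concrete, below is a minimal, hypothetical sketch (plain Python, not the official AutoLay code) of how an amodal layout can be represented as a per-cell semantic grid in bird's eye view and scored with per-class intersection-over-union, the standard metric family for occupancy-style layout benchmarks. The grid size, class list, and function names are illustrative assumptions.

    import numpy as np

    # Hypothetical illustration (not the official AutoLay evaluation code).
    # An amodal layout is a bird's-eye-view grid in which every cell carries a
    # semantic class, including cells that are occluded in the camera image.

    def bev_per_class_iou(pred, gt, num_classes):
        """Per-class intersection-over-union between two BEV label grids.

        pred, gt: (H, W) integer arrays of class ids, e.g.
                  0 = empty, 1 = road, 2 = lane, 3 = sidewalk, 4 = vehicle.
        Ground-truth cells keep their true labels even when occluded in the
        image, so the score also rewards correct amodal completion.
        """
        ious = []
        for c in range(num_classes):
            inter = np.logical_and(pred == c, gt == c).sum()
            union = np.logical_or(pred == c, gt == c).sum()
            ious.append(inter / union if union > 0 else float("nan"))
        return ious

    # Toy example: a 128 x 128 grid covering the area in front of the ego vehicle.
    rng = np.random.default_rng(0)
    pred = rng.integers(0, 5, size=(128, 128))
    gt = rng.integers(0, 5, size=(128, 128))
    print("mean IoU:", np.nanmean(bev_per_class_iou(pred, gt, num_classes=5)))

In an actual benchmark the predicted and ground-truth grids would come from the model output and the annotated layout at a fixed metric resolution, with scores typically reported per class as well as averaged.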
Related papers
- SUPS: A Simulated Underground Parking Scenario Dataset for Autonomous Driving [41.221988979184665]
SUPS is a simulated dataset for underground automatic parking.
It supports multiple tasks with multiple sensors and multiple semantic labels aligned with successive images.
We also evaluate the state-of-the-art SLAM algorithms and perception models on our dataset.
arXiv Detail & Related papers (2023-02-25T02:59:12Z)
- Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting [64.7364925689825]
Argoverse 2 (AV2) is a collection of three datasets for perception and forecasting research in the self-driving domain.
The Lidar dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose.
The Motion Forecasting dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene.
arXiv Detail & Related papers (2023-01-02T00:36:22Z)
- Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient enough for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
- Sparse Semantic Map-Based Monocular Localization in Traffic Scenes Using Learned 2D-3D Point-Line Correspondences [29.419138863851526]
Given a query image, the goal is to estimate the camera pose with respect to the prior map.
Existing approaches rely heavily on dense point descriptors at the feature level to solve the registration problem.
We propose a sparse semantic map-based monocular localization method, which solves 2D-3D registration via a well-designed deep neural network.
arXiv Detail & Related papers (2022-10-10T10:29:07Z)
- JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes [75.20435924081585]
JPerceiver can simultaneously estimate scale-aware depth, visual odometry (VO), and the BEV layout from a monocular video sequence.
It exploits the cross-view geometric transformation (CGT) to propagate the absolute scale from the road layout to depth and VO.
Experiments on Argoverse, nuScenes and KITTI show the superiority of JPerceiver over existing methods on all three tasks.
arXiv Detail & Related papers (2022-07-16T10:33:59Z)
- MGNet: Monocular Geometric Scene Understanding for Autonomous Driving [10.438741209852209]
MGNet is a multi-task framework for monocular geometric scene understanding.
We define monocular geometric scene understanding as the combination of two known tasks: panoptic segmentation and self-supervised monocular depth estimation.
Our model is designed with focus on low latency to provide fast inference in real-time on a single consumer-grade GPU.
arXiv Detail & Related papers (2022-06-27T11:27:55Z)
- Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation [32.33170182669095]
This work investigates learning pixel-wise semantic image segmentation in urban scenes without any manual annotation, just from the raw non-curated data collected by cars.
We propose a novel method for cross-modal unsupervised learning of semantic image segmentation by leveraging synchronized LiDAR and image data.
arXiv Detail & Related papers (2022-03-21T17:35:46Z)
- One Million Scenes for Autonomous Driving: ONCE Dataset [91.94189514073354]
We introduce the ONCE dataset for 3D object detection in the autonomous driving scenario.
The data is selected from 144 driving hours, which is 20x longer than the largest 3D autonomous driving dataset available.
We reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
arXiv Detail & Related papers (2021-06-21T12:28:08Z)
- Hidden Footprints: Learning Contextual Walkability from 3D Human Trails [70.01257397390361]
Current datasets only tell you where people are, not where they could be.
We first augment the set of valid, labeled walkable regions by propagating person observations between images, utilizing 3D information to create what we call hidden footprints.
We devise a training strategy designed for such sparse labels, combining a class-balanced classification loss with a contextual adversarial loss.
arXiv Detail & Related papers (2020-08-19T23:19:08Z)
- MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous Driving Using Multiple Views [60.538802124885414]
We present Multi-View LidarNet (MVLidarNet), a two-stage deep neural network for multi-class object detection and drivable space segmentation.
MVLidarNet is able to detect and classify objects while simultaneously determining the drivable space using a single LiDAR scan as input.
We show results on both KITTI and a much larger internal dataset, thus demonstrating the method's ability to scale by an order of magnitude.
arXiv Detail & Related papers (2020-06-09T21:28:17Z)
- MonoLayout: Amodal scene layout from a single image [12.466845447851377]
Given a single color image captured from a driving platform, we aim to predict the bird's-eye view layout of the road.
We dub this problem amodal scene layout estimation, which involves "hallucinating" the layout of regions that are occluded in image space.
To this end, we present MonoLayout, a deep neural network for real-time amodal scene layout estimation.
arXiv Detail & Related papers (2020-02-19T19:16:34Z)