Learning Road Scene-level Representations via Semantic Region Prediction
- URL: http://arxiv.org/abs/2301.00714v1
- Date: Mon, 2 Jan 2023 15:13:30 GMT
- Title: Learning Road Scene-level Representations via Semantic Region Prediction
- Authors: Zihao Xiao, Alan Yuille, Yi-Ting Chen
- Abstract summary: We tackle two vital tasks in automated driving systems, i.e., driver intent prediction and risk object identification from egocentric images.
We contend that a scene-level representation must capture higher-level semantic and geometric representations of traffic scenes around the ego-vehicle.
We propose to learn scene-level representations via a novel semantic region prediction task and an automatic semantic region labeling algorithm.
- Score: 11.518756759576657
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we tackle two vital tasks in automated driving systems, i.e.,
driver intent prediction and risk object identification from egocentric images.
Mainly, we investigate the question: what would be good road scene-level
representations for these two tasks? We contend that a scene-level
representation must capture higher-level semantic and geometric representations
of traffic scenes around the ego-vehicle as it performs actions toward its
destination. To this end, we introduce the representation of semantic regions,
which are areas that ego-vehicles visit while taking an afforded action (e.g.,
left-turn at 4-way intersections). We propose to learn scene-level
representations via a novel semantic region prediction task and an automatic
semantic region labeling algorithm. Extensive evaluations are conducted on the
HDD and nuScenes datasets, and the learned representations lead to
state-of-the-art performance for driver intention prediction and risk object
identification.
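No reference implementation accompanies this abstract. As a minimal sketch of how a semantic region prediction task could be framed, the hypothetical head below classifies each of a fixed set of candidate regions from backbone features; all module names, shapes, and label counts are assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class SemanticRegionHead(nn.Module):
    """Hypothetical sketch: predict semantic-region labels from scene features.

    Assumes the scene is discretized into a fixed set of candidate regions
    (areas an ego-vehicle may visit for an afforded action) and that an image
    backbone yields a global feature vector. This is NOT the authors'
    architecture, only an illustration of the task framing.
    """

    def __init__(self, feat_dim: int = 512, num_regions: int = 16, num_labels: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, num_regions * num_labels),
        )
        self.num_regions = num_regions
        self.num_labels = num_labels

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, feat_dim) -> logits: (B, num_regions, num_labels)
        logits = self.mlp(feats)
        return logits.view(-1, self.num_regions, self.num_labels)

# Training signal: cross-entropy against the automatically generated region
# labels the abstract mentions (stand-in random tensors here).
head = SemanticRegionHead()
feats = torch.randn(4, 512)
labels = torch.randint(0, 8, (4, 16))
loss = nn.CrossEntropyLoss()(head(feats).flatten(0, 1), labels.flatten())
```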
Related papers
- Flow-guided Motion Prediction with Semantics and Dynamic Occupancy Grid Maps [5.9803668726235575]
Occupancy Grid Maps (OGMs) are commonly employed for scene prediction.
Recent studies have successfully combined OGMs with deep learning methods to predict the evolution of the scene.
We propose a novel multi-task framework that leverages dynamic OGMs and semantic information to predict both future vehicle semantic grids and the future flow of the scene.
arXiv Detail & Related papers (2024-07-22T14:42:34Z)
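For illustration only, a multi-task head in the spirit of the flow-guided OGM entry above might share one grid encoder between a semantic-grid decoder and a flow decoder. The layer sizes and class count below are invented for the sketch, not taken from the paper.

```python
import torch
import torch.nn as nn

class GridAndFlowHead(nn.Module):
    """Hypothetical multi-task sketch: one shared encoding of a dynamic
    occupancy grid, two decoders that emit (a) a future semantic grid and
    (b) a dense 2D flow field. Shapes and layers are assumptions."""

    def __init__(self, in_ch: int = 4, num_classes: int = 5):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.semantic = nn.Conv2d(32, num_classes, 1)  # per-cell class logits
        self.flow = nn.Conv2d(32, 2, 1)                # per-cell (dx, dy)

    def forward(self, ogm: torch.Tensor):
        h = self.shared(ogm)
        return self.semantic(h), self.flow(h)

sem, flow = GridAndFlowHead()(torch.randn(1, 4, 128, 128))
# sem: (1, 5, 128, 128) future semantic grid logits; flow: (1, 2, 128, 128)
```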
- Mapping High-level Semantic Regions in Indoor Environments without Object Recognition [50.624970503498226]
The present work proposes a method for semantic region mapping via embodied navigation in indoor environments.
To enable region identification, the method uses a vision-to-language model to provide scene information for mapping.
By projecting egocentric scene understanding into the global frame, the proposed method generates a semantic map as a distribution over possible region labels at each location.
arXiv Detail & Related papers (2024-03-11T18:09:50Z)
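As a rough sketch of the fusion step the indoor-mapping entry above describes (a distribution over region labels at each map location), the hypothetical update below accumulates egocentric label probabilities into a global grid. The real method's projection and update rule are more involved; the function and variable names are invented.

```python
import numpy as np

def update_region_map(global_map, cell_indices, label_probs):
    """Hypothetical fusion sketch: fold egocentric region-label distributions
    into a global map. `cell_indices` are the global grid cells observed this
    step; `label_probs` their predicted label distributions. A running sum
    followed by renormalization plays the role of fusion here."""
    for (i, j), p in zip(cell_indices, label_probs):
        global_map[i, j] += p                       # accumulate evidence
        global_map[i, j] /= global_map[i, j].sum()  # renormalize to a distribution
    return global_map

num_labels = 6
gmap = np.full((64, 64, num_labels), 1.0 / num_labels)  # uniform prior
gmap = update_region_map(
    gmap, [(10, 12)], [np.array([0.7, 0.1, 0.05, 0.05, 0.05, 0.05])]
)
```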
- Interpretable End-to-End Driving Model for Implicit Scene Understanding [3.4248756007722987]
We propose an end-to-end Interpretable Implicit Driving Scene Understanding (II-DSU) model to extract implicit high-dimensional scene features.
Our approach achieves new state-of-the-art performance and obtains scene features that embody richer driving-relevant scene information.
arXiv Detail & Related papers (2023-08-02T14:43:08Z)
- OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping [84.65114565766596]
We present OpenLane-V2, the first dataset on topology reasoning for traffic scene structure.
OpenLane-V2 consists of 2,000 annotated road scenes that describe traffic elements and their correlation to the lanes.
We evaluate various state-of-the-art methods, and present their quantitative and qualitative results on OpenLane-V2 to indicate future avenues for investigating topology reasoning in traffic scenes.
arXiv Detail & Related papers (2023-04-20T16:31:22Z)
- Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for policy pretraining in visuomotor driving.
We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns a driving policy representation by predicting future ego-motion from the current visual observation only, optimized with the photometric error.
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
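The PPGeo summary above hinges on a photometric objective. As a generic sketch (not PPGeo's exact loss), the functions below show the standard L1 photometric error against a synthesized view, plus the edge-aware smoothness term commonly paired with it; the warping from predicted depth and pose is assumed to happen upstream.

```python
import torch

def photometric_l1(target: torch.Tensor, synthesized: torch.Tensor) -> torch.Tensor:
    """L1 photometric error between the observed frame and a view synthesized
    from predicted depth and ego-motion (the warping itself is assumed to
    happen upstream, e.g., via torch.nn.functional.grid_sample)."""
    return (target - synthesized).abs().mean()

def edge_aware_smoothness(depth: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
    """Edge-aware depth smoothness, a companion term in most self-supervised
    depth pipelines: penalize depth gradients except where the image itself
    has strong gradients. depth: (B, 1, H, W), image: (B, 3, H, W)."""
    dx_d = (depth[..., :, 1:] - depth[..., :, :-1]).abs()
    dy_d = (depth[..., 1:, :] - depth[..., :-1, :]).abs()
    dx_i = (image[..., :, 1:] - image[..., :, :-1]).abs().mean(1, keepdim=True)
    dy_i = (image[..., 1:, :] - image[..., :-1, :]).abs().mean(1, keepdim=True)
    return (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()

# Stand-in tensors; a real pipeline would use consecutive video frames.
loss = photometric_l1(torch.rand(1, 3, 96, 320), torch.rand(1, 3, 96, 320)) \
       + 1e-3 * edge_aware_smoothness(torch.rand(1, 1, 96, 320), torch.rand(1, 3, 96, 320))
```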
- NEAT: Neural Attention Fields for End-to-End Autonomous Driving [59.60483620730437]
We present NEural ATtention fields (NEAT), a novel representation that enables efficient reasoning for imitation learning models.
NEAT is a continuous function which maps locations in Bird's Eye View (BEV) scene coordinates to waypoints and semantics.
In a new evaluation setting involving adverse environmental conditions and challenging scenarios, NEAT outperforms several strong baselines and achieves driving scores on par with the privileged CARLA expert.
arXiv Detail & Related papers (2021-09-09T17:55:28Z)
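To make NEAT's "coordinates in, waypoints and semantics out" idea concrete, here is a deliberately stripped-down field: a plain MLP from BEV (x, y) to a waypoint offset and class logits. NEAT itself conditions on image features via attention; none of that is reproduced here, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class BEVField(nn.Module):
    """Hypothetical sketch of a continuous BEV field: maps Bird's Eye View
    coordinates (x, y) to a waypoint offset and semantic class logits.
    This is an illustration of the idea, not the NEAT architecture."""

    def __init__(self, num_classes: int = 6, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
        )
        self.waypoint = nn.Linear(hidden, 2)        # (dx, dy) toward next waypoint
        self.semantics = nn.Linear(hidden, num_classes)

    def forward(self, xy: torch.Tensor):
        h = self.mlp(xy)                            # xy: (N, 2) BEV coordinates
        return self.waypoint(h), self.semantics(h)

wp, sem = BEVField()(torch.rand(1024, 2) * 2 - 1)   # query points in [-1, 1]^2
```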
- An Image-based Approach of Task-driven Driving Scene Categorization [7.291979964739049]
This paper proposes a method of task-driven driving scene categorization using weakly supervised data.
A similarity measure is learned via contrastive learning to discriminate scenes with different semantic attributes (a minimal sketch of such a loss follows below).
The results of semantic scene similarity learning and driving scene categorization are extensively studied.
arXiv Detail & Related papers (2021-03-10T08:23:36Z)
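A minimal sketch of the contrastive similarity learning mentioned in the entry above, using the standard InfoNCE form with in-batch negatives; the paper's actual loss and its sampling of positives may differ.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor, temperature: float = 0.1):
    """Hypothetical contrastive sketch: embeddings of scenes sharing a
    semantic attribute are pulled together; all other scenes in the batch
    act as negatives. Standard InfoNCE, not necessarily the paper's loss."""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature            # (B, B) cosine similarities
    targets = torch.arange(a.size(0))           # matching pairs on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
```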
- Safety-Oriented Pedestrian Motion and Scene Occupancy Forecasting [91.69900691029908]
We advocate for predicting both individual motions and the scene occupancy map.
We propose a Scene-Actor Graph Neural Network (SA-GNN) which preserves the relative spatial information of pedestrians.
On two large-scale real-world datasets, we showcase that our scene-occupancy predictions are more accurate and better calibrated than those from state-of-the-art motion forecasting methods.
arXiv Detail & Related papers (2021-01-07T06:08:21Z)
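As a hypothetical illustration of message passing that preserves relative spatial information (the property the SA-GNN summary above emphasizes), the layer below conditions each pairwise message on the positional offset between pedestrians. It is a generic one-round layer, not the paper's architecture; all names and sizes are invented.

```python
import torch
import torch.nn as nn

class RelativeMessagePassing(nn.Module):
    """Hypothetical sketch: nodes are pedestrians, and each message from
    node j to node i is conditioned on the offset pos[j] - pos[i], so the
    scene's spatial layout is carried through aggregation."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(dim + 2, dim), nn.ReLU(inplace=True))
        self.upd = nn.GRUCell(dim, dim)

    def forward(self, h: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        # h: (N, dim) node states, pos: (N, 2) positions in the scene frame
        n = h.size(0)
        rel = pos.unsqueeze(0) - pos.unsqueeze(1)          # (N, N, 2) offsets
        pair = torch.cat([h.unsqueeze(0).expand(n, n, -1), rel], dim=-1)
        messages = self.msg(pair).mean(dim=1)              # aggregate over senders
        return self.upd(messages, h)                       # updated node states

h = RelativeMessagePassing()(torch.randn(5, 64), torch.randn(5, 2))
```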
- Commands 4 Autonomous Vehicles (C4AV) Workshop Summary [91.92872482200018]
This paper presents the results of the Commands for Autonomous Vehicles (C4AV) challenge based on the recent Talk2Car dataset.
We identify the aspects that render top-performing models successful, and relate them to existing state-of-the-art models for visual grounding.
arXiv Detail & Related papers (2020-09-18T12:33:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.