Interpretable End-to-End Driving Model for Implicit Scene Understanding
- URL: http://arxiv.org/abs/2308.01180v1
- Date: Wed, 2 Aug 2023 14:43:08 GMT
- Title: Interpretable End-to-End Driving Model for Implicit Scene Understanding
- Authors: Yiyang Sun, Xiaonian Wang, Yangyang Zhang, Jiagui Tang, Xiaqiang Tang,
Jing Yao
- Abstract summary: We propose an end-to-end Interpretable Implicit Driving Scene Understanding (II-DSU) model to extract implicit high-dimensional scene features.
Our approach achieves a new state of the art and obtains scene features that embody richer driving-relevant scene information.
- Score: 3.4248756007722987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Driving scene understanding aims to obtain comprehensive scene information from sensor data and to provide a basis for downstream tasks; it is indispensable for the safety of self-driving vehicles. Specific perception tasks, such as object detection and scene graph generation, are commonly used for this purpose. However, the outputs of these tasks are only equivalent to samples drawn from high-dimensional scene features and are therefore insufficient to represent the full scenario. In addition, the goal of such perception tasks is inconsistent with human driving, which focuses only on what may affect the ego-trajectory. We therefore propose an end-to-end Interpretable Implicit Driving Scene Understanding (II-DSU) model that extracts implicit high-dimensional scene features as the scene understanding result, guided by a planning module, and validates the plausibility of the learned scene understanding through auxiliary perception tasks used for visualization. Experimental results on CARLA benchmarks show that our approach achieves a new state of the art and obtains scene features that embody richer driving-relevant information, enabling superior performance of the downstream planning.
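To make the described setup concrete, the following is a minimal conceptual sketch in PyTorch of the idea in the abstract: a backbone produces implicit high-dimensional scene features, a planning head supervises them end to end, and an auxiliary perception head is attached only to visualize and validate what the features encode. All module names, layer sizes, and shapes are illustrative assumptions, not the authors' implementation.

```python
# Conceptual sketch only: implicit scene features supervised by a planning head,
# with an auxiliary perception head for interpretability. Not the II-DSU code.
import torch
import torch.nn as nn

class IIDSUSketch(nn.Module):
    def __init__(self, feat_dim=256, plan_horizon=4, num_classes=10):
        super().__init__()
        # Image encoder standing in for the paper's sensor backbone (assumed).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Planning head: predicts future waypoints from the implicit features.
        self.planner = nn.Linear(feat_dim, plan_horizon * 2)
        # Auxiliary perception head used only for visualization/validation.
        self.aux_detector = nn.Linear(feat_dim, num_classes)

    def forward(self, image):
        scene_feat = self.backbone(image)           # implicit scene features
        waypoints = self.planner(scene_feat)        # drives the main training signal
        aux_logits = self.aux_detector(scene_feat)  # interpretability output
        return scene_feat, waypoints, aux_logits

model = IIDSUSketch()
img = torch.randn(2, 3, 128, 128)
feat, wp, aux = model(img)
print(feat.shape, wp.shape, aux.shape)  # (2, 256) (2, 8) (2, 10)
```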
Related papers
- PreGSU-A Generalized Traffic Scene Understanding Model for Autonomous Driving based on Pre-trained Graph Attention Network [23.38434020807342]
Scene understanding, defined as the learning, extraction, and representation of interactions among traffic elements, is one of the critical challenges toward high-level autonomous driving (AD).
Current scene understanding methods mainly focus on a single concrete task, such as trajectory prediction or risk level evaluation.
We propose PreGSU, a generalized pre-trained scene understanding model based on a graph attention network that learns universal interaction and reasoning over traffic scenes to support various downstream tasks.
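As a rough illustration of the graph-attention idea PreGSU builds on (not the PreGSU model itself), here is a minimal single-head GAT-style layer in plain PyTorch that computes attention weights over a dense adjacency matrix of traffic agents; the feature sizes and the toy fully connected graph are assumptions.

```python
# Minimal single-head graph attention layer (GAT-style); illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        # x: (N, in_dim) agent features; adj: (N, N) adjacency with self-loops.
        h = self.W(x)
        N = h.size(0)
        hi = h.unsqueeze(1).expand(N, N, -1)
        hj = h.unsqueeze(0).expand(N, N, -1)
        e = F.leaky_relu(self.a(torch.cat([hi, hj], dim=-1)).squeeze(-1), 0.2)
        e = e.masked_fill(adj == 0, float("-inf"))  # attend only along edges
        alpha = torch.softmax(e, dim=-1)            # (N, N) attention weights
        return alpha @ h                            # aggregated agent features

# Toy traffic graph: 4 agents, fully connected (self-loops included).
x = torch.randn(4, 8)
adj = torch.ones(4, 4)
out = GraphAttentionLayer(8, 16)(x, adj)
print(out.shape)  # torch.Size([4, 16])
```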
arXiv Detail & Related papers (2024-04-16T03:34:35Z)
- Concretization of Abstract Traffic Scene Specifications Using Metaheuristic Search [1.9307952728103126]
As a first step towards scenario-based testing of AVs, the initial scene of a traffic scenario must be concretized.
We propose a traffic scene concretization approach that places vehicles on realistic road maps such that they satisfy a set of abstract constraints.
We conduct a series of experiments over three realistic road maps to compare eight configurations of our approach with three variations of the state-of-the-art Scenic tool.
arXiv Detail & Related papers (2023-07-15T15:13:16Z)
- Vision-Language Models can Identify Distracted Driver Behavior from Naturalistic Videos [29.529768377746194]
This paper proposes a CLIP-based driver activity recognition approach that identifies driver distraction from naturalistic driving images and videos.
Our results show that this framework achieves state-of-the-art performance in zero-shot transfer and in video-based CLIP prediction of the driver's state on two public datasets.
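For context on what CLIP-based zero-shot driver-state recognition looks like in practice, the sketch below uses the Hugging Face transformers CLIP API with a generic checkpoint and illustrative prompts; it is not the paper's framework, and the prompt set, checkpoint, and file name are assumptions.

```python
# Zero-shot CLIP classification over driver-state prompts (illustrative only).
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

driver_states = [
    "a photo of a driver attentively watching the road",
    "a photo of a driver texting on a phone",
    "a photo of a driver drinking from a bottle",
    "a photo of a driver reaching behind the seat",
]

image = Image.open("driver_frame.jpg")  # hypothetical dashcam frame
inputs = processor(text=driver_states, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # (1, num_prompts)
print(driver_states[probs.argmax().item()])
```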
arXiv Detail & Related papers (2023-06-16T20:02:51Z)
- Scene as Occupancy [66.43673774733307]
OccNet is a vision-centric pipeline with a cascade and temporal voxel decoder to reconstruct 3D occupancy.
We propose OpenOcc, the first dense high-quality 3D occupancy benchmark built on top of nuScenes.
arXiv Detail & Related papers (2023-06-05T13:01:38Z)
- OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping [84.65114565766596]
We present OpenLane-V2, the first dataset on topology reasoning for traffic scene structure.
OpenLane-V2 consists of 2,000 annotated road scenes that describe traffic elements and their correlation to the lanes.
We evaluate various state-of-the-art methods, and present their quantitative and qualitative results on OpenLane-V2 to indicate future avenues for investigating topology reasoning in traffic scenes.
arXiv Detail & Related papers (2023-04-20T16:31:22Z)
- A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design [77.34726150561087]
We conduct a survey of the most relevant and recent advances in Deep Semantic Segmentation in the context of vision for autonomous vehicles.
Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z)
- Traffic Scene Parsing through the TSP6K Dataset [109.69836680564616]
We introduce a specialized traffic monitoring dataset, termed TSP6K, with high-quality pixel-level and instance-level annotations.
The dataset captures more crowded traffic scenes, with several times more traffic participants than existing driving scenes.
We propose a detail refining decoder for scene parsing, which recovers the details of different semantic regions in traffic scenes.
arXiv Detail & Related papers (2023-03-06T02:05:14Z)
- Learning Road Scene-level Representations via Semantic Region Prediction [11.518756759576657]
We tackle two vital tasks in automated driving systems, i.e., driver intent prediction and risk object identification from egocentric images.
We contend that a scene-level representation must capture higher-level semantic and geometric representations of the traffic scene around the ego-vehicle.
We propose to learn scene-level representations via a novel semantic region prediction task and an automatic semantic region labeling algorithm.
arXiv Detail & Related papers (2023-01-02T15:13:30Z)
- JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes [75.20435924081585]
JPerceiver can simultaneously estimate scale-aware depth and VO as well as BEV layout from a monocular video sequence.
It exploits the cross-view geometric transformation (CGT) to propagate the absolute scale from the road layout to depth and VO.
Experiments on Argoverse, nuScenes and KITTI show the superiority of JPerceiver over existing methods on all three tasks.
arXiv Detail & Related papers (2022-07-16T10:33:59Z)
- An Image-based Approach of Task-driven Driving Scene Categorization [7.291979964739049]
This paper proposes a method of task-driven driving scene categorization using weakly supervised data.
A measure is learned to discriminate the scenes of different semantic attributes via contrastive learning.
The results of semantic scene similarity learning and driving scene categorization are extensively studied.
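As a minimal sketch of the contrastive-learning ingredient mentioned above (not the paper's exact measure), the snippet below computes an InfoNCE-style loss between two batches of scene embeddings, treating matching indices as positives and all other pairs as in-batch negatives; the embedding dimension and temperature are assumptions.

```python
# InfoNCE-style contrastive loss between paired scene embeddings (illustrative).
import torch
import torch.nn.functional as F

def info_nce_loss(anchor, positive, temperature=0.1):
    # anchor[i] and positive[i] describe the same scene (or a shared attribute);
    # every other pair in the batch serves as a negative.
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature       # (B, B) scaled cosine similarities
    targets = torch.arange(a.size(0))      # diagonal entries are the positives
    return F.cross_entropy(logits, targets)

# Toy usage: 8 scene embeddings of dimension 128 from two views of each scene.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(info_nce_loss(z1, z2).item())
```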
arXiv Detail & Related papers (2021-03-10T08:23:36Z)
- Spatio-Temporal Graph for Video Captioning with Knowledge Distillation [50.034189314258356]
We propose a graph model for video captioning that exploits object interactions in space and time.
Our model builds interpretable links and is able to provide explicit visual grounding.
To avoid correlations caused by the variable number of objects, we propose an object-aware knowledge distillation mechanism.
arXiv Detail & Related papers (2020-03-31T03:58:11Z)