Fully End-to-end Autonomous Driving with Semantic Depth Cloud Mapping and Multi-Agent
- URL: http://arxiv.org/abs/2204.05513v1
- Date: Tue, 12 Apr 2022 03:57:01 GMT
- Title: Fully End-to-end Autonomous Driving with Semantic Depth Cloud Mapping and Multi-Agent
- Authors: Oskar Natan and Jun Miura
- Abstract summary: We propose a novel deep learning model trained in an end-to-end, multi-task learning manner to perform both perception and control tasks simultaneously.
The model is evaluated in the CARLA simulator on various scenarios comprising normal and adversarial situations under different weather conditions to mimic the real world.
- Score: 2.512827436728378
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Focusing on the task of point-to-point navigation for an autonomous
driving vehicle, we propose a novel deep learning model trained in an
end-to-end, multi-task learning manner to perform both perception and control
tasks simultaneously. The model is used to drive the ego vehicle safely by
following a sequence of routes defined by the global planner. The perception
part of the model encodes high-dimensional observation data provided by an
RGBD camera while performing semantic segmentation, semantic depth cloud (SDC)
mapping, and traffic light state and stop sign prediction. The control part
then decodes the encoded features, along with additional information provided
by GPS and a speedometer, to predict waypoints together with a latent feature
space. Two agents process these outputs and form a control policy that
determines the levels of steering, throttle, and brake as the final action.
The model is evaluated in the CARLA simulator on various scenarios comprising
normal and adversarial situations under different weather conditions to mimic
the real world. In addition, we conduct a comparative study with several
recent models to justify its performance across multiple aspects of driving,
and an ablation study on SDC mapping and the multi-agent setup to understand
their roles and behavior. As a result, our model achieves the highest driving
score while using fewer parameters and less computation. To support future
studies, we share our code at
https://github.com/oskarnatan/end-to-end-driving.
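
To make the pipeline described in the abstract concrete, below is a minimal,
hypothetical PyTorch-style sketch of the perception-control flow. The module
names, layer choices, feature sizes, class counts, and the simple control
policy are illustrative assumptions, not the authors' implementation, which is
available in the repository linked above.

```python
# Minimal sketch of the perception-control pipeline described in the abstract.
# All module names, layer choices, and dimensions are illustrative assumptions;
# the authors' actual implementation is in the linked repository.
import torch
import torch.nn as nn


class PerceptionEncoder(nn.Module):
    """Encodes an RGBD frame and predicts semantic segmentation, an SDC map,
    and traffic light / stop sign states (stand-in heads, not the real ones)."""

    def __init__(self, feat_dim=256, num_classes=23):
        super().__init__()
        self.backbone = nn.Sequential(  # placeholder for the real encoder
            nn.Conv2d(4, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(feat_dim, num_classes, 1)   # semantic segmentation
        self.sdc_head = nn.Conv2d(feat_dim, num_classes, 1)   # semantic depth cloud (BEV) map
        self.light_stop_head = nn.Linear(feat_dim, 3)          # red light / stop sign / clear

    def forward(self, rgbd):
        feat = self.backbone(rgbd)                 # (B, feat_dim, H/4, W/4)
        pooled = feat.mean(dim=(2, 3))             # global average pooling
        return feat, self.seg_head(feat), self.sdc_head(feat), self.light_stop_head(pooled)


class ControlDecoder(nn.Module):
    """Decodes encoded features plus GPS and speed into waypoints and a latent
    vector that downstream agents turn into steering, throttle, and brake."""

    def __init__(self, feat_dim=256, num_waypoints=4):
        super().__init__()
        self.num_waypoints = num_waypoints
        self.gru = nn.GRUCell(feat_dim + 3, feat_dim)  # + gps (x, y) + speed
        self.wp_head = nn.Linear(feat_dim, 2 * num_waypoints)

    def forward(self, feat, gps, speed):
        pooled = feat.mean(dim=(2, 3))
        latent = self.gru(torch.cat([pooled, gps, speed], dim=-1))
        waypoints = self.wp_head(latent).view(-1, self.num_waypoints, 2)
        return waypoints, latent


def control_policy(waypoints, latent, speed):
    """Placeholder for the two agents: map predicted waypoints and latent
    features to steering, throttle, and brake levels (the actual agent logic
    in the paper differs)."""
    heading = torch.atan2(waypoints[:, 0, 1], waypoints[:, 0, 0])
    steer = torch.tanh(heading)
    throttle = torch.clamp(1.0 - speed.squeeze(-1), 0.0, 1.0)
    brake = torch.zeros_like(throttle)
    return steer, throttle, brake


# Usage sketch with dummy inputs
rgbd = torch.randn(1, 4, 128, 128)                 # RGB + depth channels
gps, speed = torch.zeros(1, 2), torch.zeros(1, 1)
enc, dec = PerceptionEncoder(), ControlDecoder()
feat, seg, sdc, light_stop = enc(rgbd)
waypoints, latent = dec(feat, gps, speed)
steer, throttle, brake = control_policy(waypoints, latent, speed)
```

The sketch only indicates where the SDC mapping and the two agents sit in the
data flow; see the repository for the actual SDC construction and agent logic.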
Related papers
- DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving [69.82743399946371]
DriveMLM is a framework that can perform closed-loop autonomous driving in realistic simulators.
We employ a multi-modal LLM (MLLM) to model the behavior planning module of a modular AD system.
This model can be plugged into existing AD systems such as Apollo for closed-loop driving.
arXiv Detail & Related papers (2023-12-14T18:59:05Z)
- Trajeglish: Traffic Modeling as Next-Token Prediction [67.28197954427638]
A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs.
We apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios.
Our model tops the Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%.
arXiv Detail & Related papers (2023-12-07T18:53:27Z)
- LeTFuser: Light-weight End-to-end Transformer-Based Sensor Fusion for Autonomous Driving with Multi-Task Learning [16.241116794114525]
We introduce LeTFuser, an algorithm for fusing multiple RGB-D camera representations.
To perform perception and control tasks simultaneously, we utilize multi-task learning.
arXiv Detail & Related papers (2023-10-19T20:09:08Z)
- Video Killed the HD-Map: Predicting Multi-Agent Behavior Directly From Aerial Images [14.689298253430568]
We propose an aerial image-based map (AIM) representation that requires minimal annotation and provides rich road context information for traffic agents like pedestrians and vehicles.
Our results demonstrate competitive multi-agent trajectory prediction performance, especially for pedestrians in the scene, when using our AIM representation.
arXiv Detail & Related papers (2023-05-19T17:48:01Z)
- TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction [149.5716746789134]
We show data-driven traffic simulation can be formulated as a world model.
We present TrafficBots, a multi-agent policy built upon motion prediction and end-to-end driving.
Experiments on the open motion dataset show TrafficBots can simulate realistic multi-agent behaviors.
arXiv Detail & Related papers (2023-03-07T18:28:41Z)
- CARNet: A Dynamic Autoencoder for Learning Latent Dynamics in Autonomous Driving Tasks [11.489187712465325]
An autonomous driving system should effectively use the information collected from the various sensors in order to form an abstract description of the world.
Deep learning models, such as autoencoders, can be used for that purpose, as they can learn compact latent representations from a stream of incoming data.
This work proposes CARNet, a Combined dynAmic autoencodeR NETwork architecture that utilizes an autoencoder combined with a recurrent neural network to learn the current latent representation.
arXiv Detail & Related papers (2022-05-18T04:15:42Z)
- Predicting Take-over Time for Autonomous Driving with Real-World Data: Robust Data Augmentation, Models, and Evaluation [11.007092387379076]
We develop and train take-over time (TOT) models that operate on mid and high-level features produced by computer vision algorithms operating on different driver-facing camera views.
We show that a TOT model supported by augmented data can be used to produce continuous estimates of take-over times without delay.
arXiv Detail & Related papers (2021-07-27T16:39:50Z)
- Autonomous Vehicles that Alert Humans to Take-Over Controls: Modeling with Real-World Data [11.007092387379076]
This study focuses on the development of contextual, semantically meaningful representations of the driver state.
We conduct a large-scale real-world controlled data study where participants are instructed to take-over control from an autonomous agent.
These take-over events are captured using multiple driver-facing cameras, which, when labelled, result in a dataset of control transitions and their corresponding take-over times (TOTs).
After augmenting this dataset, we develop and train TOT models that operate sequentially on low and mid-level features produced by computer vision algorithms operating on different driver-facing camera views.
arXiv Detail & Related papers (2021-04-23T09:16:53Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
- A Driving Behavior Recognition Model with Bi-LSTM and Multi-Scale CNN [59.57221522897815]
We propose a neural network model based on trajectories information for driving behavior recognition.
We evaluate the proposed model on the public BLVD dataset, achieving satisfactory performance.
arXiv Detail & Related papers (2021-03-01T06:47:29Z)
- IntentNet: Learning to Predict Intention from Raw Sensor Data [86.74403297781039]
In this paper, we develop a one-stage detector and forecaster that exploits both 3D point clouds produced by a LiDAR sensor and dynamic maps of the environment.
Our multi-task model achieves better accuracy than the respective separate modules while saving computation, which is critical to reducing reaction time in self-driving applications.
arXiv Detail & Related papers (2021-01-20T00:31:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.