Fully End-to-end Autonomous Driving with Semantic Depth Cloud Mapping and Multi-Agent
- URL: http://arxiv.org/abs/2204.05513v1
- Date: Tue, 12 Apr 2022 03:57:01 GMT
- Title: Fully End-to-end Autonomous Driving with Semantic Depth Cloud Mapping and Multi-Agent
- Authors: Oskar Natan and Jun Miura
- Abstract summary: We propose a novel deep learning model trained in an end-to-end, multi-task learning manner to perform both perception and control tasks simultaneously.
The model is evaluated in the CARLA simulator on various scenarios comprising normal and adversarial situations under different weather conditions to mimic the real world.
- Score: 2.512827436728378
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Focusing on the task of point-to-point navigation for an autonomous
driving vehicle, we propose a novel deep learning model trained in an
end-to-end, multi-task learning manner to perform both perception and control
tasks simultaneously. The model is used to drive the ego vehicle safely by
following a sequence of routes defined by the global planner. The perception
part of the model encodes high-dimensional observation data provided by an
RGBD camera while performing semantic segmentation, semantic depth cloud (SDC)
mapping, and traffic light state and stop sign prediction. The control part
then decodes the encoded features, along with additional information provided
by GPS and a speedometer, to predict waypoints together with a latent feature
space. Two agents process these outputs and form a control policy that
determines the levels of steering, throttle, and brake as the final action.
The model is evaluated in the CARLA simulator on various scenarios comprising
normal and adversarial situations under different weather conditions to mimic
the real world. In addition, we conduct a comparative study with several
recent models to justify its performance across multiple aspects of driving,
and an ablation study on SDC mapping and the multi-agent setup to understand
their roles and behavior. As a result, our model achieves the highest driving
score while using fewer parameters and less computation. To support future
studies, we share our code at
https://github.com/oskarnatan/end-to-end-driving.
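
To make the pipeline described in the abstract concrete, below is a minimal,
hypothetical PyTorch-style sketch of the perception-control flow. The module
names, layer choices, feature sizes, class counts, and the simple control
policy are illustrative assumptions, not the authors' implementation, which is
available in the repository linked above.

```python
# Minimal sketch of the perception-control pipeline described in the abstract.
# All module names, layer choices, and dimensions are illustrative assumptions;
# the authors' actual implementation is in the linked repository.
import torch
import torch.nn as nn


class PerceptionEncoder(nn.Module):
    """Encodes an RGBD frame and predicts semantic segmentation, an SDC map,
    and traffic light / stop sign states (stand-in heads, not the real ones)."""

    def __init__(self, feat_dim=256, num_classes=23):
        super().__init__()
        self.backbone = nn.Sequential(  # placeholder for the real encoder
            nn.Conv2d(4, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(feat_dim, num_classes, 1)   # semantic segmentation
        self.sdc_head = nn.Conv2d(feat_dim, num_classes, 1)   # semantic depth cloud (BEV) map
        self.light_stop_head = nn.Linear(feat_dim, 3)          # red light / stop sign / clear

    def forward(self, rgbd):
        feat = self.backbone(rgbd)                 # (B, feat_dim, H/4, W/4)
        pooled = feat.mean(dim=(2, 3))             # global average pooling
        return feat, self.seg_head(feat), self.sdc_head(feat), self.light_stop_head(pooled)


class ControlDecoder(nn.Module):
    """Decodes encoded features plus GPS and speed into waypoints and a latent
    vector that downstream agents turn into steering, throttle, and brake."""

    def __init__(self, feat_dim=256, num_waypoints=4):
        super().__init__()
        self.num_waypoints = num_waypoints
        self.gru = nn.GRUCell(feat_dim + 3, feat_dim)  # + gps (x, y) + speed
        self.wp_head = nn.Linear(feat_dim, 2 * num_waypoints)

    def forward(self, feat, gps, speed):
        pooled = feat.mean(dim=(2, 3))
        latent = self.gru(torch.cat([pooled, gps, speed], dim=-1))
        waypoints = self.wp_head(latent).view(-1, self.num_waypoints, 2)
        return waypoints, latent


def control_policy(waypoints, latent, speed):
    """Placeholder for the two agents: map predicted waypoints and latent
    features to steering, throttle, and brake levels (the actual agent logic
    in the paper differs)."""
    heading = torch.atan2(waypoints[:, 0, 1], waypoints[:, 0, 0])
    steer = torch.tanh(heading)
    throttle = torch.clamp(1.0 - speed.squeeze(-1), 0.0, 1.0)
    brake = torch.zeros_like(throttle)
    return steer, throttle, brake


# Usage sketch with dummy inputs
rgbd = torch.randn(1, 4, 128, 128)                 # RGB + depth channels
gps, speed = torch.zeros(1, 2), torch.zeros(1, 1)
enc, dec = PerceptionEncoder(), ControlDecoder()
feat, seg, sdc, light_stop = enc(rgbd)
waypoints, latent = dec(feat, gps, speed)
steer, throttle, brake = control_policy(waypoints, latent, speed)
```

The sketch only indicates where the SDC mapping and the two agents sit in the
data flow; see the repository for the actual SDC construction and agent logic.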
Related papers
- DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving [69.82743399946371]
DriveMLM is a framework that can perform closed-loop autonomous driving in realistic simulators.
We employ a multi-modal LLM (MLLM) to model the behavior planning module of a modular AD system.
This model can be plugged into existing AD systems such as Apollo for closed-loop driving.
arXiv Detail & Related papers (2023-12-14T18:59:05Z)
- Trajeglish: Traffic Modeling as Next-Token Prediction [67.28197954427638]
A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs.
We apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios.
Our model tops the Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%.
arXiv Detail & Related papers (2023-12-07T18:53:27Z)
- LeTFuser: Light-weight End-to-end Transformer-Based Sensor Fusion for Autonomous Driving with Multi-Task Learning [16.241116794114525]
We introduce LeTFuser, an algorithm for fusing multiple RGB-D camera representations.
To perform perception and control tasks simultaneously, we utilize multi-task learning.
arXiv Detail & Related papers (2023-10-19T20:09:08Z)
- Video Killed the HD-Map: Predicting Multi-Agent Behavior Directly From Aerial Images [14.689298253430568]
We propose an aerial image-based map (AIM) representation that requires minimal annotation and provides rich road context information for traffic agents like pedestrians and vehicles.
Our results demonstrate competitive multi-agent trajectory prediction performance, especially for pedestrians in the scene, when using our AIM representation.
arXiv Detail & Related papers (2023-05-19T17:48:01Z)
- TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction [149.5716746789134]
We show data-driven traffic simulation can be formulated as a world model.
We present TrafficBots, a multi-agent policy built upon motion prediction and end-to-end driving.
Experiments on the open motion dataset show TrafficBots can simulate realistic multi-agent behaviors.
arXiv Detail & Related papers (2023-03-07T18:28:41Z)
- CARNet: A Dynamic Autoencoder for Learning Latent Dynamics in Autonomous Driving Tasks [11.489187712465325]
An autonomous driving system should effectively use the information collected from the various sensors in order to form an abstract description of the world.
Deep learning models, such as autoencoders, can be used for that purpose, as they can learn compact latent representations from a stream of incoming data.
This work proposes CARNet, a Combined dynAmic autoencodeR NETwork architecture that utilizes an autoencoder combined with a recurrent neural network to learn the current latent representation.
arXiv Detail & Related papers (2022-05-18T04:15:42Z)
- Predicting Take-over Time for Autonomous Driving with Real-World Data: Robust Data Augmentation, Models, and Evaluation [11.007092387379076]
We develop and train take-over time (TOT) models that operate on mid and high-level features produced by computer vision algorithms operating on different driver-facing camera views.
We show that a TOT model supported by augmented data can be used to produce continuous estimates of take-over times without delay.
arXiv Detail & Related papers (2021-07-27T16:39:50Z)
- Autonomous Vehicles that Alert Humans to Take-Over Controls: Modeling with Real-World Data [11.007092387379076]
This study focuses on the development of contextual, semantically meaningful representations of the driver state.
We conduct a large-scale real-world controlled data study where participants are instructed to take-over control from an autonomous agent.
These take-over events are captured using multiple driver-facing cameras, which, when labelled, result in a dataset of control transitions and their corresponding take-over times (TOTs).
After augmenting this dataset, we develop and train TOT models that operate sequentially on low and mid-level features produced by computer vision algorithms operating on different driver-facing camera views.
arXiv Detail & Related papers (2021-04-23T09:16:53Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
- A Driving Behavior Recognition Model with Bi-LSTM and Multi-Scale CNN [59.57221522897815]
We propose a neural network model based on trajectories information for driving behavior recognition.
We evaluate the proposed model on the public BLVD dataset, achieving satisfactory performance.
arXiv Detail & Related papers (2021-03-01T06:47:29Z)
- IntentNet: Learning to Predict Intention from Raw Sensor Data [86.74403297781039]
In this paper, we develop a one-stage detector and forecaster that exploits both 3D point clouds produced by a LiDAR sensor and dynamic maps of the environment.
Our multi-task model achieves better accuracy than the respective separate modules while saving computation, which is critical to reducing reaction time in self-driving applications.
arXiv Detail & Related papers (2021-01-20T00:31:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.