Scaling Vision-based End-to-End Driving with Multi-View Attention Learning
- URL: http://arxiv.org/abs/2302.03198v3
- Date: Sat, 22 Jul 2023 14:01:00 GMT
- Title: Scaling Vision-based End-to-End Driving with Multi-View Attention Learning
- Authors: Yi Xiao, Felipe Codevilla, Diego Porres, Antonio M. Lopez
- Abstract summary: We present CIL++, which improves on CILRS by both processing higher-resolution images using a human-inspired HFOV as an inductive bias and incorporating a proper attention mechanism.
We propose to replace CILRS with CIL++ as a strong vision-based pure end-to-end driving baseline supervised by only vehicle signals and trained by conditional imitation learning.
- Score: 7.14967754486195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In end-to-end driving, human driving demonstrations are used to train
perception-based driving models by imitation learning. This process is
supervised on vehicle signals (e.g., steering angle, acceleration) but does not
require extra costly supervision (human labeling of sensor data). As a
representative of such vision-based end-to-end driving models, CILRS is
commonly used as a baseline to compare with new driving models. So far, some
recent models have achieved better performance than CILRS by using expensive sensor
suites and/or by using large amounts of human-labeled data for training. Given
the difference in performance, one may think that it is not worth pursuing
vision-based pure end-to-end driving. However, we argue that this approach
still has great value and potential considering cost and maintenance. In this
paper, we present CIL++, which improves on CILRS by processing
higher-resolution images using a human-inspired horizontal field of view (HFOV)
as an inductive bias and by incorporating a proper attention mechanism. CIL++
achieves competitive
performance compared to models which are more costly to develop. We propose to
replace CILRS with CIL++ as a strong vision-based pure end-to-end driving
baseline supervised by only vehicle signals and trained by conditional
imitation learning.
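The abstract describes CIL++ as fusing several camera views that cover a human-like horizontal field of view with an attention mechanism, then predicting vehicle signals through command-conditional branches. Below is a minimal, hedged sketch of such a multi-view attention policy in PyTorch; the class and parameter names (MultiViewAttentionPolicy, num_views, embed_dim, the speed token, and the per-view embedding) are illustrative assumptions rather than the authors' released implementation. A companion sketch of the conditional imitation-learning training step follows the related-papers list below.

```python
# Hypothetical sketch of a CIL++-style multi-view attention policy (not the authors' code).
# Assumptions: several camera views spanning a wide HFOV, a shared ResNet backbone per view,
# transformer self-attention over per-view tokens plus a speed token, and
# command-conditional output branches (conditional imitation learning).
import torch
import torch.nn as nn
from torchvision.models import resnet34

class MultiViewAttentionPolicy(nn.Module):
    def __init__(self, num_views=3, num_commands=4, embed_dim=512, num_heads=8, num_layers=4):
        super().__init__()
        backbone = resnet34(weights=None)        # randomly initialized here; a pretrained backbone could be used
        backbone.fc = nn.Identity()              # keep the 512-d pooled feature
        self.backbone = backbone                 # shared across views
        self.view_embed = nn.Parameter(torch.zeros(num_views, embed_dim))  # which-camera embedding
        self.speed_proj = nn.Linear(1, embed_dim)                          # current speed as an extra token
        encoder_layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.attention = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # One output branch per high-level navigation command.
        self.branches = nn.ModuleList(
            [nn.Sequential(nn.Linear(embed_dim, 256), nn.ReLU(), nn.Linear(256, 2))  # (steer, accel)
             for _ in range(num_commands)]
        )

    def forward(self, images, speed, command):
        # images: (B, V, 3, H, W); speed: (B, 1); command: (B,) integer in [0, num_commands)
        b, v, c, h, w = images.shape
        feats = self.backbone(images.reshape(b * v, c, h, w)).view(b, v, -1)   # (B, V, D)
        tokens = torch.cat([feats + self.view_embed, self.speed_proj(speed).unsqueeze(1)], dim=1)
        fused = self.attention(tokens).mean(dim=1)                             # (B, D)
        out = torch.stack([branch(fused) for branch in self.branches], dim=1)  # (B, num_commands, 2)
        return out.gather(1, command.view(-1, 1, 1).expand(-1, 1, 2)).squeeze(1)  # branch picked by command
```

If the real CIL++ differs in how camera identity, speed, or the number of views are handled, the same attention-over-tokens structure still conveys the idea of fusing a wide HFOV that has been split across several cameras.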
Related papers
- MetaFollower: Adaptable Personalized Autonomous Car Following [63.90050686330677]
We propose an adaptable personalized car-following framework - MetaFollower.
We first utilize Model-Agnostic Meta-Learning (MAML) to extract common driving knowledge from various car-following (CF) events.
We additionally combine Long Short-Term Memory (LSTM) and Intelligent Driver Model (IDM) to reflect temporal heterogeneity with high interpretability.
arXiv Detail & Related papers (2024-06-23T15:30:40Z)
- Guiding Attention in End-to-End Driving Models [49.762868784033785]
Vision-based end-to-end driving models trained by imitation learning can lead to affordable solutions for autonomous driving.
We study how to guide the attention of these models to improve their driving quality by adding a loss term during training.
In contrast to previous work, our method does not require these salient semantic maps to be available during testing time.
arXiv Detail & Related papers (2024-04-30T23:18:51Z)
- COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles [54.61668577827041]
We introduce COOPERNAUT, an end-to-end learning model that uses cross-vehicle perception for vision-based cooperative driving.
Our experiments on AutoCastSim suggest that our cooperative perception driving models lead to a 40% improvement in average success rate.
arXiv Detail & Related papers (2022-05-04T17:55:12Z)
- TransDARC: Transformer-based Driver Activity Recognition with Latent Space Feature Calibration [31.908276711898548]
We present a vision-based framework for recognizing secondary driver behaviours based on visual transformers and an augmented feature distribution calibration module.
Our framework consistently leads to better recognition rates, surpassing previous state-of-the-art results on the public Drive&Act benchmark at all levels.
arXiv Detail & Related papers (2022-03-02T08:14:06Z)
- Vision-Based Autonomous Car Racing Using Deep Imitative Reinforcement Learning [13.699336307578488]
Our deep imitative reinforcement learning (DIRL) approach achieves agile autonomous racing using visual inputs.
We validate our algorithm both in a high-fidelity driving simulation and on a real-world 1/20-scale RC-car with limited onboard computation.
arXiv Detail & Related papers (2021-07-18T00:00:48Z)
- Self-Supervised Steering Angle Prediction for Vehicle Control Using Visual Odometry [55.11913183006984]
We show how a model can be trained to control a vehicle's trajectory using camera poses estimated through visual odometry methods.
We propose a scalable framework that leverages trajectory information from several different runs using a camera setup placed at the front of a car.
arXiv Detail & Related papers (2021-03-20T16:29:01Z)
- Action-Based Representation Learning for Autonomous Driving [8.296684637620551]
We propose to use action-based driving data for learning representations.
Our experiments show that an affordance-based driving model pre-trained with this approach can leverage a relatively small amount of weakly annotated imagery.
arXiv Detail & Related papers (2020-08-21T10:49:13Z)
- Learning Accurate and Human-Like Driving using Semantic Maps and Attention [152.48143666881418]
This paper investigates how end-to-end driving models can be improved to drive more accurately and human-like.
We exploit semantic and visual maps from HERE Technologies and augment the existing Drive360 dataset with them.
Our models are trained and evaluated on the Drive360 + HERE dataset, which features 60 hours and 3000 km of real-world driving data.
arXiv Detail & Related papers (2020-07-10T22:25:27Z)
- Explaining Autonomous Driving by Learning End-to-End Visual Attention [25.09407072098823]
Current deep learning based autonomous driving approaches yield impressive results, even leading to in-production deployment in certain controlled scenarios.
One of the most popular and fascinating approaches relies on learning vehicle controls directly from data perceived by sensors.
The main drawback of this approach, as in other learning problems, is the lack of explainability. Indeed, a deep network acts as a black box, outputting predictions that depend on previously seen driving patterns without giving any feedback on why such decisions were taken.
arXiv Detail & Related papers (2020-06-05T10:12:31Z)
- Learning by Cheating [72.9701333689606]
We show that this challenging learning problem can be simplified by decomposing it into two stages.
We use the presented approach to train a vision-based autonomous driving system that substantially outperforms the state of the art.
Our approach achieves, for the first time, 100% success rate on all tasks in the original CARLA benchmark, sets a new record on the NoCrash benchmark, and reduces the frequency of infractions by an order of magnitude compared to the prior state of the art.
arXiv Detail & Related papers (2019-12-27T18:59:04Z)
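As noted after the abstract, here is a similarly hedged sketch of the conditional imitation-learning training step in which supervision comes only from recorded vehicle signals (steering and acceleration); the batch field names and the plain L1 loss are illustrative assumptions, not necessarily the paper's exact recipe.

```python
# Hypothetical conditional imitation-learning training step (illustrative, not the paper's exact recipe).
# Supervision comes only from recorded vehicle signals; the high-level navigation command
# selects which output branch is trained for each sample.
import torch
import torch.nn.functional as F

def training_step(model, batch, optimizer):
    images = batch["images"]     # (B, V, 3, H, W) multi-view camera frames
    speed = batch["speed"]       # (B, 1) current ego speed
    command = batch["command"]   # (B,) navigation command index (e.g., follow lane, turn left/right)
    target = batch["controls"]   # (B, 2) expert (steer, accel) from the human demonstration

    pred = model(images, speed, command)   # (B, 2) output of the command-selected branch
    loss = F.l1_loss(pred, target)         # L1 regression on the vehicle signals only
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Here, model would be an instance such as the MultiViewAttentionPolicy sketch shown after the abstract, and optimizer any standard torch.optim optimizer over its parameters.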