Multi-modal Experts Network for Autonomous Driving
- URL: http://arxiv.org/abs/2009.08876v1
- Date: Fri, 18 Sep 2020 14:54:54 GMT
- Title: Multi-modal Experts Network for Autonomous Driving
- Authors: Shihong Fang, Anna Choromanska
- Abstract summary: End-to-end learning from sensory data has shown promising results in autonomous driving.
Training and deploying such a network is challenging, and at least two problems arise in the considered setting.
We propose a novel, carefully tailored multi-modal experts network architecture together with a multi-stage training procedure.
- Score: 16.587968446342995
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: End-to-end learning from sensory data has shown promising results in
autonomous driving. While employing many sensors enhances world perception and
should lead to more robust and reliable behavior of autonomous vehicles, it is
challenging to train and deploy such a network, and at least two problems are
encountered in the considered setting. The first one is the increase of
computational complexity with the number of sensing devices. The other is the
phenomenon of network overfitting to the simplest and most informative input. We
address both challenges with a novel, carefully tailored multi-modal experts
network architecture and propose a multi-stage training procedure. The network
contains a gating mechanism, which selects the most relevant input at each
inference time step using a mixed discrete-continuous policy. We demonstrate
the plausibility of the proposed approach on our 1/6 scale truck equipped with
three cameras and one LiDAR.
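As a rough illustration of the gating idea described above (a sketch, not the authors' implementation): the snippet below builds one small expert per input modality and a gate that mixes the expert outputs with continuous softmax weights in one mode and selects a single expert discretely in the other. All module names, layer sizes, and the toy feature shapes for three cameras and one LiDAR are assumptions.

```python
# Hypothetical sketch of a multi-modal experts network with a gating head.
# Not the paper's implementation: sizes, modules, and the soft-vs-hard
# selection rule are illustrative assumptions.
import torch
import torch.nn as nn


class ExpertsWithGate(nn.Module):
    def __init__(self, feat_dims, hidden=128, out_dim=2):
        super().__init__()
        # One small expert per input modality (e.g. each camera, LiDAR).
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))
             for d in feat_dims]
        )
        # The gate scores every modality from the concatenated features.
        self.gate = nn.Linear(sum(feat_dims), len(feat_dims))

    def forward(self, feats, hard=False):
        # feats: list of per-modality feature tensors, one per expert.
        outs = torch.stack([e(f) for e, f in zip(self.experts, feats)], dim=1)
        logits = self.gate(torch.cat(feats, dim=-1))
        if hard:
            # Discrete choice: keep only the highest-scoring expert's output
            # (a real deployment would skip the other experts to save compute).
            idx = logits.argmax(dim=-1)
            return outs[torch.arange(outs.size(0)), idx]
        # Continuous choice: softmax-weighted mixture of all experts.
        w = logits.softmax(dim=-1).unsqueeze(-1)
        return (w * outs).sum(dim=1)


# Toy usage: three camera feature vectors and one LiDAR feature vector.
feats = [torch.randn(4, 64) for _ in range(3)] + [torch.randn(4, 32)]
model = ExpertsWithGate([64, 64, 64, 32])
print(model(feats).shape)             # soft mixture of experts
print(model(feats, hard=True).shape)  # hard selection of one expert
```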
Related papers
- DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.
DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks; a toy cross-attention sketch of the interaction idea follows this entry.
arXiv Detail & Related papers (2024-08-09T14:04:21Z)
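As a loose illustration of the modality-interaction idea summarized above (not the DeepInteraction++ code): one bidirectional cross-attention block in which an image token stream and a LiDAR token stream query each other while each keeps its own representation. Dimensions, token counts, and class names are assumptions.

```python
# Illustrative only: two modality streams exchange information via
# cross-attention while both per-modality representations are kept.
import torch
import torch.nn as nn


class BiModalInteraction(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.img_from_pts = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.pts_from_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_img = nn.LayerNorm(dim)
        self.norm_pts = nn.LayerNorm(dim)

    def forward(self, img_tokens, pts_tokens):
        # Each stream attends to the other and is residually updated,
        # so neither representation is collapsed into a single fused one.
        img_upd, _ = self.img_from_pts(img_tokens, pts_tokens, pts_tokens)
        pts_upd, _ = self.pts_from_img(pts_tokens, img_tokens, img_tokens)
        return (self.norm_img(img_tokens + img_upd),
                self.norm_pts(pts_tokens + pts_upd))


img = torch.randn(2, 100, 256)  # e.g. image tokens (assumed shape)
pts = torch.randn(2, 200, 256)  # e.g. LiDAR tokens (assumed shape)
img2, pts2 = BiModalInteraction()(img, pts)
print(img2.shape, pts2.shape)
```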
- M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving [11.36165122994834]
We propose a Multi-Modal fusion transformer incorporating Driver Attention (M2DA) for autonomous driving.
By incorporating driver attention, we endow autonomous vehicles with human-like scene understanding, enabling them to identify crucial areas precisely and ensure safety.
arXiv Detail & Related papers (2024-03-19T08:54:52Z)
- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
- End-to-end Autonomous Driving: Challenges and Frontiers [45.391430626264764]
We provide a comprehensive analysis of more than 270 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving.
We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others.
We discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework.
arXiv Detail & Related papers (2023-06-29T14:17:24Z)
- Federated Deep Learning Meets Autonomous Vehicle Perception: Design and Verification [168.67190934250868]
A federated learning empowered connected autonomous vehicle (FLCAV) framework has been proposed.
FLCAV preserves privacy while reducing communication and annotation costs.
It is challenging to determine the network resources and road sensor poses for multi-stage training; a generic federated-averaging sketch follows this entry.
arXiv Detail & Related papers (2022-06-03T23:55:45Z)
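The FLCAV entry above builds on standard federated learning; the snippet below is a generic FedAvg-style round, shown only to make the "share model weights, not raw sensor data" point concrete. It is not the FLCAV system from the paper, and the toy least-squares objective and simulated clients are assumptions.

```python
# Generic federated-averaging round (FedAvg-style), not the FLCAV method:
# each simulated vehicle trains locally and only weights reach the server.
import numpy as np


def local_update(weights, data, lr=0.1):
    # Hypothetical local step: one gradient-descent step on a least-squares
    # objective standing in for a real perception loss.
    x, y = data
    grad = x.T @ (x @ weights - y) / len(y)
    return weights - lr * grad


def fedavg_round(global_w, client_datasets):
    # Clients start from the current global model and train locally;
    # the server aggregates by simple averaging of the returned weights.
    client_ws = [local_update(global_w.copy(), d) for d in client_datasets]
    return np.mean(client_ws, axis=0)


rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):  # three simulated vehicles, each with private data
    x = rng.normal(size=(50, 2))
    clients.append((x, x @ true_w + 0.01 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(50):
    w = fedavg_round(w, clients)
print(w)  # converges toward true_w although no raw data was shared
```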
- High Efficiency Pedestrian Crossing Prediction [0.0]
State-of-the-art methods for predicting pedestrian crossing intention often rely on multiple streams of information as inputs.
We introduce a network that uses only frames of pedestrians as input.
Experiments validate that our model consistently delivers outstanding performance.
arXiv Detail & Related papers (2022-04-04T21:37:57Z)
- End-to-End Intersection Handling using Multi-Agent Deep Reinforcement Learning [63.56464608571663]
Navigating through intersections is one of the main challenging tasks for an autonomous vehicle.
In this work, we focus on the implementation of a system able to navigate through intersections where only traffic signs are provided.
We propose a multi-agent system that uses a continuous, model-free Deep Reinforcement Learning algorithm to train a neural network predicting both the acceleration and the steering angle at each time step; a toy sketch of such a policy head follows this entry.
arXiv Detail & Related papers (2021-04-28T07:54:40Z)
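As a minimal sketch of the idea quoted above, a continuous policy that outputs acceleration and steering at every time step (not the paper's multi-agent training setup): observation size, output bounds, and layer widths are assumptions.

```python
# Toy continuous policy head: maps a per-step observation to a bounded
# acceleration and steering command. Illustrative assumptions only.
import torch
import torch.nn as nn


class DrivingPolicy(nn.Module):
    def __init__(self, obs_dim=32, hidden=64, max_accel=3.0, max_steer=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2), nn.Tanh(),  # squash both outputs to [-1, 1]
        )
        # Scale squashed outputs to assumed physical ranges (m/s^2, rad).
        self.register_buffer("scale", torch.tensor([max_accel, max_steer]))

    def forward(self, obs):
        return self.net(obs) * self.scale  # [acceleration, steering] per step


policy = DrivingPolicy()
obs = torch.randn(1, 32)  # hypothetical per-step observation vector
accel, steer = policy(obs)[0]
print(float(accel), float(steer))
```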
- IntentNet: Learning to Predict Intention from Raw Sensor Data [86.74403297781039]
In this paper, we develop a one-stage detector and forecaster that exploits both 3D point clouds produced by a LiDAR sensor and dynamic maps of the environment.
Our multi-task model achieves better accuracy than the respective separate modules while saving computation, which is critical to reducing reaction time in self-driving applications.
arXiv Detail & Related papers (2021-01-20T00:31:52Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.