Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion
Transformer
- URL: http://arxiv.org/abs/2207.14024v2
- Date: Fri, 29 Jul 2022 12:42:16 GMT
- Title: Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion
Transformer
- Authors: Hao Shao, Letian Wang, RuoBing Chen, Hongsheng Li, Yu Liu
- Abstract summary: We propose a safety-enhanced autonomous driving framework, named Interpretable Sensor Fusion Transformer(InterFuser)
We process and fuse information from multi-modal multi-view sensors for achieving comprehensive scene understanding and adversarial event detection.
Our framework provides more semantics and are exploited to better constrain actions to be within the safe sets.
- Score: 28.15612357340141
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale deployment of autonomous vehicles has been continually delayed
due to safety concerns. On the one hand, comprehensive scene understanding is
indispensable, a lack of which would result in vulnerability to rare but
complex traffic situations, such as the sudden emergence of unknown objects.
However, reasoning from a global context requires access to sensors of multiple
types and adequate fusion of multi-modal sensor signals, which is difficult to
achieve. On the other hand, the lack of interpretability in learning models
also hampers the safety with unverifiable failure causes. In this paper, we
propose a safety-enhanced autonomous driving framework, named Interpretable
Sensor Fusion Transformer(InterFuser), to fully process and fuse information
from multi-modal multi-view sensors for achieving comprehensive scene
understanding and adversarial event detection. Besides, intermediate
interpretable features are generated from our framework, which provide more
semantics and are exploited to better constrain actions to be within the safe
sets. We conducted extensive experiments on CARLA benchmarks, where our model
outperforms prior methods, ranking the first on the public CARLA Leaderboard.
Our code will be made available at https://github.com/opendilab/InterFuser
Related papers
- Graph-Based Multi-Modal Sensor Fusion for Autonomous Driving [3.770103075126785]
We introduce a novel approach to multi-modal sensor fusion, focusing on developing a graph-based state representation.
We present a Sensor-Agnostic Graph-Aware Kalman Filter, the first online state estimation technique designed to fuse multi-modal graphs.
We validate the effectiveness of our proposed framework through extensive experiments conducted on both synthetic and real-world driving datasets.
arXiv Detail & Related papers (2024-11-06T06:58:17Z) - Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes [56.52618054240197]
We propose a novel, condition-aware multimodal fusion approach for robust semantic perception of driving scenes.
Our method, CAFuser, uses an RGB camera input to classify environmental conditions and generate a Condition Token that guides the fusion of multiple sensor modalities.
We set the new state of the art with CAFuser on the MUSES dataset with 59.7 PQ for multimodal panoptic segmentation and 78.2 mIoU for semantic segmentation, ranking first on the public benchmarks.
arXiv Detail & Related papers (2024-10-14T17:56:20Z) - M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving [11.36165122994834]
We propose a Multi-Modal fusion transformer incorporating Driver Attention (M2DA) for autonomous driving.
By incorporating driver attention, we empower the human-like scene understanding ability to autonomous vehicles to identify crucial areas precisely and ensure safety.
arXiv Detail & Related papers (2024-03-19T08:54:52Z) - SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a controllable closed-loop safety-critical simulation framework.
Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations.
We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z) - Drive Anywhere: Generalizable End-to-end Autonomous Driving with
Multi-modal Foundation Models [114.69732301904419]
We present an approach to apply end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z) - Cognitive TransFuser: Semantics-guided Transformer-based Sensor Fusion
for Improved Waypoint Prediction [38.971222477695214]
RGB-LIDAR-based multi-task feature fusion network, coined Cognitive TransFuser, augments and exceeds the baseline network by a significant margin for safer and more complete road navigation.
We validate the proposed network on the Town05 Short and Town05 Long Benchmark through extensive experiments, achieving up to 44.2 FPS real-time inference time.
arXiv Detail & Related papers (2023-08-04T03:59:10Z) - AutoFed: Heterogeneity-Aware Federated Multimodal Learning for Robust
Autonomous Driving [15.486799633600423]
AutoFed is a framework to fully exploit multimodal sensory data on autonomous vehicles.
We propose a novel model leveraging pseudo-labeling to avoid mistakenly treating unlabeled objects as the background.
We also propose an autoencoder-based data imputation method to fill missing data modality.
arXiv Detail & Related papers (2023-02-17T01:31:53Z) - Exploring Contextual Representation and Multi-Modality for End-to-End
Autonomous Driving [58.879758550901364]
Recent perception systems enhance spatial understanding with sensor fusion but often lack full environmental context.
We introduce a framework that integrates three cameras to emulate the human field of view, coupled with top-down bird-eye-view semantic data to enhance contextual representation.
Our method achieves displacement error by 0.67m in open-loop settings, surpassing current methods by 6.9% on the nuScenes dataset.
arXiv Detail & Related papers (2022-10-13T05:56:20Z) - Detecting Safety Problems of Multi-Sensor Fusion in Autonomous Driving [18.39664775350204]
Multi-sensor fusion (MSF) is used to fuse the sensor inputs and produce a more reliable understanding of the surroundings.
MSF methods in an industry-grade Advanced Driver-Assistance System (ADAS) can mislead the car control and result in serious safety hazards.
We develop a novel evolutionary-based domain-specific search framework, FusionFuzz, for the efficient detection of fusion errors.
arXiv Detail & Related papers (2021-09-14T02:35:34Z) - Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z) - Learning Selective Sensor Fusion for States Estimation [47.76590539558037]
We propose SelectFusion, an end-to-end selective sensor fusion module.
During prediction, the network is able to assess the reliability of the latent features from different sensor modalities.
We extensively evaluate all fusion strategies in both public datasets and on progressively degraded datasets.
arXiv Detail & Related papers (2019-12-30T20:25:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.