Multi-modal Sensor Fusion for Auto Driving Perception: A Survey
- URL: http://arxiv.org/abs/2202.02703v1
- Date: Sun, 6 Feb 2022 04:18:45 GMT
- Title: Multi-modal Sensor Fusion for Auto Driving Perception: A Survey
- Authors: Keli Huang, Botian Shi, Xiang Li, Xin Li, Siyuan Huang, Yikang Li
- Abstract summary: We provide a literature review of existing multi-modal methods for perception tasks in autonomous driving.
We propose a taxonomy, based on the fusion stage, that divides these methods into two major classes and four minor classes.
- Score: 22.734013343067407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-modal fusion is a fundamental task for the perception of an
autonomous driving system and has recently attracted the interest of many
researchers. However, achieving good performance is not easy because of noisy
raw data, underutilized information, and the misalignment of multi-modal
sensors. In this paper, we provide a literature review of existing multi-modal
methods for perception tasks in autonomous driving. In particular, we analyze
more than 50 papers that leverage perception sensors, including LiDAR and
cameras, to tackle object detection and semantic segmentation. Unlike
traditional ways of categorizing fusion models, we propose a taxonomy that
divides them into two major classes and four minor classes according to the
fusion stage. Moreover, we examine current fusion methods in depth, focus on
the remaining problems, and open a discussion of potential research
opportunities. In summary, this paper presents a new taxonomy of multi-modal
fusion methods for autonomous driving perception tasks and aims to provoke
thought on future fusion-based techniques.
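To make the fusion-stage distinction above concrete, the following minimal PyTorch sketch contrasts feature-level ("early"/"deep") fusion with decision-level ("late") fusion of a camera branch and a LiDAR branch. All class names, feature dimensions, and the score-averaging rule are assumptions made for illustration; this is not code from the surveyed paper or any paper listed below.

```python
# Illustrative sketch of the two fusion stages (hypothetical module names).
import torch
import torch.nn as nn


class EarlyFusionDetector(nn.Module):
    """Feature-level ("early"/"deep") fusion: combine modality features before the head."""

    def __init__(self, cam_dim=256, lidar_dim=256, num_classes=10):
        super().__init__()
        self.head = nn.Linear(cam_dim + lidar_dim, num_classes)

    def forward(self, cam_feat, lidar_feat):
        fused = torch.cat([cam_feat, lidar_feat], dim=-1)  # fuse before prediction
        return self.head(fused)


class LateFusionDetector(nn.Module):
    """Decision-level ("late") fusion: each branch predicts independently, scores are averaged."""

    def __init__(self, cam_dim=256, lidar_dim=256, num_classes=10):
        super().__init__()
        self.cam_head = nn.Linear(cam_dim, num_classes)
        self.lidar_head = nn.Linear(lidar_dim, num_classes)

    def forward(self, cam_feat, lidar_feat):
        # fuse after per-modality prediction
        return 0.5 * (self.cam_head(cam_feat) + self.lidar_head(lidar_feat))


if __name__ == "__main__":
    cam = torch.randn(4, 256)    # stand-in for per-object camera features
    lidar = torch.randn(4, 256)  # stand-in for per-object LiDAR features
    print(EarlyFusionDetector()(cam, lidar).shape)  # torch.Size([4, 10])
    print(LateFusionDetector()(cam, lidar).shape)   # torch.Size([4, 10])
```

The stage at which fusion happens is the crux of the taxonomy: early fusion lets a shared head exploit cross-modal correlations, while late fusion keeps the branches independent, which can degrade more gracefully when one sensor is noisy or misaligned.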
Related papers
- MMLF: Multi-modal Multi-class Late Fusion for Object Detection with Uncertainty Estimation [13.624431305114564]
This paper introduces a Multi-modal Multi-class Late Fusion method, which fuses at the decision level to enable multi-class detection.
Experiments conducted on the KITTI validation and official test datasets illustrate substantial performance improvements.
Our approach incorporates uncertainty analysis into the classification fusion process, rendering our model more transparent and trustworthy.
arXiv Detail & Related papers (2024-10-11T11:58:35Z) - M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving [11.36165122994834]
We propose a Multi-Modal fusion transformer incorporating Driver Attention (M2DA) for autonomous driving.
By incorporating driver attention, we endow autonomous vehicles with human-like scene understanding, enabling them to identify crucial areas precisely and ensure safety.
arXiv Detail & Related papers (2024-03-19T08:54:52Z) - Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives [56.2139730920855]
We present a systematic analysis of MM-VUFMs specifically designed for road scenes.
Our objective is to provide a comprehensive overview of common practices, covering task-specific models, unified multi-modal models, unified multi-task models, and foundation model prompting techniques.
We provide insights into key challenges and future trends, such as closed-loop driving systems, interpretability, embodied driving agents, and world models.
arXiv Detail & Related papers (2024-02-05T12:47:09Z) - LLM4Drive: A Survey of Large Language Models for Autonomous Driving [62.10344445241105]
Large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers.
In this paper, we systematically review the research line on Large Language Models for Autonomous Driving (LLM4AD).
arXiv Detail & Related papers (2023-11-02T07:23:33Z) - Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to end-to-end, open-set (any environment/scene) autonomous driving that can produce driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z) - Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving [100.3848723827869]
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting.
Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories.
We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
arXiv Detail & Related papers (2023-03-03T08:54:06Z) - Multi-modal Fusion Technology based on Vehicle Information: A Survey [0.7646713951724012]
The current multi-modal fusion methods mainly focus on camera data and LiDAR data, but pay little attention to the kinematic information provided by the bottom sensors of the vehicle.
This information is not affected by complex external scenes, so it is more robust and reliable.
New ideas for future multi-modal fusion technology in autonomous driving are proposed to promote further utilization of vehicle bottom information.
arXiv Detail & Related papers (2022-11-11T09:25:53Z) - Exploring Contextual Representation and Multi-Modality for End-to-End Autonomous Driving [58.879758550901364]
Recent perception systems enhance spatial understanding with sensor fusion but often lack full environmental context.
We introduce a framework that integrates three cameras to emulate the human field of view, coupled with top-down bird's-eye-view semantic data to enhance contextual representation.
Our method achieves a displacement error of 0.67 m in open-loop settings, surpassing current methods by 6.9% on the nuScenes dataset.
arXiv Detail & Related papers (2022-10-13T05:56:20Z) - A Comparative Analysis of Decision-Level Fusion for Multimodal Driver Behaviour Understanding [22.405229530620414]
This paper presents an empirical evaluation of different paradigms for decision-level late fusion in video-based driver observation.
We compare seven different mechanisms for joining the results of single-modal classifiers.
This is the first systematic study of strategies for fusing the outcomes of multimodal predictors inside the vehicle.
arXiv Detail & Related papers (2022-04-10T17:49:22Z) - Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination.
Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities.
We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
arXiv Detail & Related papers (2021-04-07T04:03:20Z) - Multi-modal Experts Network for Autonomous Driving [16.587968446342995]
End-to-end learning from sensory data has shown promising results in autonomous driving.
Training and deploying such a network is challenging, and at least two problems arise in the considered setting.
We propose a carefully tailored multi-modal experts network architecture together with a multi-stage training procedure (see the illustrative sketch below).
arXiv Detail & Related papers (2020-09-18T14:54:54Z)
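As a rough illustration of the "multi-modal experts" idea in the last entry, the sketch below gates two modality-specific expert networks with a learned softmax weight. The layer sizes, the two-expert setup, and the gating rule are assumptions for illustration only; they are not the architecture or training procedure proposed by the authors.

```python
# Minimal gated mixture of modality experts (illustrative assumption, not the paper's model).
import torch
import torch.nn as nn


class ModalityExpertsNet(nn.Module):
    def __init__(self, cam_dim=256, lidar_dim=64, hidden=128, out_dim=2):
        super().__init__()
        # one expert per sensing modality
        self.cam_expert = nn.Sequential(
            nn.Linear(cam_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))
        self.lidar_expert = nn.Sequential(
            nn.Linear(lidar_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))
        # the gate predicts how much to trust each expert for the current input
        self.gate = nn.Linear(cam_dim + lidar_dim, 2)

    def forward(self, cam_feat, lidar_feat):
        weights = torch.softmax(
            self.gate(torch.cat([cam_feat, lidar_feat], dim=-1)), dim=-1)
        fused = (weights[:, :1] * self.cam_expert(cam_feat)
                 + weights[:, 1:] * self.lidar_expert(lidar_feat))
        return fused, weights  # fused output and per-modality trust weights


if __name__ == "__main__":
    net = ModalityExpertsNet()
    out, w = net(torch.randn(4, 256), torch.randn(4, 64))
    print(out.shape, w.shape)  # torch.Size([4, 2]) torch.Size([4, 2])
```

The gate makes the per-modality weighting explicit, which is one common way such expert architectures handle inputs where one sensor is less informative than the other.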