CEMFormer: Learning to Predict Driver Intentions from In-Cabin and
External Cameras via Spatial-Temporal Transformers
- URL: http://arxiv.org/abs/2305.07840v1
- Date: Sat, 13 May 2023 05:27:36 GMT
- Title: CEMFormer: Learning to Predict Driver Intentions from In-Cabin and
External Cameras via Spatial-Temporal Transformers
- Authors: Yunsheng Ma, Wenqian Ye, Xu Cao, Amr Abdelraouf, Kyungtae Han, Rohit
Gupta, Ziran Wang
- Abstract summary: We introduce a new framework called Cross-View Episodic Memory Transformer (CEMFormer).
CEMFormer employs spatio-temporal transformers to learn unified memory representations for improved driver intention prediction.
We propose a novel context-consistency loss that incorporates driving context as an auxiliary supervision signal to improve prediction performance.
- Score: 5.572431452586636
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Driver intention prediction seeks to anticipate drivers' actions by analyzing
their behaviors with respect to surrounding traffic environments. Existing
approaches primarily focus on late-fusion techniques and neglect the
importance of maintaining consistency between predictions and prevailing
driving contexts. In this paper, we introduce a new framework called Cross-View
Episodic Memory Transformer (CEMFormer), which employs spatio-temporal
transformers to learn unified memory representations for improved driver
intention prediction. Specifically, we develop a spatial-temporal encoder to
integrate information from both in-cabin and external camera views, along with
episodic memory representations to continuously fuse historical data.
Furthermore, we propose a novel context-consistency loss that incorporates
driving context as an auxiliary supervision signal to improve prediction
performance. Comprehensive experiments on the Brain4Cars dataset demonstrate
that CEMFormer consistently outperforms existing state-of-the-art methods in
driver intention prediction.
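
To make the described pipeline concrete, below is a minimal PyTorch sketch of the three ingredients the abstract names: a spatial-temporal encoder over both camera views, learnable episodic memory slots that fuse historical frames, and a context-consistency loss used as auxiliary supervision. This is not the authors' implementation; every module name and hyperparameter (e.g., `CrossViewEpisodicModel`, `num_memory_slots`) is an assumption for illustration.

```python
# Minimal sketch of the ideas in the abstract (not the authors' code).
# Assumptions: both camera views are pre-encoded into per-frame feature
# vectors, the episodic memory is a small set of learnable slots updated by
# attention, and the context-consistency loss is an auxiliary cross-entropy
# on a driving-context label. All names and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossViewEpisodicModel(nn.Module):
    def __init__(self, feat_dim=256, num_heads=4, num_memory_slots=8,
                 num_intentions=5, num_contexts=3):
        super().__init__()
        # Learnable episodic memory slots, shared across time steps.
        self.memory = nn.Parameter(torch.randn(num_memory_slots, feat_dim))
        # Spatial-temporal encoder over the concatenated two-view sequence.
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Memory reads the encoded sequence via cross-attention.
        self.mem_attn = nn.MultiheadAttention(feat_dim, num_heads,
                                              batch_first=True)
        self.intent_head = nn.Linear(feat_dim, num_intentions)
        self.context_head = nn.Linear(feat_dim, num_contexts)

    def forward(self, in_cabin, external):
        # in_cabin, external: (batch, time, feat_dim) per-frame features.
        tokens = torch.cat([in_cabin, external], dim=1)   # fuse the two views
        encoded = self.encoder(tokens)                    # spatio-temporal mixing
        mem = self.memory.unsqueeze(0).expand(tokens.size(0), -1, -1)
        fused, _ = self.mem_attn(mem, encoded, encoded)   # memory <- history
        pooled = fused.mean(dim=1)
        return self.intent_head(pooled), self.context_head(pooled)

def training_loss(intent_logits, context_logits, intent_y, context_y, lam=0.5):
    # Main intention loss plus the auxiliary context-consistency term.
    return (F.cross_entropy(intent_logits, intent_y)
            + lam * F.cross_entropy(context_logits, context_y))
```

In this sketch the context head shares the fused representation with the intention head, so gradients from the context label regularize the same memory features; that is one plausible reading of "driving context as an auxiliary supervision signal," not a confirmed detail of the paper.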
Related papers
- AHMF: Adaptive Hybrid-Memory-Fusion Model for Driver Attention Prediction [14.609639142688035]
This paper proposes an Adaptive Hybrid-Memory-Fusion (AHMF) driver attention prediction model to achieve more human-like predictions.
The model first encodes information about specific hazardous stimuli in the current scene to form working memories. Then, it adaptively retrieves similar situational experiences from the long-term memory for final prediction.
arXiv Detail & Related papers (2024-07-24T17:19:58Z)
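
A rough sketch of the working-memory/long-term-memory interplay described above, assuming dot-product attention retrieval over a fixed learnable bank; names such as `HybridMemoryFusion` and `bank_size` are hypothetical, not the AHMF code.

```python
# Illustrative sketch of working-memory encoding plus adaptive retrieval from
# a long-term memory bank (not the AHMF code; names and sizes are assumed).
import torch
import torch.nn as nn

class HybridMemoryFusion(nn.Module):
    def __init__(self, feat_dim=128, bank_size=64):
        super().__init__()
        self.working_enc = nn.GRU(feat_dim, feat_dim, batch_first=True)
        # Long-term memory: a bank of stored situational experiences.
        self.long_term = nn.Parameter(torch.randn(bank_size, feat_dim))
        self.out = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, scene_feats):
        # scene_feats: (batch, time, feat_dim) features of current hazards.
        _, h = self.working_enc(scene_feats)      # working memory state
        working = h[-1]                           # (batch, feat_dim)
        # Retrieve similar past experiences by attention over the bank.
        scores = working @ self.long_term.t()     # (batch, bank_size)
        retrieved = scores.softmax(dim=-1) @ self.long_term
        return self.out(torch.cat([working, retrieved], dim=-1))
```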
- Crossfusor: A Cross-Attention Transformer Enhanced Conditional Diffusion Model for Car-Following Trajectory Prediction [10.814758830775727]
This study introduces a Cross-Attention Transformer Enhanced Diffusion Model (Crossfusor) specifically designed for car-following trajectory prediction.
It integrates detailed inter-vehicular interactions and car-following dynamics into a robust diffusion framework, improving both the accuracy and realism of predicted trajectories.
Experimental results on the NGSIM dataset demonstrate that Crossfusor outperforms state-of-the-art models, particularly in long-term predictions.
arXiv Detail & Related papers (2024-06-17T17:35:47Z)
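
The summary suggests a denoiser conditioned on inter-vehicular interaction features. A toy sketch of one common way to do this, cross-attention from trajectory tokens to condition tokens inside a DDPM-style noise predictor, follows; it is illustrative only and every name is assumed, not the Crossfusor architecture.

```python
# Toy sketch of conditioning a trajectory denoiser on interaction features
# via cross-attention (illustrative only; not the Crossfusor architecture).
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        self.traj_in = nn.Linear(2, dim)          # (x, y) points -> tokens
        self.cross_attn = nn.MultiheadAttention(dim, num_heads,
                                                batch_first=True)
        self.time_emb = nn.Embedding(1000, dim)   # diffusion-step embedding
        self.noise_out = nn.Linear(dim, 2)        # predicted noise per point

    def forward(self, noisy_traj, inter_feats, t):
        # noisy_traj: (batch, horizon, 2); inter_feats: (batch, n, dim);
        # t: (batch,) integer diffusion steps.
        h = self.traj_in(noisy_traj) + self.time_emb(t).unsqueeze(1)
        # Trajectory tokens attend to inter-vehicular interaction features.
        h, _ = self.cross_attn(h, inter_feats, inter_feats)
        return self.noise_out(h)   # epsilon prediction for the DDPM loss
```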
arXiv Detail & Related papers (2024-06-17T17:35:47Z) - Looking Inside Out: Anticipating Driver Intent From Videos [20.501288763809036]
Driver intention can be leveraged to improve road safety, such as warning surrounding vehicles in the event the driver is attempting a dangerous maneuver.
We propose a novel method of utilizing in-cabin and external camera data to improve state-of-the-art (SOTA) performance in predicting future driver actions.
Our models predict driver maneuvers more accurately and earlier than existing approaches, with an accuracy of 87.5% and an average prediction time of 4.35 seconds before the maneuver takes place.
arXiv Detail & Related papers (2023-12-03T16:24:50Z)
- Layout Sequence Prediction From Noisy Mobile Modality [53.49649231056857]
Trajectory prediction plays a vital role in understanding pedestrian movement for applications such as autonomous driving and robotics.
Current trajectory prediction models depend on long, complete, and accurately observed sequences from visual modalities.
We propose LTrajDiff, a novel approach that treats objects obstructed or out of sight as equally important as those with fully visible trajectories.
arXiv Detail & Related papers (2023-10-09T20:32:49Z)
- Cognitive Accident Prediction in Driving Scenes: A Multimodality Benchmark [77.54411007883962]
We propose a Cognitive Accident Prediction (CAP) method that explicitly leverages human-inspired cognition of text description on the visual observation and the driver attention to facilitate model training.
CAP is formulated by an attentive text-to-vision shift fusion module, an attentive scene context transfer module, and the driver attention guided accident prediction module.
We construct a new large-scale benchmark consisting of 11,727 in-the-wild accident videos with over 2.19 million frames.
arXiv Detail & Related papers (2022-12-19T11:43:02Z)
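
A minimal sketch of what an "attentive text-to-vision shift fusion" could look like, read here as cross-attention from visual tokens to text tokens with a residual connection; this reading is an assumption about the mechanism, not the CAP module itself.

```python
# Minimal sketch of attentive text-to-vision fusion (illustrative; not the
# CAP modules). Text token features steer visual features via cross-attention.
import torch
import torch.nn as nn

class TextToVisionFusion(nn.Module):
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis_tokens, text_tokens):
        # vis_tokens: (batch, hw, dim); text_tokens: (batch, words, dim)
        shifted, _ = self.attn(vis_tokens, text_tokens, text_tokens)
        return self.norm(vis_tokens + shifted)   # residual fusion
```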
- FBLNet: FeedBack Loop Network for Driver Attention Prediction [75.83518507463226]
Non-objective driving experience is difficult to model.
In this paper, we propose a FeedBack Loop Network (FBLNet) which attempts to model the driving experience accumulation procedure.
Under the guidance of the incremental knowledge, our model fuses the CNN feature and Transformer feature that are extracted from the input image to predict driver attention.
arXiv Detail & Related papers (2022-12-05T08:25:09Z)
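
One plausible reading of the feedback loop is sketched below: a running "experience" state is fused with the CNN and Transformer features, and each pass writes back into that state. All names (`FeedbackLoop`, `experience`) are hypothetical, not the FBLNet design.

```python
# Rough sketch of a feedback loop that accumulates "experience" across
# iterations (illustrative; not the FBLNet design).
import torch
import torch.nn as nn

class FeedbackLoop(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.fuse = nn.Linear(3 * dim, dim)
        self.head = nn.Linear(dim, 1)     # scalar saliency logit; a real
                                          # model would emit a spatial map
        self.update = nn.GRUCell(dim, dim)  # experience accumulator

    def forward(self, cnn_feat, trans_feat, experience, steps=3):
        # cnn_feat, trans_feat, experience: (batch, dim)
        for _ in range(steps):
            fused = torch.tanh(self.fuse(
                torch.cat([cnn_feat, trans_feat, experience], dim=-1)))
            experience = self.update(fused, experience)  # feed back the result
        return self.head(fused), experience
```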
- AdvDO: Realistic Adversarial Attacks for Trajectory Prediction [87.96767885419423]
Trajectory prediction is essential for autonomous vehicles to plan correct and safe driving behaviors.
We devise an optimization-based adversarial attack framework to generate realistic adversarial trajectories.
Our attack can lead an AV to drive off road or collide into other vehicles in simulation.
arXiv Detail & Related papers (2022-09-19T03:34:59Z)
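
A generic sketch of an optimization-based trajectory attack in the spirit of the summary: perturb the observed history within a small bound and ascend on the victim's prediction error. This is standard PGD-style optimization, not the AdvDO procedure, which additionally constrains the perturbed history to remain realistic.

```python
# Generic sketch of an optimization-based attack on a trajectory predictor
# (illustrative; not the AdvDO procedure). We perturb the observed history
# within a small bound to maximize the victim model's prediction error.
import torch

def adversarial_history(model, history, target_future,
                        eps=0.2, steps=20, lr=0.05):
    # history: (batch, obs_len, 2); target_future: (batch, pred_len, 2)
    delta = torch.zeros_like(history, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        pred = model(history + delta)
        # Ascend on displacement error to push predictions off the truth.
        loss = -((pred - target_future) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # keep the perturbation physically small
    return (history + delta).detach()
```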
- DRIVE: Deep Reinforced Accident Anticipation with Visual Explanation [36.350348194248014]
Traffic accident anticipation aims to accurately and promptly predict the occurrence of a future accident from dashcam videos.
Existing approaches typically focus on capturing the cues of spatial and temporal context before a future accident occurs.
We propose Deep ReInforced accident anticipation with Visual Explanation, named DRIVE.
arXiv Detail & Related papers (2021-07-21T16:33:21Z)
- End-to-end Contextual Perception and Prediction with Interaction Transformer [79.14001602890417]
We tackle the problem of detecting objects in 3D and forecasting their future motion in the context of self-driving.
To capture their spatial-temporal dependencies, we propose a recurrent neural network with a novel Transformer architecture.
Our model can be trained end-to-end, and runs in real-time.
arXiv Detail & Related papers (2020-08-13T14:30:12Z)
- Implicit Latent Variable Model for Scene-Consistent Motion Forecasting [78.74510891099395]
In this paper, we aim to learn scene-consistent motion forecasts of complex urban traffic directly from sensor data.
We model the scene as an interaction graph and employ powerful graph neural networks to learn a distributed latent representation of the scene.
arXiv Detail & Related papers (2020-07-23T14:31:25Z)
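
A bare-bones sketch of the scene-as-interaction-graph idea: actor features exchange messages over a fully connected graph, then map to per-actor latent distributions whose joint sample can induce scene-consistent futures. The message-passing form and all names are assumptions, not the paper's model.

```python
# Bare-bones sketch of message passing over an actor interaction graph to
# produce a distributed scene latent (illustrative; not the paper's model).
import torch
import torch.nn as nn

class SceneGNN(nn.Module):
    def __init__(self, dim=128, rounds=2):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)
        self.upd = nn.GRUCell(dim, dim)
        self.rounds = rounds
        self.to_latent = nn.Linear(dim, 2 * dim)  # mean and log-variance

    def forward(self, actor_feats):
        # actor_feats: (num_actors, dim); fully connected interaction graph.
        h = actor_feats
        for _ in range(self.rounds):
            n = h.size(0)
            pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                               h.unsqueeze(0).expand(n, n, -1)], dim=-1)
            messages = self.msg(pairs).mean(dim=1)  # aggregate neighbors
            h = self.upd(messages, h)               # update actor states
        mean, logvar = self.to_latent(h).chunk(2, dim=-1)
        # One latent per actor; sampling them jointly is what would give
        # scene-consistent futures in the generative model this sketches.
        return mean, logvar
```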
This list is automatically generated from the titles and abstracts of the papers on this site.