Where, What, Why: Towards Explainable Driver Attention Prediction
- URL: http://arxiv.org/abs/2506.23088v1
- Date: Sun, 29 Jun 2025 04:59:39 GMT
- Title: Where, What, Why: Towards Explainable Driver Attention Prediction
- Authors: Yuchen Zhou, Jiayu Tang, Xiaoyan Xiao, Yueyao Lin, Linkai Liu, Zipeng Guo, Hao Fei, Xiaobo Xia, Chao Gou
- Abstract summary: We introduce Explainable Driver Attention Prediction, a novel task paradigm that jointly predicts spatial attention regions (where), parses attended semantics (what), and provides cognitive reasoning for attention allocation (why). We propose LLada, a Large Language model-driven framework for driver attention prediction, which unifies pixel modeling, semantic parsing, and cognitive reasoning within an end-to-end architecture. This work serves as a key step toward a deeper understanding of driver attention mechanisms, with significant implications for autonomous driving, intelligent driver training, and human-computer interaction.
- Score: 28.677786362573638
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modeling task-driven attention in driving is a fundamental challenge for both autonomous vehicles and cognitive science. Existing methods primarily predict where drivers look by generating spatial heatmaps, but fail to capture the cognitive motivations behind attention allocation in specific contexts, which limits deeper understanding of attention mechanisms. To bridge this gap, we introduce Explainable Driver Attention Prediction, a novel task paradigm that jointly predicts spatial attention regions (where), parses attended semantics (what), and provides cognitive reasoning for attention allocation (why). To support this, we present W3DA, the first large-scale explainable driver attention dataset. It enriches existing benchmarks with detailed semantic and causal annotations across diverse driving scenarios, including normal conditions, safety-critical situations, and traffic accidents. We further propose LLada, a Large Language model-driven framework for driver attention prediction, which unifies pixel modeling, semantic parsing, and cognitive reasoning within an end-to-end architecture. Extensive experiments demonstrate the effectiveness of LLada, exhibiting robust generalization across datasets and driving conditions. This work serves as a key step toward a deeper understanding of driver attention mechanisms, with significant implications for autonomous driving, intelligent driver training, and human-computer interaction.
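Since the abstract describes LLada only at a high level, the following is a minimal, illustrative sketch of the three-headed where/what/why task interface, not the paper's actual architecture; every module name, class list, and tensor shape is an assumption for illustration. A real system would replace the mocked semantic and reasoning heads with an LLM decoder, as the abstract suggests.

```python
# Hedged sketch: LLada's architecture is not published in this listing, so this
# only illustrates the "where/what/why" task interface with toy components.
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class AttentionExplanation:
    heatmap: torch.Tensor   # (H, W) spatial attention map -- "where"
    semantics: list[str]    # attended object categories -- "what"
    reasoning: str          # natural-language rationale -- "why"


class ToyExplainableAttention(nn.Module):
    """Toy stand-in: a conv saliency head plus a semantic classifier.

    The text outputs are mocked; a system like LLada would generate them
    with a language model conditioned on the visual features.
    """

    CLASSES = ["pedestrian", "lead vehicle", "traffic light", "cyclist"]

    def __init__(self, channels: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.saliency_head = nn.Conv2d(channels, 1, 1)               # "where"
        self.semantic_head = nn.Linear(channels, len(self.CLASSES))  # "what"

    def forward(self, frame: torch.Tensor) -> AttentionExplanation:
        feats = self.encoder(frame)                        # (1, C, H, W)
        heatmap = torch.sigmoid(self.saliency_head(feats))[0, 0]
        pooled = feats.mean(dim=(2, 3))                    # global context
        top = self.semantic_head(pooled)[0].argmax().item()
        label = self.CLASSES[top]
        reasoning = f"Attending to the {label} because it is task-relevant."
        return AttentionExplanation(heatmap, [label], reasoning)


frame = torch.rand(1, 3, 64, 64)  # dummy dashcam frame
out = ToyExplainableAttention()(frame)
print(out.heatmap.shape, out.semantics, out.reasoning)
```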
Related papers
- Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction [69.29802752614677]
RouteFormer is a novel ego-trajectory prediction network combining GPS data, environmental context, and the driver's field-of-view.
To tackle data scarcity and enhance diversity, we introduce GEM, a dataset of urban driving scenarios enriched with synchronized driver field-of-view and gaze data.
arXiv Detail & Related papers (2023-12-13T23:06:30Z)
- Cognitive Accident Prediction in Driving Scenes: A Multimodality Benchmark [77.54411007883962]
We propose a Cognitive Accident Prediction (CAP) method that explicitly leverages human-inspired cognition of text description on the visual observation and the driver attention to facilitate model training.
CAP is formulated by an attentive text-to-vision shift fusion module, an attentive scene context transfer module, and the driver attention guided accident prediction module.
We construct a new large-scale benchmark consisting of 11,727 in-the-wild accident videos with over 2.19 million frames.
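The abstract names an "attentive text-to-vision shift fusion module" without specifying its design; the sketch below shows one plausible reading as cross-attention in which visual tokens query the accident-description tokens. Dimensions and structure are assumptions, not the paper's actual module.

```python
# Hedged sketch of text-to-vision fusion in the spirit of CAP; the real
# module's design is not specified here, so everything below is assumed.
import torch
import torch.nn as nn


class TextToVisionFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis_tokens: torch.Tensor, txt_tokens: torch.Tensor):
        # Visual tokens query the text description, shifting visual features
        # toward the semantics of the accident narration.
        shifted, _ = self.attn(query=vis_tokens, key=txt_tokens, value=txt_tokens)
        return self.norm(vis_tokens + shifted)   # residual fusion


vis = torch.rand(2, 196, 256)   # e.g. 14x14 patch tokens per frame
txt = torch.rand(2, 32, 256)    # encoded caption tokens
print(TextToVisionFusion()(vis, txt).shape)  # torch.Size([2, 196, 256])
```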
arXiv Detail & Related papers (2022-12-19T11:43:02Z)
- FBLNet: FeedBack Loop Network for Driver Attention Prediction [50.936478241688114]
Non-objective driving experience is difficult to model, so existing methods lack a mechanism that simulates how drivers accumulate experience.
We propose a FeedBack Loop Network (FBLNet), which attempts to model the driving experience accumulation procedure.
Our model exhibits a solid advantage over existing methods, achieving an outstanding performance improvement on two driver attention benchmark datasets.
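The feedback-loop idea can be pictured as re-feeding the predicted attention map as a prior on the next pass; the sketch below illustrates that loop only, not FBLNet's actual design, and all shapes and iteration counts are assumptions.

```python
# Hedged sketch of a feedback loop: the attention map predicted on one pass
# modulates the input on the next, loosely mimicking experience accumulation.
import torch
import torch.nn as nn


class FeedbackAttention(nn.Module):
    def __init__(self, channels: int = 16, iterations: int = 3):
        super().__init__()
        self.iterations = iterations
        self.backbone = nn.Conv2d(3 + 1, channels, 3, padding=1)  # frame + prior
        self.head = nn.Conv2d(channels, 1, 1)

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        b, _, h, w = frame.shape
        prior = torch.zeros(b, 1, h, w)              # "no experience" at first
        for _ in range(self.iterations):             # feedback loop
            feats = torch.relu(self.backbone(torch.cat([frame, prior], dim=1)))
            prior = torch.sigmoid(self.head(feats))  # feed prediction back in
        return prior


print(FeedbackAttention()(torch.rand(2, 3, 32, 32)).shape)  # (2, 1, 32, 32)
```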
arXiv Detail & Related papers (2022-12-05T08:25:09Z)
- CoCAtt: A Cognitive-Conditioned Driver Attention Dataset (Supplementary Material) [31.888206001447625]
Driver attention prediction can play an instrumental role in mitigating and preventing high-risk events.
We present a new driver attention dataset, CoCAtt.
CoCAtt is the largest and the most diverse driver attention dataset in terms of autonomy levels, eye tracker resolutions, and driving scenarios.
arXiv Detail & Related papers (2022-07-08T17:35:17Z)
- Where and What: Driver Attention-based Object Detection [13.5947650184579]
We bridge the gap between pixel-level and object-level attention prediction.
Our framework achieves competitive state-of-the-art performance at both the pixel level and the object level.
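One straightforward way to bridge pixel-level and object-level attention, shown below as an assumption rather than the paper's actual mechanism, is to pool the pixel-level saliency map inside each detected object box.

```python
# Hedged sketch: score each detection by averaging the saliency map inside
# its box. This is one plausible reading, not the paper's stated method.
import torch


def object_attention_scores(saliency: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
    """saliency: (H, W) in [0, 1]; boxes: (N, 4) as integer (x1, y1, x2, y2)."""
    scores = []
    for x1, y1, x2, y2 in boxes.tolist():
        scores.append(saliency[y1:y2, x1:x2].mean())
    return torch.stack(scores)


sal = torch.rand(120, 160)
boxes = torch.tensor([[10, 20, 50, 60], [80, 30, 150, 110]])
print(object_attention_scores(sal, boxes))  # one attention score per object
```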
arXiv Detail & Related papers (2022-04-26T08:38:22Z)
- CoCAtt: A Cognitive-Conditioned Driver Attention Dataset [16.177399201198636]
Driver attention prediction can play an instrumental role in mitigating and preventing high-risk events.
We present a new driver attention dataset, CoCAtt.
CoCAtt is the largest and the most diverse driver attention dataset in terms of autonomy levels, eye tracker resolutions, and driving scenarios.
arXiv Detail & Related papers (2021-11-19T02:42:34Z)
- Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving [104.32241082170044]
We study a new task, safety-aware motion prediction with unseen vehicles for autonomous driving.
Unlike the existing trajectory prediction task for seen vehicles, we aim to predict an occupancy map.
Our approach is the first one that can predict the existence of unseen vehicles in most cases.
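A hedged sketch of the occupancy-map idea: rather than regressing trajectories for tracked vehicles, a head scores every bird's-eye-view cell for future occupancy, which also covers unseen vehicles. The architecture and grid size below are illustrative assumptions, not the paper's design.

```python
# Hedged sketch: predict a future occupancy grid instead of per-vehicle
# trajectories, so occluded (unseen) vehicles can still be accounted for.
import torch
import torch.nn as nn


class OccupancyHead(nn.Module):
    def __init__(self, in_ch: int = 64, horizon: int = 5):
        super().__init__()
        # one occupancy channel per future timestep
        self.head = nn.Conv2d(in_ch, horizon, 1)

    def forward(self, bev_feats: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.head(bev_feats))  # P(cell occupied at t)


feats = torch.rand(1, 64, 100, 100)        # bird's-eye-view feature map
occ = OccupancyHead()(feats)               # (1, 5, 100, 100)
loss = nn.functional.binary_cross_entropy(occ, torch.randint(0, 2, occ.shape).float())
print(occ.shape, loss.item())
```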
arXiv Detail & Related papers (2021-09-03T13:33:33Z)
- Perceive, Attend, and Drive: Learning Spatial Attention for Safe Self-Driving [84.59201486239908]
We propose an end-to-end self-driving network featuring a sparse attention module that learns to automatically attend to important regions of the input.
The attention module specifically targets motion planning, whereas prior literature only applied attention in perception tasks.
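The abstract does not specify how sparsity is enforced; one common realization, sketched below purely as an assumption, is top-k masking that zeroes all but the most salient spatial cells before they reach the planner.

```python
# Hedged sketch of sparse spatial attention via top-k masking; the paper's
# exact sparsification is not given here, so this is only one common option.
import torch


def sparse_spatial_attention(feats: torch.Tensor, scores: torch.Tensor, k: int) -> torch.Tensor:
    """feats: (C, H, W); scores: (H, W) attention logits; keep top-k cells."""
    flat = scores.flatten()
    topk = flat.topk(k).indices
    mask = torch.zeros_like(flat).scatter_(0, topk, 1.0).view_as(scores)
    return feats * mask          # broadcast (H, W) mask over channels


feats, scores = torch.rand(8, 16, 16), torch.rand(16, 16)
out = sparse_spatial_attention(feats, scores, k=32)
print((out != 0).any(dim=0).sum())  # at most 32 active spatial cells
```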
arXiv Detail & Related papers (2020-11-02T17:47:54Z)
- Studying Person-Specific Pointing and Gaze Behavior for Multimodal Referencing of Outside Objects from a Moving Vehicle [58.720142291102135]
Hand pointing and eye gaze have been extensively investigated in automotive applications for object selection and referencing.
Existing outside-the-vehicle referencing methods focus on a static situation, whereas the situation in a moving vehicle is highly dynamic and subject to safety-critical constraints.
We investigate the specific characteristics of each modality and the interaction between them when used in the task of referencing outside objects.
arXiv Detail & Related papers (2020-09-23T14:56:19Z)
- Explaining Autonomous Driving by Learning End-to-End Visual Attention [25.09407072098823]
Current deep learning based autonomous driving approaches yield impressive results, even leading to in-production deployment in certain controlled scenarios.
One of the most popular and fascinating approaches relies on learning vehicle controls directly from data perceived by sensors.
The main drawback of this approach, as in other learning problems, is the lack of explainability. Indeed, a deep network acts as a black box, outputting predictions based on previously seen driving patterns without giving any feedback on why such decisions were taken.
arXiv Detail & Related papers (2020-06-05T10:12:31Z)
- When Do Drivers Concentrate? Attention-based Driver Behavior Modeling With Deep Reinforcement Learning [8.9801312307912]
We propose an actor-critic method to approximate a driver's action according to observations and measure the driver's attention allocation.
Considering reaction time, we construct the attention mechanism in the actor network to capture temporal dependencies of consecutive observations.
We conduct experiments on real-world vehicle trajectory datasets and show that the accuracy of our proposed approach outperforms seven baseline algorithms.
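As a rough illustration of attending over consecutive observations within a reaction-time window, the sketch below builds a toy actor with temporal self-attention; it is not the paper's actor-critic implementation, and all dimensions are assumptions.

```python
# Hedged sketch: a toy actor that attends over a short window of consecutive
# observations before emitting an action, reflecting reaction time.
import torch
import torch.nn as nn


class TemporalAttentionActor(nn.Module):
    def __init__(self, obs_dim: int = 10, hidden: int = 32, act_dim: int = 2):
        super().__init__()
        self.embed = nn.Linear(obs_dim, hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=2, batch_first=True)
        self.policy = nn.Linear(hidden, act_dim)

    def forward(self, obs_window: torch.Tensor) -> torch.Tensor:
        # obs_window: (batch, T, obs_dim), T consecutive observations
        x = torch.relu(self.embed(obs_window))
        ctx, weights = self.attn(x, x, x)    # weights ~ attention allocation
        return torch.tanh(self.policy(ctx[:, -1]))  # act on the latest step


actions = TemporalAttentionActor()(torch.rand(4, 8, 10))
print(actions.shape)  # (4, 2): e.g. steering and acceleration
```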
arXiv Detail & Related papers (2020-02-26T09:56:36Z)