Understanding and Modeling the Effects of Task and Context on Drivers' Gaze Allocation
- URL: http://arxiv.org/abs/2310.09275v3
- Date: Fri, 12 Apr 2024 18:10:51 GMT
- Title: Understanding and Modeling the Effects of Task and Context on Drivers' Gaze Allocation
- Authors: Iuliia Kotseruba, John K. Tsotsos
- Abstract summary: We develop a novel model that modulates drivers' gaze prediction with explicit action and context information.
We correct the data processing pipeline used in DR(eye)VE to reduce noise in the recorded gaze data.
We benchmark a number of baseline and SOTA models for saliency and driver gaze prediction and use new annotations to analyze how their performance changes in scenarios involving different tasks.
- Score: 12.246649738388388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To further advance driver monitoring and assistance systems, it is important to understand how drivers allocate their attention, in other words, where they tend to look and why. Traditionally, factors affecting human visual attention have been divided into bottom-up (involuntary attraction to salient regions) and top-down (driven by the demands of the task being performed). Although both play a role in directing drivers' gaze, most existing models for drivers' gaze prediction apply techniques developed for bottom-up saliency and do not explicitly consider the influence of the drivers' actions. Likewise, common driving attention benchmarks lack relevant annotations for drivers' actions and the context in which they are performed. Therefore, to enable analysis and modeling of these factors for drivers' gaze prediction, we propose the following: 1) we correct the data processing pipeline used in DR(eye)VE to reduce noise in the recorded gaze data; 2) we then add per-frame labels for driving task and context; 3) we benchmark a number of baseline and SOTA models for saliency and driver gaze prediction and use the new annotations to analyze how their performance changes in scenarios involving different tasks; and, lastly, 4) we develop a novel model that modulates drivers' gaze prediction with explicit action and context information. While reducing noise in the DR(eye)VE gaze data improves the results of all models, we show that using task information in our proposed model boosts performance even further compared to bottom-up models on the cleaned-up data, both overall (by 24% KLD and 89% NSS) and on scenarios that involve performing safety-critical maneuvers and crossing intersections (by up to 10–30% KLD). Extended annotations and code are available at https://github.com/ykotseruba/SCOUT.
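The abstract quantifies improvements in KLD and NSS, two standard metrics for comparing a predicted gaze map against ground truth. Below is a minimal sketch of how these metrics are commonly computed; it is not the evaluation code from the SCOUT repository, and the map sizes and fixation mask are illustrative assumptions.

```python
# Minimal sketch (not the authors' evaluation code) of the two saliency metrics
# quoted in the abstract: KL divergence (KLD, lower is better) between a predicted
# gaze map and the ground-truth gaze density, and Normalized Scanpath Saliency
# (NSS, higher is better) of the prediction at recorded fixation locations.
import numpy as np

def kld(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """KL divergence between ground-truth and predicted gaze maps (both HxW)."""
    p = pred / (pred.sum() + eps)   # normalize prediction to a distribution
    q = gt / (gt.sum() + eps)       # normalize ground truth to a distribution
    return float(np.sum(q * np.log(eps + q / (p + eps))))

def nss(pred: np.ndarray, fixations: np.ndarray) -> float:
    """Mean z-scored saliency at binary fixation locations (fixations is HxW, 0/1)."""
    z = (pred - pred.mean()) / (pred.std() + 1e-7)  # z-score the predicted map
    return float(z[fixations > 0].mean())

# Toy example with random maps (shapes are illustrative only):
rng = np.random.default_rng(0)
pred_map = rng.random((120, 160))
gt_map = rng.random((120, 160))
fix_mask = (rng.random((120, 160)) > 0.999).astype(np.uint8)
print(kld(pred_map, gt_map), nss(pred_map, fix_mask))
```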
Related papers
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method, which combines concepts from Optimal Transport and Shapley Values, as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- Guiding Attention in End-to-End Driving Models [49.762868784033785]
Vision-based end-to-end driving models trained by imitation learning can lead to affordable solutions for autonomous driving.
We study how to guide the attention of these models to improve their driving quality by adding a loss term during training.
In contrast to previous work, our method does not require the salient semantic maps used for guidance to be available at test time.
arXiv Detail & Related papers (2024-04-30T23:18:51Z)
- SCOUT+: Towards Practical Task-Driven Drivers' Gaze Prediction [12.246649738388388]
SCOUT+ is a task- and context-aware model for drivers' gaze prediction.
We evaluate our model on two datasets, DR(eye)VE and BDD-A.
arXiv Detail & Related papers (2024-04-12T18:29:10Z)
- Data Limitations for Modeling Top-Down Effects on Drivers' Attention [12.246649738388388]
Driving is a visuomotor task, i.e., there is a connection between what drivers see and what they do.
Some models of drivers' gaze account for top-down effects of drivers' actions.
The majority, however, learn only bottom-up correlations between human gaze and driving footage.
arXiv Detail & Related papers (2024-04-12T18:23:00Z)
- G-MEMP: Gaze-Enhanced Multimodal Ego-Motion Prediction in Driving [71.9040410238973]
We focus on inferring the ego trajectory of a driver's vehicle using their gaze data.
We develop G-MEMP, a novel multimodal ego-trajectory prediction network that combines GPS and video input with gaze data.
The results show that G-MEMP significantly outperforms state-of-the-art methods in both benchmarks.
arXiv Detail & Related papers (2023-12-13T23:06:30Z)
- TOFG: A Unified and Fine-Grained Environment Representation in Autonomous Driving [7.787762537147956]
In autonomous driving, an accurate understanding of the environment plays a critical role in many driving tasks such as trajectory prediction and motion planning.
Many data-driven models for trajectory prediction and motion planning extract vehicle-to-vehicle and vehicle-to-lane interactions in a separate and sequential manner.
We propose an environment representation, Temporal Occupancy Flow Graph (TOFG), which unifies the map information and vehicle trajectories into a homogeneous data format.
arXiv Detail & Related papers (2023-05-31T17:43:56Z)
- OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping [84.65114565766596]
We present OpenLane-V2, the first dataset on topology reasoning for traffic scene structure.
OpenLane-V2 consists of 2,000 annotated road scenes that describe traffic elements and their correlation to the lanes.
We evaluate various state-of-the-art methods, and present their quantitative and qualitative results on OpenLane-V2 to indicate future avenues for investigating topology reasoning in traffic scenes.
arXiv Detail & Related papers (2023-04-20T16:31:22Z)
- FBLNet: FeedBack Loop Network for Driver Attention Prediction [75.83518507463226]
Non-objective driving experience is difficult to model.
In this paper, we propose a FeedBack Loop Network (FBLNet) that attempts to model how driving experience accumulates.
Guided by this incremental knowledge, our model fuses CNN and Transformer features extracted from the input image to predict driver attention.
arXiv Detail & Related papers (2022-12-05T08:25:09Z)
- Control-Aware Prediction Objectives for Autonomous Driving [78.19515972466063]
We present control-aware prediction objectives (CAPOs) to evaluate the downstream effect of predictions on control without requiring the planner to be differentiable.
We propose two types of importance weights that weight the predictive likelihood: one using an attention model between agents, and another based on control variation when exchanging predicted trajectories for ground truth trajectories.
arXiv Detail & Related papers (2022-04-28T07:37:21Z)
- Where and What: Driver Attention-based Object Detection [13.5947650184579]
We bridge the gap between pixel-level and object-level attention prediction.
Our framework achieves competitive state-of-the-art performance at both the pixel level and the object level.
arXiv Detail & Related papers (2022-04-26T08:38:22Z)
- SCOUT: Socially-COnsistent and UndersTandable Graph Attention Network for Trajectory Prediction of Vehicles and VRUs [0.0]
SCOUT is a novel Attention-based Graph Neural Network that uses a flexible and generic representation of the scene as a graph.
We explore three different attention mechanisms and test our scheme with both bird's-eye-view and on-vehicle urban data.
We evaluate our model's flexibility and transferability by testing it under completely new scenarios on the RounD dataset; a generic graph-attention sketch follows below.
arXiv Detail & Related papers (2021-02-12T06:29:28Z)
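SCOUT (the trajectory-prediction model above) is described as an attention-based graph neural network over a scene graph. The following is a generic, simplified sketch of a single graph-attention step (one head, no LeakyReLU, random toy weights); it is not the authors' implementation, and all shapes and names are illustrative assumptions.

```python
# Generic sketch of one graph-attention step over a scene graph, in the spirit of
# attention-based GNNs such as SCOUT above. NOT the authors' code: feature sizes,
# weights, and the adjacency matrix are illustrative.
import numpy as np

def graph_attention(node_feats: np.ndarray, adj: np.ndarray,
                    w: np.ndarray, a_src: np.ndarray, a_dst: np.ndarray) -> np.ndarray:
    """One attention layer: each node aggregates features from its graph neighbours.

    node_feats: (N, F) agent features (e.g., position, velocity, heading)
    adj:        (N, N) binary adjacency of the scene graph (1 = edge)
    w:          (F, H) linear projection; a_src, a_dst: (H,) attention vectors
    """
    h = node_feats @ w                                    # project features, (N, H)
    scores = h @ a_src[:, None] + (h @ a_dst[:, None]).T  # pairwise attention logits, (N, N)
    scores = np.where(adj > 0, scores, -1e9)              # mask out non-neighbours
    scores = scores - scores.max(axis=1, keepdims=True)   # numerically stable softmax
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)      # attention weights per node
    return np.maximum(alpha @ h, 0.0)                     # aggregate neighbours + ReLU

# Toy scene: 4 agents, 6-dim features, fully connected graph with self-loops.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))
adj = np.ones((4, 4))
out = graph_attention(x, adj, rng.normal(size=(6, 8)),
                      rng.normal(size=8), rng.normal(size=8))
print(out.shape)  # (4, 8) updated node embeddings
```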