Data Limitations for Modeling Top-Down Effects on Drivers' Attention
- URL: http://arxiv.org/abs/2404.08749v1
- Date: Fri, 12 Apr 2024 18:23:00 GMT
- Title: Data Limitations for Modeling Top-Down Effects on Drivers' Attention
- Authors: Iuliia Kotseruba, John K. Tsotsos
- Abstract summary: Driving is a visuomotor task, i.e., there is a connection between what drivers see and what they do.
While some models of drivers' gaze account for top-down effects of drivers' actions, the majority learn only bottom-up correlations between human gaze and driving footage.
- Score: 12.246649738388388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Driving is a visuomotor task, i.e., there is a connection between what drivers see and what they do. While some models of drivers' gaze account for top-down effects of drivers' actions, the majority learn only bottom-up correlations between human gaze and driving footage. The crux of the problem is the lack of public data with annotations that could be used to train top-down models and evaluate how well models of any kind capture the effects of task on attention. As a result, top-down models are trained and evaluated on private data, and public benchmarks measure only the overall fit to human data. In this paper, we focus on data limitations by examining four large-scale public datasets, DR(eye)VE, BDD-A, MAAD, and LBW, used to train and evaluate algorithms for drivers' gaze prediction. We define a set of driving tasks (lateral and longitudinal maneuvers) and context elements (intersections and right-of-way) known to affect drivers' attention, augment the datasets with annotations based on these definitions, and analyze the characteristics of data recording and processing pipelines w.r.t. capturing what the drivers see and do. In sum, the contributions of this work are: 1) quantifying biases of the public datasets, 2) examining performance of the SOTA bottom-up models on subsets of the data involving non-trivial drivers' actions, 3) linking shortcomings of the bottom-up models to data limitations, and 4) providing recommendations for future data collection and processing. The new annotations and code for reproducing the results are available at https://github.com/ykotseruba/SCOUT.
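To make the annotation step concrete, below is a minimal sketch of how per-frame maneuver labels could be derived from recorded vehicle telemetry. The thresholds, the signal names (speed, yaw_rate), and the function annotate_maneuvers are hypothetical illustrations, not the paper's actual procedure; the real annotation criteria are documented in the linked SCOUT repository.

```python
import numpy as np

# Hypothetical thresholds; the paper's actual annotation criteria may differ.
YAW_RATE_TURN = 0.10  # rad/s beyond which a frame counts as a lateral maneuver
ACCEL_LONG = 0.5      # m/s^2 beyond which a frame counts as a longitudinal maneuver

def annotate_maneuvers(speed, yaw_rate, dt=1 / 30):
    """Tag each frame with coarse lateral/longitudinal maneuver labels.

    speed    : (N,) array of vehicle speed in m/s
    yaw_rate : (N,) array of yaw rate in rad/s (positive = turning left)
    dt       : sampling interval in seconds
    Returns a list of N per-frame tag sets.
    """
    accel = np.gradient(speed, dt)  # finite-difference longitudinal acceleration
    tags = []
    for a, w in zip(accel, yaw_rate):
        frame_tags = set()
        if w > YAW_RATE_TURN:
            frame_tags.add("turn_left")
        elif w < -YAW_RATE_TURN:
            frame_tags.add("turn_right")
        if a > ACCEL_LONG:
            frame_tags.add("accelerate")
        elif a < -ACCEL_LONG:
            frame_tags.add("decelerate")
        if not frame_tags:
            frame_tags.add("maintain")  # no maneuver detected
        tags.append(frame_tags)
    return tags
```

In practice, intersection and right-of-way context would come from map data or manual labeling rather than telemetry alone; the point here is only that maneuver tags are recoverable from signals most driving datasets already record.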
Related papers
- SCOUT+: Towards Practical Task-Driven Drivers' Gaze Prediction [12.246649738388388]
SCOUT+ is a task- and context-aware model for drivers' gaze prediction.
We evaluate our model on two datasets, DR(eye)VE and BDD-A.
arXiv Detail & Related papers (2024-04-12T18:29:10Z)
- Situation Awareness for Driver-Centric Driving Style Adaptation [3.568617847600189]
We propose a situation-aware driving style model based on different visual feature encoders pretrained on fleet data.
Our experiments show that the proposed method significantly outperforms static driving styles and forms plausible situation clusters.
arXiv Detail & Related papers (2024-03-28T17:19:16Z)
- G-MEMP: Gaze-Enhanced Multimodal Ego-Motion Prediction in Driving [71.9040410238973]
We focus on inferring the ego trajectory of a driver's vehicle using their gaze data.
We develop G-MEMP, a novel multimodal ego-trajectory prediction network that combines GPS and video input with gaze data.
The results show that G-MEMP significantly outperforms state-of-the-art methods in both benchmarks.
arXiv Detail & Related papers (2023-12-13T23:06:30Z)
- Understanding and Modeling the Effects of Task and Context on Drivers' Gaze Allocation [12.246649738388388]
We develop a novel model that modulates drivers' gaze prediction with explicit action and context information.
We correct the data processing pipeline used in DR(eye)VE to reduce noise in the recorded gaze data.
We benchmark a number of baseline and SOTA models for saliency and driver gaze prediction and use new annotations to analyze how their performance changes in scenarios involving different tasks.
arXiv Detail & Related papers (2023-10-13T17:38:41Z)
- FBLNet: FeedBack Loop Network for Driver Attention Prediction [75.83518507463226]
Non-objective driving experience is difficult to model.
In this paper, we propose a FeedBack Loop Network (FBLNet) which attempts to model the driving experience accumulation procedure.
Under the guidance of the incremental knowledge, our model fuses the CNN feature and Transformer feature that are extracted from the input image to predict driver attention.
arXiv Detail & Related papers (2022-12-05T08:25:09Z)
- CoCAtt: A Cognitive-Conditioned Driver Attention Dataset (Supplementary Material) [31.888206001447625]
Driver attention prediction can play an instrumental role in mitigating and preventing high-risk events.
We present a new driver attention dataset, CoCAtt.
CoCAtt is the largest and the most diverse driver attention dataset in terms of autonomy levels, eye tracker resolutions, and driving scenarios.
arXiv Detail & Related papers (2022-07-08T17:35:17Z)
- Predicting Take-over Time for Autonomous Driving with Real-World Data: Robust Data Augmentation, Models, and Evaluation [11.007092387379076]
We develop and train take-over time (TOT) models that operate on mid and high-level features produced by computer vision algorithms operating on different driver-facing camera views.
We show that a TOT model supported by augmented data can be used to produce continuous estimates of take-over times without delay.
arXiv Detail & Related papers (2021-07-27T16:39:50Z)
- Injecting Knowledge in Data-driven Vehicle Trajectory Predictors [82.91398970736391]
Vehicle trajectory prediction tasks have been commonly tackled from two perspectives: knowledge-driven or data-driven.
In this paper, we propose to learn a "Realistic Residual Block" (RRB) which effectively connects these two perspectives.
Our proposed method outputs realistic predictions by confining the residual range and taking into account its uncertainty.
arXiv Detail & Related papers (2021-03-08T16:03:09Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Learning Accurate and Human-Like Driving using Semantic Maps and Attention [152.48143666881418]
This paper investigates how end-to-end driving models can be improved to drive more accurately and human-like.
We exploit semantic and visual maps from HERE Technologies and augment the existing Drive360 dataset with them.
Our models are trained and evaluated on the Drive360 + HERE dataset, which features 60 hours and 3000 km of real-world driving data.
arXiv Detail & Related papers (2020-07-10T22:25:27Z)
- Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction [57.56466850377598]
Reasoning over visual data is a desirable capability for robotics and vision-based applications.
In this paper, we present a graph-based framework to uncover relationships between different objects in the scene for reasoning about pedestrian intent.
Pedestrian intent, defined as the future action of crossing or not crossing the street, is a crucial piece of information for autonomous vehicles.
arXiv Detail & Related papers (2020-02-20T18:50:44Z)
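As a rough illustration of the subset-based evaluation in the main paper's second contribution, the sketch below scores predicted gaze maps against ground truth with KL divergence, a standard saliency metric, restricted to frames carrying a given maneuver tag. The tag format is carried over from the hypothetical annotate_maneuvers sketch above; none of these names come from the paper's code.

```python
import numpy as np

def kl_divergence(pred, gt, eps=1e-7):
    """KL divergence between predicted and ground-truth gaze maps
    (a standard saliency metric; lower is better)."""
    pred = pred / (pred.sum() + eps)  # normalize to probability distributions
    gt = gt / (gt.sum() + eps)
    return float(np.sum(gt * np.log(gt / (pred + eps) + eps)))

def evaluate_by_task(preds, gts, frame_tags, task="turn_left"):
    """Average KL over only those frames annotated with the given task tag."""
    scores = [kl_divergence(p, g)
              for p, g, tags in zip(preds, gts, frame_tags)
              if task in tags]
    return float(np.mean(scores)) if scores else float("nan")
```

Reporting such per-task scores alongside the overall fit is what exposes the gap between bottom-up models and human attention during non-trivial maneuvers.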
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.