SCOUT+: Towards Practical Task-Driven Drivers' Gaze Prediction
- URL: http://arxiv.org/abs/2404.08756v1
- Date: Fri, 12 Apr 2024 18:29:10 GMT
- Title: SCOUT+: Towards Practical Task-Driven Drivers' Gaze Prediction
- Authors: Iuliia Kotseruba, John K. Tsotsos
- Abstract summary: SCOUT+ is a task- and context-aware model for drivers' gaze prediction.
We evaluate our model on two datasets, DR(eye)VE and BDD-A.
- Score: 12.246649738388388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate prediction of drivers' gaze is an important component of vision-based driver monitoring and assistive systems. Of particular interest are safety-critical episodes, such as performing maneuvers or crossing intersections. In such scenarios, drivers' gaze distribution changes significantly and becomes difficult to predict, especially if the task and context information is represented implicitly, as is common in many state-of-the-art models. However, explicit modeling of top-down factors affecting drivers' attention often requires additional information and annotations that may not be readily available. In this paper, we address the challenge of effective modeling of task and context with common sources of data for use in practical systems. To this end, we introduce SCOUT+, a task- and context-aware model for drivers' gaze prediction, which leverages route and map information inferred from commonly available GPS data. We evaluate our model on two datasets, DR(eye)VE and BDD-A, and demonstrate that using maps improves results compared to bottom-up models and reaches performance comparable to the top-down model SCOUT which relies on privileged ground truth information. Code is available at https://github.com/ykotseruba/SCOUT.
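As a rough illustration of the core idea, the sketch below combines video features with features from a rasterized map/route image of the kind that could be inferred from GPS data. This is a minimal PyTorch sketch, not the authors' architecture; all module names, layer sizes, and input shapes are illustrative assumptions.
```python
# Minimal sketch of the general idea behind SCOUT+ (not the authors'
# implementation): fuse video features with a rasterized map/route image
# to predict a driver gaze saliency map. All sizes are assumptions.
import torch
import torch.nn as nn

class GazeWithMap(nn.Module):
    def __init__(self):
        super().__init__()
        # video branch: a small 3D-conv encoder over a short clip
        self.video_enc = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((1, 28, 28)),
        )
        # map branch: 2D-conv encoder over a rasterized map+route image
        self.map_enc = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((28, 28)),
        )
        # fusion + decoder to a single-channel saliency map
        self.decoder = nn.Sequential(
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),
        )

    def forward(self, clip, map_img):
        v = self.video_enc(clip).squeeze(2)   # (B, 16, 28, 28)
        m = self.map_enc(map_img)             # (B, 16, 28, 28)
        logits = self.decoder(torch.cat([v, m], dim=1))
        return torch.sigmoid(logits)          # gaze saliency map

model = GazeWithMap()
clip = torch.randn(2, 3, 8, 224, 224)   # batch of 8-frame clips
map_img = torch.randn(2, 3, 224, 224)   # rasterized map + route
print(model(clip, map_img).shape)       # torch.Size([2, 1, 28, 28])
```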
Related papers
- DeTra: A Unified Model for Object Detection and Trajectory Forecasting [68.85128937305697]
Our approach formulates the union of the two tasks as a trajectory refinement problem.
To tackle this unified task, we design a refinement transformer that infers the presence, pose, and multi-modal future behaviors of objects.
In our experiments, we observe that our model outperforms the state of the art on the Argoverse 2 Sensor and Open datasets.
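As a loose illustration of trajectory refinement, the sketch below lets per-object features attend to each other in a transformer and emit offsets that are added to an initial trajectory estimate. This is not DeTra's architecture; all shapes and modules are assumptions for demonstration.
```python
# Illustrative sketch of trajectory refinement (not DeTra itself):
# object queries attend to each other and output pose refinements
# that are added to a coarse initial trajectory.
import torch
import torch.nn as nn

class TrajRefiner(nn.Module):
    def __init__(self, d_model=64, horizon=6):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_offset = nn.Linear(d_model, horizon * 2)  # (x, y) per step
        self.horizon = horizon

    def forward(self, obj_feats, init_traj):
        # obj_feats: (B, N, d_model), init_traj: (B, N, horizon, 2)
        h = self.encoder(obj_feats)
        offsets = self.to_offset(h).view(*init_traj.shape)
        return init_traj + offsets  # refined trajectories

refiner = TrajRefiner()
feats = torch.randn(1, 5, 64)       # 5 detected objects
init = torch.zeros(1, 5, 6, 2)      # coarse 6-step trajectories
print(refiner(feats, init).shape)   # torch.Size([1, 5, 6, 2])
```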
arXiv Detail & Related papers (2024-06-06T18:12:04Z)
- Guiding Attention in End-to-End Driving Models [49.762868784033785]
Vision-based end-to-end driving models trained by imitation learning can lead to affordable solutions for autonomous driving.
We study how to guide the attention of these models to improve their driving quality by adding a loss term during training.
In contrast to previous work, our method does not require these salient semantic maps to be available during testing time.
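A minimal sketch of such a training-time attention-guidance term follows. The KL-divergence formulation and the loss weighting are my own illustrative assumptions; the paper's exact loss may differ.
```python
# Sketch of adding an attention-guidance term to the training loss:
# the model's attention map is pulled toward a precomputed salient
# semantic map, which is only needed at train time.
import torch
import torch.nn.functional as F

def guided_loss(pred_actions, gt_actions, attn_map, salient_map, w=0.1):
    # attn_map, salient_map: (B, H, W), unnormalized
    imitation = F.mse_loss(pred_actions, gt_actions)
    p = F.log_softmax(attn_map.flatten(1), dim=1)
    q = F.softmax(salient_map.flatten(1), dim=1)
    guidance = F.kl_div(p, q, reduction="batchmean")
    return imitation + w * guidance

actions = torch.randn(4, 2)      # e.g., steering and acceleration
attn = torch.randn(4, 14, 14)    # model's attention map
sal = torch.randn(4, 14, 14)     # salient semantic map (train only)
print(guided_loss(actions, actions.clone(), attn, sal))
```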
arXiv Detail & Related papers (2024-04-30T23:18:51Z)
- Data Limitations for Modeling Top-Down Effects on Drivers' Attention [12.246649738388388]
Driving is a visuomotor task, i.e., there is a connection between what drivers see and what they do.
Some models of drivers' gaze account for top-down effects of drivers' actions, but the majority learn only bottom-up correlations between human gaze and driving footage.
arXiv Detail & Related papers (2024-04-12T18:23:00Z)
- G-MEMP: Gaze-Enhanced Multimodal Ego-Motion Prediction in Driving [71.9040410238973]
We focus on inferring the ego trajectory of a driver's vehicle using their gaze data.
We develop G-MEMP, a novel multimodal ego-trajectory prediction network that combines GPS and video input with gaze data.
The results show that G-MEMP significantly outperforms state-of-the-art methods in both benchmarks.
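The sketch below illustrates the general pattern of fusing GPS, video, and gaze inputs for ego-trajectory prediction. It is not the published G-MEMP architecture; the encoders, feature dimensions, and fusion scheme are illustrative assumptions.
```python
# Rough sketch of multimodal ego-trajectory prediction: per-modality
# encoders for GPS, video, and gaze are fused and decoded into future
# ego positions. All dimensions are assumptions.
import torch
import torch.nn as nn

class EgoPredictor(nn.Module):
    def __init__(self, horizon=10):
        super().__init__()
        self.gps_enc = nn.GRU(2, 32, batch_first=True)   # past (x, y)
        self.vid_enc = nn.Linear(512, 32)  # pre-extracted video features
        self.gaze_enc = nn.Linear(2, 32)   # gaze point on the image
        self.head = nn.Linear(96, horizon * 2)
        self.horizon = horizon

    def forward(self, gps_hist, vid_feat, gaze):
        _, h = self.gps_enc(gps_hist)                # h: (1, B, 32)
        fused = torch.cat([h[-1], self.vid_enc(vid_feat),
                           self.gaze_enc(gaze)], dim=-1)
        return self.head(fused).view(-1, self.horizon, 2)

model = EgoPredictor()
out = model(torch.randn(2, 20, 2), torch.randn(2, 512), torch.randn(2, 2))
print(out.shape)  # torch.Size([2, 10, 2])
```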
arXiv Detail & Related papers (2023-12-13T23:06:30Z)
- Understanding and Modeling the Effects of Task and Context on Drivers' Gaze Allocation [12.246649738388388]
We develop a novel model that modulates drivers' gaze prediction with explicit action and context information.
We correct the data processing pipeline used in DR(eye)VE to reduce noise in the recorded gaze data.
We benchmark a number of baseline and SOTA models for saliency and driver gaze prediction and use new annotations to analyze how their performance changes in scenarios involving different tasks.
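One common way to modulate visual features with explicit action and context information is FiLM-style conditioning, sketched below as an illustration. SCOUT's actual mechanism may differ, and the layout of the context vector is assumed.
```python
# Minimal sketch of modulating visual features with an explicit task and
# context vector (FiLM-style conditioning is my illustrative choice).
# The context could be, e.g., a one-hot maneuver type plus intersection flags.
import torch
import torch.nn as nn

class TaskModulation(nn.Module):
    def __init__(self, ctx_dim=8, channels=32):
        super().__init__()
        self.to_scale = nn.Linear(ctx_dim, channels)
        self.to_shift = nn.Linear(ctx_dim, channels)

    def forward(self, feats, ctx):
        # feats: (B, C, H, W), ctx: (B, ctx_dim)
        gamma = self.to_scale(ctx)[:, :, None, None]
        beta = self.to_shift(ctx)[:, :, None, None]
        return feats * (1 + gamma) + beta

mod = TaskModulation()
feats = torch.randn(2, 32, 28, 28)      # visual features
ctx = torch.zeros(2, 8); ctx[:, 0] = 1  # e.g., "left turn" one-hot
print(mod(feats, ctx).shape)            # torch.Size([2, 32, 28, 28])
```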
arXiv Detail & Related papers (2023-10-13T17:38:41Z)
- FBLNet: FeedBack Loop Network for Driver Attention Prediction [75.83518507463226]
Driving experience is non-objective and therefore difficult to model.
In this paper, we propose a FeedBack Loop Network (FBLNet), which attempts to model the driving experience accumulation procedure.
Under the guidance of the incremental knowledge, our model fuses the CNN feature and Transformer feature that are extracted from the input image to predict driver attention.
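As a loose sketch of fusing CNN and Transformer features under the guidance of an accumulated knowledge vector, the example below gates between the two feature maps. This is my illustrative reading of the description above, not FBLNet's actual design.
```python
# Loose sketch: a per-channel gate derived from an accumulated
# "knowledge" vector blends CNN and Transformer features before
# decoding a driver attention map. The gating is an assumption.
import torch
import torch.nn as nn

class GuidedFusion(nn.Module):
    def __init__(self, channels=32, know_dim=16):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(know_dim, channels), nn.Sigmoid())
        self.out = nn.Conv2d(channels, 1, 1)  # driver attention map

    def forward(self, cnn_feat, trf_feat, knowledge):
        g = self.gate(knowledge)[:, :, None, None]   # per-channel gate
        fused = g * cnn_feat + (1 - g) * trf_feat
        return torch.sigmoid(self.out(fused))

fusion = GuidedFusion()
cnn_f = torch.randn(2, 32, 28, 28)
trf_f = torch.randn(2, 32, 28, 28)
know = torch.randn(2, 16)  # accumulated over training iterations
print(fusion(cnn_f, trf_f, know).shape)  # torch.Size([2, 1, 28, 28])
```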
arXiv Detail & Related papers (2022-12-05T08:25:09Z)
- Exploring Map-based Features for Efficient Attention-based Vehicle Motion Prediction [3.222802562733787]
Motion prediction of multiple agents is a crucial task in arbitrarily complex environments.
We show how to achieve competitive performance on the Argoverse 1.0 Benchmark using efficient attention-based models.
arXiv Detail & Related papers (2022-05-25T22:38:11Z)
- Predicting Take-over Time for Autonomous Driving with Real-World Data: Robust Data Augmentation, Models, and Evaluation [11.007092387379076]
We develop and train take-over time (TOT) models that operate on mid- and high-level features produced by computer vision algorithms operating on different driver-facing camera views.
We show that a TOT model supported by augmented data can be used to produce continuous estimates of take-over times without delay.
arXiv Detail & Related papers (2021-07-27T16:39:50Z)
- Injecting Knowledge in Data-driven Vehicle Trajectory Predictors [82.91398970736391]
Vehicle trajectory prediction tasks have been commonly tackled from two perspectives: knowledge-driven or data-driven.
In this paper, we propose to learn a "Realistic Residual Block" (RRB) which effectively connects these two perspectives.
Our proposed method outputs realistic predictions by confining the residual range and taking into account its uncertainty.
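The residual idea can be sketched as a data-driven correction added to a knowledge-driven prediction, with the correction confined to a bounded range. The tanh scaling below is my illustrative choice for "confining the residual range"; the paper's RRB also models uncertainty, which is omitted here.
```python
# Sketch of a bounded data-driven residual on top of a knowledge-driven
# prediction (e.g., from a constant-velocity model). Bound and sizes
# are assumptions.
import torch
import torch.nn as nn

class BoundedResidual(nn.Module):
    def __init__(self, feat_dim=64, horizon=12, max_residual=0.5):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, horizon * 2))
        self.max_residual = max_residual  # meters, an assumed bound
        self.horizon = horizon

    def forward(self, feats, knowledge_pred):
        # knowledge_pred: (B, horizon, 2)
        r = torch.tanh(self.net(feats)).view(-1, self.horizon, 2)
        return knowledge_pred + self.max_residual * r

rrb = BoundedResidual()
kd = torch.zeros(2, 12, 2)  # knowledge-driven baseline trajectory
print(rrb(torch.randn(2, 64), kd).shape)  # torch.Size([2, 12, 2])
```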
arXiv Detail & Related papers (2021-03-08T16:03:09Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- GISNet: Graph-Based Information Sharing Network For Vehicle Trajectory Prediction [6.12727713172576]
Many AI-oriented companies, such as Google, Uber and DiDi, are investigating more accurate vehicle trajectory prediction algorithms.
In this paper, we propose a novel graph-based information sharing network (GISNet) that allows the information sharing between the target vehicle and its surrounding vehicles.
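A toy version of such information sharing is a single message-passing step in which neighbor features are aggregated and combined with the target vehicle's features before trajectory decoding. GISNet's exact formulation differs, and all dimensions below are assumptions.
```python
# Toy message-passing step between a target vehicle and its neighbors
# (a generic sketch, not GISNet's exact formulation).
import torch
import torch.nn as nn

class SharingLayer(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, target, neighbors):
        # target: (B, dim), neighbors: (B, N, dim)
        m = torch.relu(self.msg(neighbors)).mean(dim=1)  # aggregate
        return torch.relu(self.update(torch.cat([target, m], dim=-1)))

layer = SharingLayer()
tgt = torch.randn(2, 32)
nbrs = torch.randn(2, 4, 32)   # 4 surrounding vehicles
print(layer(tgt, nbrs).shape)  # torch.Size([2, 32])
```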
arXiv Detail & Related papers (2020-03-22T03:24:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.