MAAD: A Model and Dataset for "Attended Awareness" in Driving
- URL: http://arxiv.org/abs/2110.08610v1
- Date: Sat, 16 Oct 2021 16:36:10 GMT
- Title: MAAD: A Model and Dataset for "Attended Awareness" in Driving
- Authors: Deepak Gopinath, Guy Rosman, Simon Stent, Katsuya Terahata, Luke
Fletcher, Brenna Argall, John Leonard
- Abstract summary: We propose a model to estimate a person's attended awareness of their environment.
Our model takes as input scene information in the form of a video and noisy gaze estimates.
We capture a new dataset with a high-precision gaze tracker including 24.5 hours of gaze sequences from 23 subjects attending to videos of driving scenes.
- Score: 10.463152664328025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a computational model to estimate a person's attended awareness of
their environment. We define attended awareness to be those parts of a
potentially dynamic scene which a person has attended to in recent history and
which they are still likely to be physically aware of. Our model takes as input
scene information in the form of a video and noisy gaze estimates, and outputs
visual saliency, a refined gaze estimate, and an estimate of the person's
attended awareness. In order to test our model, we capture a new dataset with a
high-precision gaze tracker including 24.5 hours of gaze sequences from 23
subjects attending to videos of driving scenes. The dataset also contains
third-party annotations of the subjects' attended awareness based on
observations of their scan path. Our results show that our model is able to
reasonably estimate attended awareness in a controlled setting, and in the
future could potentially be extended to real egocentric driving data to help
enable more effective ahead-of-time warnings in safety systems and thereby
augment driver performance. We also demonstrate our model's effectiveness on
the tasks of saliency, gaze calibration, and denoising, using both our dataset
and an existing saliency dataset. We make our model and dataset available at
https://github.com/ToyotaResearchInstitute/att-aware/.
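The abstract above specifies a clear input/output contract: a video and noisy gaze estimates in; visual saliency, a refined gaze estimate, and an attended awareness map out. Below is a minimal PyTorch-style sketch of that contract only. Module names, layer choices, and the gaze rasterization are illustrative assumptions, not the authors' architecture; the actual model is available in the repository linked above.
```python
import torch
import torch.nn as nn

def rasterize_gaze(gaze, T, H, W):
    """Turn normalized (x, y) gaze points into per-frame heatmaps.
    gaze: (B, T, 2) in [0, 1] -> (B, 1, T, H, W) one-hot maps."""
    B = gaze.shape[0]
    gmap = torch.zeros(B, 1, T, H, W)
    xs = (gaze[..., 0] * (W - 1)).long().clamp(0, W - 1)
    ys = (gaze[..., 1] * (H - 1)).long().clamp(0, H - 1)
    for b in range(B):
        for t in range(T):
            gmap[b, 0, t, ys[b, t], xs[b, t]] = 1.0
    return gmap

class AttendedAwarenessNet(nn.Module):
    """Video + noisy gaze in; saliency, refined gaze, awareness out.
    A hypothetical sketch, not the authors' architecture."""
    def __init__(self, ch: int = 32):
        super().__init__()
        # Shared spatiotemporal encoder over the video clip.
        self.encoder = nn.Conv3d(3, ch, kernel_size=3, padding=1)
        self.saliency_head = nn.Conv3d(ch, 1, kernel_size=1)
        # Awareness conditions on scene features plus the gaze heatmap,
        # so recently attended regions can inform the estimate.
        self.awareness_head = nn.Conv3d(ch + 1, 1, kernel_size=1)
        # Refined gaze: pooled scene features + noisy gaze -> corrected (x, y).
        self.gaze_head = nn.Linear(ch + 2, 2)

    def forward(self, video, noisy_gaze):
        # video: (B, 3, T, H, W); noisy_gaze: (B, T, 2) normalized coords
        B, _, T, H, W = video.shape
        feats = torch.relu(self.encoder(video))              # (B, C, T, H, W)
        saliency = torch.sigmoid(self.saliency_head(feats))
        gaze_map = rasterize_gaze(noisy_gaze, T, H, W)
        awareness = torch.sigmoid(
            self.awareness_head(torch.cat([feats, gaze_map], dim=1)))
        pooled = feats.mean(dim=(3, 4)).transpose(1, 2)      # (B, T, C)
        refined_gaze = self.gaze_head(
            torch.cat([pooled, noisy_gaze], dim=-1))         # (B, T, 2)
        return saliency, refined_gaze, awareness

# Example forward pass on a short clip.
net = AttendedAwarenessNet()
saliency, refined_gaze, awareness = net(
    torch.randn(2, 3, 8, 64, 64), torch.rand(2, 8, 2))
```
The one design point the sketch preserves is that the awareness output conditions on both scene features and the gaze history, matching the paper's definition of attended awareness as scene content a person has attended to in recent history.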
Related papers
- VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions [10.748597086208145]
In this work, we propose a trajectory prediction method that incorporates visual input from surround-view cameras alongside textual descriptions.
Our method achieves a latency of 53 ms, making it feasible for real-time processing.
Our experiments show that both the visual inputs and the textual descriptions contribute to improvements in trajectory prediction performance.
arXiv Detail & Related papers (2024-07-17T06:39:52Z)
- Exploring the Evolution of Hidden Activations with Live-Update Visualization [12.377279207342735]
We introduce SentryCam, an automated, real-time visualization tool that reveals the progression of hidden representations during training.
Our results show that this visualization offers a more comprehensive view of the learning dynamics compared to basic metrics.
SentryCam could facilitate detailed analyses, such as studying task transfer and catastrophic forgetting, in a continual learning setting; a minimal sketch of the activation-capture mechanism such a tool relies on appears after this list.
arXiv Detail & Related papers (2024-05-24T01:23:20Z)
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage large-scale pretrained image diffusion models, which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- Data Limitations for Modeling Top-Down Effects on Drivers' Attention [12.246649738388388]
Driving is a visuomotor task, i.e., there is a connection between what drivers see and what they do.
While some models of drivers' gaze account for top-down effects of drivers' actions, the majority learn only bottom-up correlations between human gaze and driving footage.
arXiv Detail & Related papers (2024-04-12T18:23:00Z)
- Egocentric Scene-aware Human Trajectory Prediction [15.346096596482857]
We present a method to predict human motion conditioned on the surrounding static scene.
Our model outperforms existing methods on key metrics of collision avoidance and trajectory mode coverage.
arXiv Detail & Related papers (2024-03-27T21:43:12Z)
- A Control-Centric Benchmark for Video Prediction [69.22614362800692]
We propose a control-centric benchmark for action-conditioned video prediction.
Our benchmark includes simulated environments with 11 task categories and 310 task instance definitions.
We then leverage our benchmark to study the effects of scaling model size, quantity of training data, and model ensembling.
arXiv Detail & Related papers (2023-04-26T17:59:45Z)
- Reinforcement Learning from Passive Data via Latent Intentions [86.4969514480008]
We show that passive data can still be used to learn features that accelerate downstream RL.
Our approach learns from passive data by modeling intentions.
Our experiments demonstrate the ability to learn from many forms of passive data, including cross-embodiment video data and YouTube videos.
arXiv Detail & Related papers (2023-04-10T17:59:05Z)
- Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive, fully self-supervised framework curated for policy pretraining in visuomotor driving.
We aim to learn policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns a driving policy representation by predicting future ego-motion, optimized with a photometric error based only on the current visual observation; a minimal sketch of this photometric error appears after this list.
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible or invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work, including state-of-the-art methods designed specifically for the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z)
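For the SentryCam entry above: the tool's own API is not described here, so the following is a generic sketch of the mechanism such live visualization relies on, namely PyTorch forward hooks that capture hidden activations at each training step. All names are illustrative assumptions, not SentryCam code.
```python
import torch
import torch.nn as nn

activations = {}  # layer name -> latest hidden activation

def make_hook(name):
    def hook(module, inputs, output):
        # Detach so logging never interferes with the backward pass.
        activations[name] = output.detach().cpu()
    return hook

model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_forward_hook(make_hook(name))

# One training step: activations are captured as a side effect and
# could be streamed to a live dashboard after each step.
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
print({k: tuple(v.shape) for k, v in activations.items()})
```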
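For the PPGeo entry above: a minimal sketch of the photometric error used in self-supervised depth-and-pose training, in which the next frame is warped into the current view using predicted depth and ego-motion, and the reconstruction difference is penalized. The warping follows the standard SfMLearner-style pinhole formulation; function names and details are assumptions, not code from the PPGeo paper.
```python
import torch
import torch.nn.functional as F

def photometric_loss(frame_t, frame_t1, depth_t, pose_t_to_t1, K):
    """frame_*: (B, 3, H, W); depth_t: (B, 1, H, W);
    pose_t_to_t1: (B, 4, 4) rigid transform; K: (B, 3, 3) intrinsics."""
    B, _, H, W = frame_t.shape
    # Backproject pixels of frame t into 3D using predicted depth.
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32), indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).reshape(1, 3, -1)   # (1, 3, HW)
    cam = torch.linalg.inv(K) @ pix.expand(B, -1, -1)            # camera rays
    cam = cam * depth_t.reshape(B, 1, -1)                        # 3D points
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W)], dim=1)     # homogeneous
    # Transform into the t+1 camera and project back to pixels.
    proj = K @ (pose_t_to_t1 @ cam_h)[:, :3]
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    # Normalize to [-1, 1] for grid_sample and warp frame t+1 to t.
    u = 2 * uv[:, 0] / (W - 1) - 1
    v = 2 * uv[:, 1] / (H - 1) - 1
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)
    warped = F.grid_sample(frame_t1, grid, align_corners=True)
    return (warped - frame_t).abs().mean()                       # L1 error

# Example: one clip pair with identity pose and unit depth.
loss = photometric_loss(
    torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32),
    torch.ones(1, 1, 32, 32), torch.eye(4).unsqueeze(0),
    torch.tensor([[[30.0, 0, 16], [0, 30.0, 16], [0, 0, 1]]]))
```
Because the supervision signal is pure photometric reconstruction, the second stage needs no labels or calibration, which is what lets PPGeo pretrain on uncalibrated YouTube driving videos.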