MAAD: A Model and Dataset for "Attended Awareness" in Driving
- URL: http://arxiv.org/abs/2110.08610v1
- Date: Sat, 16 Oct 2021 16:36:10 GMT
- Title: MAAD: A Model and Dataset for "Attended Awareness" in Driving
- Authors: Deepak Gopinath, Guy Rosman, Simon Stent, Katsuya Terahata, Luke
Fletcher, Brenna Argall, John Leonard
- Abstract summary: We propose a model to estimate a person's attended awareness of their environment.
Our model takes as input scene information in the form of a video and noisy gaze estimates.
We capture a new dataset with a high-precision gaze tracker including 24.5 hours of gaze sequences from 23 subjects attending to videos of driving scenes.
- Score: 10.463152664328025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a computational model to estimate a person's attended awareness of
their environment. We define attended awareness to be those parts of a
potentially dynamic scene which a person has attended to in recent history and
which they are still likely to be physically aware of. Our model takes as input
scene information in the form of a video and noisy gaze estimates, and outputs
visual saliency, a refined gaze estimate, and an estimate of the person's
attended awareness. In order to test our model, we capture a new dataset with a
high-precision gaze tracker including 24.5 hours of gaze sequences from 23
subjects attending to videos of driving scenes. The dataset also contains
third-party annotations of the subjects' attended awareness based on
observations of their scan path. Our results show that our model is able to
reasonably estimate attended awareness in a controlled setting, and in the
future could potentially be extended to real egocentric driving data to help
enable more effective ahead-of-time warnings in safety systems and thereby
augment driver performance. We also demonstrate our model's effectiveness on
the tasks of saliency, gaze calibration, and denoising, using both our dataset
and an existing saliency dataset. We make our model and dataset available at
https://github.com/ToyotaResearchInstitute/att-aware/.
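The abstract above specifies a clear input/output contract: a video and noisy gaze estimates in; visual saliency, a refined gaze estimate, and an attended awareness map out. Below is a minimal PyTorch-style sketch of that contract only. Module names, layer choices, and the gaze rasterization are illustrative assumptions, not the authors' architecture; the actual model is available in the repository linked above.
```python
import torch
import torch.nn as nn

def rasterize_gaze(gaze, T, H, W):
    """Turn normalized (x, y) gaze points into per-frame heatmaps.
    gaze: (B, T, 2) in [0, 1] -> (B, 1, T, H, W) one-hot maps."""
    B = gaze.shape[0]
    gmap = torch.zeros(B, 1, T, H, W)
    xs = (gaze[..., 0] * (W - 1)).long().clamp(0, W - 1)
    ys = (gaze[..., 1] * (H - 1)).long().clamp(0, H - 1)
    for b in range(B):
        for t in range(T):
            gmap[b, 0, t, ys[b, t], xs[b, t]] = 1.0
    return gmap

class AttendedAwarenessNet(nn.Module):
    """Video + noisy gaze in; saliency, refined gaze, awareness out.
    A hypothetical sketch, not the authors' architecture."""
    def __init__(self, ch: int = 32):
        super().__init__()
        # Shared spatiotemporal encoder over the video clip.
        self.encoder = nn.Conv3d(3, ch, kernel_size=3, padding=1)
        self.saliency_head = nn.Conv3d(ch, 1, kernel_size=1)
        # Awareness conditions on scene features plus the gaze heatmap,
        # so recently attended regions can inform the estimate.
        self.awareness_head = nn.Conv3d(ch + 1, 1, kernel_size=1)
        # Refined gaze: pooled scene features + noisy gaze -> corrected (x, y).
        self.gaze_head = nn.Linear(ch + 2, 2)

    def forward(self, video, noisy_gaze):
        # video: (B, 3, T, H, W); noisy_gaze: (B, T, 2) normalized coords
        B, _, T, H, W = video.shape
        feats = torch.relu(self.encoder(video))              # (B, C, T, H, W)
        saliency = torch.sigmoid(self.saliency_head(feats))
        gaze_map = rasterize_gaze(noisy_gaze, T, H, W)
        awareness = torch.sigmoid(
            self.awareness_head(torch.cat([feats, gaze_map], dim=1)))
        pooled = feats.mean(dim=(3, 4)).transpose(1, 2)      # (B, T, C)
        refined_gaze = self.gaze_head(
            torch.cat([pooled, noisy_gaze], dim=-1))         # (B, T, 2)
        return saliency, refined_gaze, awareness

# Example forward pass on a short clip.
net = AttendedAwarenessNet()
saliency, refined_gaze, awareness = net(
    torch.randn(2, 3, 8, 64, 64), torch.rand(2, 8, 2))
```
The one design point the sketch preserves is that the awareness output conditions on both scene features and the gaze history, matching the paper's definition of attended awareness as scene content a person has attended to in recent history.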
Related papers
- VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions [10.748597086208145]
In this work, we propose a trajectory prediction method that incorporates visual input from surround-view cameras alongside textual descriptions.
Our method achieves a latency of 53 ms, making it feasible for real-time processing.
Our experiments show that both the visual inputs and the textual descriptions contribute to improvements in trajectory prediction performance.
arXiv Detail & Related papers (2024-07-17T06:39:52Z)
- Exploring the Evolution of Hidden Activations with Live-Update Visualization [12.377279207342735]
We introduce SentryCam, an automated, real-time visualization tool that reveals the progression of hidden representations during training.
Our results show that this visualization offers a more comprehensive view of the learning dynamics compared to basic metrics.
SentryCam could facilitate detailed analyses, such as studying task transfer and catastrophic forgetting, in a continual learning setting; a minimal sketch of the activation-capture mechanism such a tool relies on appears after this list.
arXiv Detail & Related papers (2024-05-24T01:23:20Z)
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage large-scale pretrained image diffusion models, which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- Data Limitations for Modeling Top-Down Effects on Drivers' Attention [12.246649738388388]
Driving is a visuomotor task, i.e., there is a connection between what drivers see and what they do.
While some models of drivers' gaze account for top-down effects of drivers' actions, the majority learn only bottom-up correlations between human gaze and driving footage.
arXiv Detail & Related papers (2024-04-12T18:23:00Z)
- Egocentric Scene-aware Human Trajectory Prediction [15.346096596482857]
We present a method to predict human motion conditioned on the surrounding static scene.
Our model outperforms existing methods on key metrics of collision avoidance and trajectory mode coverage.
arXiv Detail & Related papers (2024-03-27T21:43:12Z)
- A Control-Centric Benchmark for Video Prediction [69.22614362800692]
We propose a control-centric benchmark for action-conditioned video prediction.
Our benchmark includes simulated environments with 11 task categories and 310 task instance definitions.
We then leverage our benchmark to study the effects of scaling model size, quantity of training data, and model ensembling.
arXiv Detail & Related papers (2023-04-26T17:59:45Z)
- Reinforcement Learning from Passive Data via Latent Intentions [86.4969514480008]
We show that passive data can still be used to learn features that accelerate downstream RL.
Our approach learns from passive data by modeling intentions.
Our experiments demonstrate the ability to learn from many forms of passive data, including cross-embodiment video data and YouTube videos.
arXiv Detail & Related papers (2023-04-10T17:59:05Z)
- Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive, fully self-supervised framework curated for policy pretraining in visuomotor driving.
We aim to learn policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns a driving policy representation by predicting future ego-motion, optimized with a photometric error based only on the current visual observation; a minimal sketch of this photometric error appears after this list.
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible or invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work, including state-of-the-art methods designed specifically for the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z)
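For the SentryCam entry above: the tool's own API is not described here, so the following is a generic sketch of the mechanism such live visualization relies on, namely PyTorch forward hooks that capture hidden activations at each training step. All names are illustrative assumptions, not SentryCam code.
```python
import torch
import torch.nn as nn

activations = {}  # layer name -> latest hidden activation

def make_hook(name):
    def hook(module, inputs, output):
        # Detach so logging never interferes with the backward pass.
        activations[name] = output.detach().cpu()
    return hook

model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_forward_hook(make_hook(name))

# One training step: activations are captured as a side effect and
# could be streamed to a live dashboard after each step.
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
print({k: tuple(v.shape) for k, v in activations.items()})
```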
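For the PPGeo entry above: a minimal sketch of the photometric error used in self-supervised depth-and-pose training, in which the next frame is warped into the current view using predicted depth and ego-motion, and the reconstruction difference is penalized. The warping follows the standard SfMLearner-style pinhole formulation; function names and details are assumptions, not code from the PPGeo paper.
```python
import torch
import torch.nn.functional as F

def photometric_loss(frame_t, frame_t1, depth_t, pose_t_to_t1, K):
    """frame_*: (B, 3, H, W); depth_t: (B, 1, H, W);
    pose_t_to_t1: (B, 4, 4) rigid transform; K: (B, 3, 3) intrinsics."""
    B, _, H, W = frame_t.shape
    # Backproject pixels of frame t into 3D using predicted depth.
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32), indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).reshape(1, 3, -1)   # (1, 3, HW)
    cam = torch.linalg.inv(K) @ pix.expand(B, -1, -1)            # camera rays
    cam = cam * depth_t.reshape(B, 1, -1)                        # 3D points
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W)], dim=1)     # homogeneous
    # Transform into the t+1 camera and project back to pixels.
    proj = K @ (pose_t_to_t1 @ cam_h)[:, :3]
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    # Normalize to [-1, 1] for grid_sample and warp frame t+1 to t.
    u = 2 * uv[:, 0] / (W - 1) - 1
    v = 2 * uv[:, 1] / (H - 1) - 1
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)
    warped = F.grid_sample(frame_t1, grid, align_corners=True)
    return (warped - frame_t).abs().mean()                       # L1 error

# Example: one clip pair with identity pose and unit depth.
loss = photometric_loss(
    torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32),
    torch.ones(1, 1, 32, 32), torch.eye(4).unsqueeze(0),
    torch.tensor([[[30.0, 0, 16], [0, 30.0, 16], [0, 0, 1]]]))
```
Because the supervision signal is pure photometric reconstruction, the second stage needs no labels or calibration, which is what lets PPGeo pretrain on uncalibrated YouTube driving videos.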