G-MEMP: Gaze-Enhanced Multimodal Ego-Motion Prediction in Driving
- URL: http://arxiv.org/abs/2312.08558v1
- Date: Wed, 13 Dec 2023 23:06:30 GMT
- Title: G-MEMP: Gaze-Enhanced Multimodal Ego-Motion Prediction in Driving
- Authors: M. Eren Akbiyik, Nedko Savov, Danda Pani Paudel, Nikola Popovic,
Christian Vater, Otmar Hilliges, Luc Van Gool, Xi Wang
- Abstract summary: We focus on inferring the ego trajectory of a driver's vehicle using their gaze data.
Next, we develop G-MEMP, a novel multimodal ego-trajectory prediction network that combines GPS and video input with gaze data.
The results show that G-MEMP significantly outperforms state-of-the-art methods in both benchmarks.
- Score: 71.9040410238973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the decision-making process of drivers is one of the keys to
ensuring road safety. While the driver intent and the resulting ego-motion
trajectory are valuable in developing driver-assistance systems, existing
methods mostly focus on the motions of other vehicles. In contrast, we focus on
inferring the ego trajectory of a driver's vehicle using their gaze data. For
this purpose, we first collect a new dataset, GEM, which contains high-fidelity
ego-motion videos paired with drivers' eye-tracking data and GPS coordinates.
Next, we develop G-MEMP, a novel multimodal ego-trajectory prediction network
that combines GPS and video input with gaze data. We also propose a new metric
called Path Complexity Index (PCI) to measure the trajectory complexity. We
perform extensive evaluations of the proposed method on both GEM and DR(eye)VE,
an existing benchmark dataset. The results show that G-MEMP significantly
outperforms state-of-the-art methods in both benchmarks. Furthermore, ablation
studies demonstrate over 20% improvement in average displacement using gaze
data, particularly in challenging driving scenarios with a high PCI. The data,
code, and models can be found at https://eth-ait.github.io/g-memp/.
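The ablation results above are reported in terms of average displacement. For reference, the block below is a minimal sketch of the standard average displacement error (ADE) metric used in ego-trajectory prediction, together with the usual final displacement error (FDE) companion; the function names are illustrative, and the paper's Path Complexity Index (PCI) is defined only in the full text, so it is not reproduced here.

```python
import numpy as np

def average_displacement_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean L2 distance between predicted and ground-truth waypoints.

    pred, gt: arrays of shape (T, 2) holding T future (x, y) positions.
    This is the standard ADE metric for trajectory forecasting; the paper's
    Path Complexity Index (PCI) is not reproduced here.
    """
    assert pred.shape == gt.shape
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def final_displacement_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """L2 distance at the final predicted timestep (FDE), the usual companion metric."""
    return float(np.linalg.norm(pred[-1] - gt[-1]))

# Example: the reported "over 20% improvement in average displacement" would mean
# ade_with_gaze <= 0.8 * ade_without_gaze on the same evaluation split.
```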
Related papers
- LMT-Net: Lane Model Transformer Network for Automated HD Mapping from Sparse Vehicle Observations [11.395749549636868]
Lane Model Transformer Network (LMT-Net) is an encoder-decoder neural network architecture that performs polyline encoding and predicts lane pairs and their connectivity.
We evaluate the performance of LMT-Net on an internal dataset that consists of multiple vehicle observations as well as human annotations serving as Ground Truth (GT).
arXiv Detail & Related papers (2024-09-19T02:14:35Z)
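LMT-Net's summary mentions polyline encoding of sparse vehicle observations but gives no architectural detail, so the sketch below shows only a generic VectorNet-style polyline encoder (a per-point MLP followed by max-pooling over each polyline); the class name, dimensions, and layer choices are assumptions for illustration, not LMT-Net's actual design.

```python
import torch
import torch.nn as nn

class PolylineEncoder(nn.Module):
    """Generic polyline encoder: embed each point of a polyline with a small
    MLP, then max-pool over points to get one feature vector per polyline.
    A common pattern for map elements; NOT the actual LMT-Net architecture."""

    def __init__(self, point_dim: int = 4, hidden_dim: int = 64):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(point_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, polylines: torch.Tensor) -> torch.Tensor:
        # polylines: (batch, num_polylines, num_points, point_dim)
        feats = self.point_mlp(polylines)      # per-point features
        return feats.max(dim=2).values         # pool over points -> (B, P, hidden_dim)

# Example: encode 8 polylines of 20 points each, described by (x, y, dx, dy).
enc = PolylineEncoder()
out = enc(torch.randn(2, 8, 20, 4))            # -> shape (2, 8, 64)
```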
- MetaFollower: Adaptable Personalized Autonomous Car Following [63.90050686330677]
We propose MetaFollower, an adaptable personalized car-following framework.
We first utilize Model-Agnostic Meta-Learning (MAML) to extract common driving knowledge from various CF events.
We additionally combine Long Short-Term Memory (LSTM) and Intelligent Driver Model (IDM) to reflect temporal heterogeneity with high interpretability.
arXiv Detail & Related papers (2024-06-23T15:30:40Z)
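MetaFollower combines learned components with the Intelligent Driver Model (IDM). As a reference for that building block, the snippet below implements the standard IDM acceleration equation; the parameter defaults are common textbook values, not the ones calibrated in the paper.

```python
import math

def idm_acceleration(v: float, delta_v: float, gap: float,
                     v0: float = 30.0,    # desired speed [m/s]
                     T: float = 1.5,      # desired time headway [s]
                     a_max: float = 1.5,  # maximum acceleration [m/s^2]
                     b: float = 2.0,      # comfortable deceleration [m/s^2]
                     s0: float = 2.0,     # minimum standstill gap [m]
                     delta: float = 4.0) -> float:
    """Standard Intelligent Driver Model (IDM) acceleration.

    v:       follower (ego) speed
    delta_v: approach rate, v_follower - v_leader
    gap:     bumper-to-bumper distance to the leader
    Parameter defaults are generic textbook values, not MetaFollower's.
    """
    s_star = s0 + max(0.0, v * T + v * delta_v / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

# Example: following at 20 m/s, closing at 2 m/s, with a 25 m gap.
print(idm_acceleration(v=20.0, delta_v=2.0, gap=25.0))
```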
- SCOUT+: Towards Practical Task-Driven Drivers' Gaze Prediction [12.246649738388388]
SCOUT+ is a task- and context-aware model for drivers' gaze prediction.
We evaluate our model on two datasets, DR(eye)VE and BDD-A.
arXiv Detail & Related papers (2024-04-12T18:29:10Z)
- More Than Routing: Joint GPS and Route Modeling for Refine Trajectory Representation Learning [26.630640299709114]
We propose Joint GPS and Route Modelling based on self-supervised technology, namely JGRM.
We develop two encoders, each tailored to capture representations of route and GPS trajectories respectively.
The representations from the two modalities are fed into a shared transformer for inter-modal information interaction.
arXiv Detail & Related papers (2024-02-25T18:27:25Z)
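The JGRM summary describes two modality-specific encoders whose outputs are passed through a shared transformer for inter-modal interaction. The sketch below illustrates that general pattern in PyTorch; the encoder types, dimensions, and pooling are assumptions for illustration and are not taken from the JGRM paper.

```python
import torch
import torch.nn as nn

class TwoModalitySharedTransformer(nn.Module):
    """Two modality-specific encoders (route tokens, raw GPS points) followed by
    a shared Transformer encoder over the concatenated token sequences, as in
    the JGRM summary above. All sizes and layer choices are illustrative."""

    def __init__(self, route_dim: int = 16, gps_dim: int = 2, d_model: int = 128):
        super().__init__()
        self.route_encoder = nn.GRU(route_dim, d_model, batch_first=True)
        self.gps_encoder = nn.GRU(gps_dim, d_model, batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.shared_transformer = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, route_seq: torch.Tensor, gps_seq: torch.Tensor) -> torch.Tensor:
        route_tokens, _ = self.route_encoder(route_seq)   # (B, Lr, d_model)
        gps_tokens, _ = self.gps_encoder(gps_seq)         # (B, Lg, d_model)
        tokens = torch.cat([route_tokens, gps_tokens], dim=1)
        fused = self.shared_transformer(tokens)           # inter-modal interaction
        return fused.mean(dim=1)                          # pooled trajectory embedding

# Example: a route of 30 segment tokens and 200 raw GPS points.
model = TwoModalitySharedTransformer()
emb = model(torch.randn(4, 30, 16), torch.randn(4, 200, 2))  # -> (4, 128)
```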
- DriveLM: Driving with Graph Visual Question Answering [57.51930417790141]
We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems.
We propose a VLM-based baseline approach (DriveLM-Agent) for jointly performing Graph VQA and end-to-end driving.
arXiv Detail & Related papers (2023-12-21T18:59:12Z)
- Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatial quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z)
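The entry above enhances LiDAR-based detectors with spatially quantized historical features. The snippet below sketches one simple form of such quantization, binning points accumulated from past traversals into a bird's-eye-view grid and counting hits per cell; it is purely illustrative, since the paper's actual feature design is not described in the summary.

```python
import numpy as np

def quantize_past_traversals(points_xy: np.ndarray,
                             grid_range: float = 50.0,
                             cell_size: float = 0.5) -> np.ndarray:
    """Bin (x, y) points accumulated from past traversals into a BEV grid and
    count hits per cell. A deliberately simple illustration of 'spatially
    quantized historical features'; the paper's actual features may differ.

    points_xy: (N, 2) array of points in the ego frame, in meters.
    Returns a (H, W) count grid covering [-grid_range, grid_range)^2.
    """
    n_cells = int(2 * grid_range / cell_size)
    idx = np.floor((points_xy + grid_range) / cell_size).astype(int)
    mask = np.all((idx >= 0) & (idx < n_cells), axis=1)   # keep in-grid points
    grid = np.zeros((n_cells, n_cells), dtype=np.int32)
    np.add.at(grid, (idx[mask, 1], idx[mask, 0]), 1)      # row = y, col = x
    return grid

# Example: 10k points gathered from earlier drives over the same stretch of road.
counts = quantize_past_traversals(np.random.uniform(-60, 60, size=(10000, 2)))
```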
- FollowNet: A Comprehensive Benchmark for Car-Following Behavior Modeling [20.784555362703294]
We establish a public benchmark dataset for car-following behavior modeling.
The benchmark consists of more than 80K car-following events extracted from five public driving datasets.
Results show that the deep deterministic policy gradient (DDPG)-based model performs competitively, with a lower MSE for spacing.
arXiv Detail & Related papers (2023-05-25T08:59:26Z)
- OpenDriver: An Open-Road Driver State Detection Dataset [13.756530418314227]
This paper develops a large-scale multimodal driving dataset, OpenDriver, for driver state detection.
OpenDriver encompasses a total of 3,278 driving trips, with a signal collection duration of approximately 4,600 hours.
arXiv Detail & Related papers (2023-04-09T10:08:38Z)
- SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving [94.11868795445798]
We release a Large-Scale Object Detection benchmark for Autonomous driving, named as SODA10M, containing 10 million unlabeled images and 20K images labeled with 6 representative object categories.
To improve diversity, the images are collected at one frame every ten seconds across 32 different cities under different weather conditions, periods, and location scenes.
We provide extensive experiments and deep analyses of existing supervised state-of-the-art detection models, popular self-supervised and semi-supervised approaches, and some insights about how to develop future models.
arXiv Detail & Related papers (2021-06-21T13:55:57Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
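TransFuser integrates image and LiDAR representations using attention. The sketch below shows the general pattern of attention-based sensor fusion, flattening two feature maps into token sequences and running a transformer encoder over both; it is a simplified single-resolution illustration, not the multi-scale fusion used in the actual TransFuser model.

```python
import torch
import torch.nn as nn

class AttentionSensorFusion(nn.Module):
    """Flatten image and LiDAR BEV feature maps into tokens, add a modality
    embedding, and let a Transformer encoder attend across both modalities.
    A simplified single-resolution illustration of attention-based sensor
    fusion; the actual TransFuser fuses features at multiple scales."""

    def __init__(self, channels: int = 64, nhead: int = 4):
        super().__init__()
        self.modality_embed = nn.Parameter(torch.zeros(2, channels))
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=nhead, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, img_feat: torch.Tensor, lidar_feat: torch.Tensor) -> torch.Tensor:
        # img_feat, lidar_feat: (B, C, H, W) feature maps from separate backbones
        img_tok = img_feat.flatten(2).transpose(1, 2) + self.modality_embed[0]
        lidar_tok = lidar_feat.flatten(2).transpose(1, 2) + self.modality_embed[1]
        tokens = torch.cat([img_tok, lidar_tok], dim=1)   # (B, 2*H*W, C)
        return self.fusion(tokens).mean(dim=1)            # fused driving feature

# Example: 8x8 feature maps with 64 channels from each sensor branch.
fusion = AttentionSensorFusion()
feat = fusion(torch.randn(2, 64, 8, 8), torch.randn(2, 64, 8, 8))  # -> (2, 64)
```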
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information on this site and is not responsible for any consequences of its use.