G-MEMP: Gaze-Enhanced Multimodal Ego-Motion Prediction in Driving
- URL: http://arxiv.org/abs/2312.08558v1
- Date: Wed, 13 Dec 2023 23:06:30 GMT
- Title: G-MEMP: Gaze-Enhanced Multimodal Ego-Motion Prediction in Driving
- Authors: M. Eren Akbiyik, Nedko Savov, Danda Pani Paudel, Nikola Popovic,
Christian Vater, Otmar Hilliges, Luc Van Gool, Xi Wang
- Abstract summary: We focus on inferring the ego trajectory of a driver's vehicle using their gaze data.
Next, we develop G-MEMP, a novel multimodal ego-trajectory prediction network that combines GPS and video input with gaze data.
The results show that G-MEMP significantly outperforms state-of-the-art methods in both benchmarks.
- Score: 71.9040410238973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the decision-making process of drivers is one of the keys to
ensuring road safety. While the driver intent and the resulting ego-motion
trajectory are valuable in developing driver-assistance systems, existing
methods mostly focus on the motions of other vehicles. In contrast, we focus on
inferring the ego trajectory of a driver's vehicle using their gaze data. For
this purpose, we first collect a new dataset, GEM, which contains high-fidelity
ego-motion videos paired with drivers' eye-tracking data and GPS coordinates.
Next, we develop G-MEMP, a novel multimodal ego-trajectory prediction network
that combines GPS and video input with gaze data. We also propose a new metric
called Path Complexity Index (PCI) to measure the trajectory complexity. We
perform extensive evaluations of the proposed method on both GEM and DR(eye)VE,
an existing benchmark dataset. The results show that G-MEMP significantly
outperforms state-of-the-art methods in both benchmarks. Furthermore, ablation
studies demonstrate over 20% improvement in average displacement using gaze
data, particularly in challenging driving scenarios with a high PCI. The data,
code, and models can be found at https://eth-ait.github.io/g-memp/.
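The ablation results above are reported in terms of average displacement. For reference, the block below is a minimal sketch of the standard average displacement error (ADE) metric used in ego-trajectory prediction, together with the usual final displacement error (FDE) companion; the function names are illustrative, and the paper's Path Complexity Index (PCI) is defined only in the full text, so it is not reproduced here.

```python
import numpy as np

def average_displacement_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean L2 distance between predicted and ground-truth waypoints.

    pred, gt: arrays of shape (T, 2) holding T future (x, y) positions.
    This is the standard ADE metric for trajectory forecasting; the paper's
    Path Complexity Index (PCI) is not reproduced here.
    """
    assert pred.shape == gt.shape
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def final_displacement_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """L2 distance at the final predicted timestep (FDE), the usual companion metric."""
    return float(np.linalg.norm(pred[-1] - gt[-1]))

# Example: the reported "over 20% improvement in average displacement" would mean
# ade_with_gaze <= 0.8 * ade_without_gaze on the same evaluation split.
```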
Related papers
- LMT-Net: Lane Model Transformer Network for Automated HD Mapping from Sparse Vehicle Observations [11.395749549636868]
Lane Model Transformer Network (LMT-Net) is an encoder-decoder neural network architecture that performs polyline encoding and predicts lane pairs and their connectivity.
We evaluate the performance of LMT-Net on an internal dataset that consists of multiple vehicle observations as well as human annotations serving as Ground Truth (GT).
arXiv Detail & Related papers (2024-09-19T02:14:35Z)
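LMT-Net's summary mentions polyline encoding of sparse vehicle observations but gives no architectural detail, so the sketch below shows only a generic VectorNet-style polyline encoder (a per-point MLP followed by max-pooling over each polyline); the class name, dimensions, and layer choices are assumptions for illustration, not LMT-Net's actual design.

```python
import torch
import torch.nn as nn

class PolylineEncoder(nn.Module):
    """Generic polyline encoder: embed each point of a polyline with a small
    MLP, then max-pool over points to get one feature vector per polyline.
    A common pattern for map elements; NOT the actual LMT-Net architecture."""

    def __init__(self, point_dim: int = 4, hidden_dim: int = 64):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(point_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, polylines: torch.Tensor) -> torch.Tensor:
        # polylines: (batch, num_polylines, num_points, point_dim)
        feats = self.point_mlp(polylines)      # per-point features
        return feats.max(dim=2).values         # pool over points -> (B, P, hidden_dim)

# Example: encode 8 polylines of 20 points each, described by (x, y, dx, dy).
enc = PolylineEncoder()
out = enc(torch.randn(2, 8, 20, 4))            # -> shape (2, 8, 64)
```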
- MetaFollower: Adaptable Personalized Autonomous Car Following [63.90050686330677]
We propose MetaFollower, an adaptable personalized car-following framework.
We first utilize Model-Agnostic Meta-Learning (MAML) to extract common driving knowledge from various CF events.
We additionally combine Long Short-Term Memory (LSTM) and Intelligent Driver Model (IDM) to reflect temporal heterogeneity with high interpretability.
arXiv Detail & Related papers (2024-06-23T15:30:40Z)
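MetaFollower combines learned components with the Intelligent Driver Model (IDM). As a reference for that building block, the snippet below implements the standard IDM acceleration equation; the parameter defaults are common textbook values, not the ones calibrated in the paper.

```python
import math

def idm_acceleration(v: float, delta_v: float, gap: float,
                     v0: float = 30.0,    # desired speed [m/s]
                     T: float = 1.5,      # desired time headway [s]
                     a_max: float = 1.5,  # maximum acceleration [m/s^2]
                     b: float = 2.0,      # comfortable deceleration [m/s^2]
                     s0: float = 2.0,     # minimum standstill gap [m]
                     delta: float = 4.0) -> float:
    """Standard Intelligent Driver Model (IDM) acceleration.

    v:       follower (ego) speed
    delta_v: approach rate, v_follower - v_leader
    gap:     bumper-to-bumper distance to the leader
    Parameter defaults are generic textbook values, not MetaFollower's.
    """
    s_star = s0 + max(0.0, v * T + v * delta_v / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

# Example: following at 20 m/s, closing at 2 m/s, with a 25 m gap.
print(idm_acceleration(v=20.0, delta_v=2.0, gap=25.0))
```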
- SCOUT+: Towards Practical Task-Driven Drivers' Gaze Prediction [12.246649738388388]
SCOUT+ is a task- and context-aware model for drivers' gaze prediction.
We evaluate our model on two datasets, DR(eye)VE and BDD-A.
arXiv Detail & Related papers (2024-04-12T18:29:10Z)
- More Than Routing: Joint GPS and Route Modeling for Refine Trajectory Representation Learning [26.630640299709114]
We propose Joint GPS and Route Modelling based on self-supervised technology, namely JGRM.
We develop two encoders, each tailored to capture representations of route and GPS trajectories respectively.
The representations from the two modalities are fed into a shared transformer for inter-modal information interaction.
arXiv Detail & Related papers (2024-02-25T18:27:25Z)
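The JGRM summary describes two modality-specific encoders whose outputs are passed through a shared transformer for inter-modal interaction. The sketch below illustrates that general pattern in PyTorch; the encoder types, dimensions, and pooling are assumptions for illustration and are not taken from the JGRM paper.

```python
import torch
import torch.nn as nn

class TwoModalitySharedTransformer(nn.Module):
    """Two modality-specific encoders (route tokens, raw GPS points) followed by
    a shared Transformer encoder over the concatenated token sequences, as in
    the JGRM summary above. All sizes and layer choices are illustrative."""

    def __init__(self, route_dim: int = 16, gps_dim: int = 2, d_model: int = 128):
        super().__init__()
        self.route_encoder = nn.GRU(route_dim, d_model, batch_first=True)
        self.gps_encoder = nn.GRU(gps_dim, d_model, batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.shared_transformer = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, route_seq: torch.Tensor, gps_seq: torch.Tensor) -> torch.Tensor:
        route_tokens, _ = self.route_encoder(route_seq)   # (B, Lr, d_model)
        gps_tokens, _ = self.gps_encoder(gps_seq)         # (B, Lg, d_model)
        tokens = torch.cat([route_tokens, gps_tokens], dim=1)
        fused = self.shared_transformer(tokens)           # inter-modal interaction
        return fused.mean(dim=1)                          # pooled trajectory embedding

# Example: a route of 30 segment tokens and 200 raw GPS points.
model = TwoModalitySharedTransformer()
emb = model(torch.randn(4, 30, 16), torch.randn(4, 200, 2))  # -> (4, 128)
```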
- DriveLM: Driving with Graph Visual Question Answering [57.51930417790141]
We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems.
We propose a VLM-based baseline approach (DriveLM-Agent) for jointly performing Graph VQA and end-to-end driving.
arXiv Detail & Related papers (2023-12-21T18:59:12Z)
- Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatial quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z)
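The entry above enhances LiDAR-based detectors with spatially quantized historical features. The snippet below sketches one simple form of such quantization, binning points accumulated from past traversals into a bird's-eye-view grid and counting hits per cell; it is purely illustrative, since the paper's actual feature design is not described in the summary.

```python
import numpy as np

def quantize_past_traversals(points_xy: np.ndarray,
                             grid_range: float = 50.0,
                             cell_size: float = 0.5) -> np.ndarray:
    """Bin (x, y) points accumulated from past traversals into a BEV grid and
    count hits per cell. A deliberately simple illustration of 'spatially
    quantized historical features'; the paper's actual features may differ.

    points_xy: (N, 2) array of points in the ego frame, in meters.
    Returns a (H, W) count grid covering [-grid_range, grid_range)^2.
    """
    n_cells = int(2 * grid_range / cell_size)
    idx = np.floor((points_xy + grid_range) / cell_size).astype(int)
    mask = np.all((idx >= 0) & (idx < n_cells), axis=1)   # keep in-grid points
    grid = np.zeros((n_cells, n_cells), dtype=np.int32)
    np.add.at(grid, (idx[mask, 1], idx[mask, 0]), 1)      # row = y, col = x
    return grid

# Example: 10k points gathered from earlier drives over the same stretch of road.
counts = quantize_past_traversals(np.random.uniform(-60, 60, size=(10000, 2)))
```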
- FollowNet: A Comprehensive Benchmark for Car-Following Behavior Modeling [20.784555362703294]
We establish a public benchmark dataset for car-following behavior modeling.
The benchmark consists of more than 80K car-following events extracted from five public driving datasets.
Results show that the deep deterministic policy gradient (DDPG)-based model performs competitively, with a lower MSE for spacing.
arXiv Detail & Related papers (2023-05-25T08:59:26Z)
- OpenDriver: An Open-Road Driver State Detection Dataset [13.756530418314227]
This paper develops a large-scale multimodal driving dataset, OpenDriver, for driver state detection.
OpenDriver encompasses a total of 3,278 driving trips, with a signal collection duration of approximately 4,600 hours.
arXiv Detail & Related papers (2023-04-09T10:08:38Z)
- SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving [94.11868795445798]
We release a Large-Scale Object Detection benchmark for Autonomous driving, named as SODA10M, containing 10 million unlabeled images and 20K images labeled with 6 representative object categories.
To improve diversity, the images are collected at one frame every ten seconds across 32 different cities under different weather conditions, periods, and location scenes.
We provide extensive experiments and deep analyses of existing supervised state-of-the-art detection models, popular self-supervised and semi-supervised approaches, and some insights about how to develop future models.
arXiv Detail & Related papers (2021-06-21T13:55:57Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
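TransFuser integrates image and LiDAR representations using attention. The sketch below shows the general pattern of attention-based sensor fusion, flattening two feature maps into token sequences and running a transformer encoder over both; it is a simplified single-resolution illustration, not the multi-scale fusion used in the actual TransFuser model.

```python
import torch
import torch.nn as nn

class AttentionSensorFusion(nn.Module):
    """Flatten image and LiDAR BEV feature maps into tokens, add a modality
    embedding, and let a Transformer encoder attend across both modalities.
    A simplified single-resolution illustration of attention-based sensor
    fusion; the actual TransFuser fuses features at multiple scales."""

    def __init__(self, channels: int = 64, nhead: int = 4):
        super().__init__()
        self.modality_embed = nn.Parameter(torch.zeros(2, channels))
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=nhead, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, img_feat: torch.Tensor, lidar_feat: torch.Tensor) -> torch.Tensor:
        # img_feat, lidar_feat: (B, C, H, W) feature maps from separate backbones
        img_tok = img_feat.flatten(2).transpose(1, 2) + self.modality_embed[0]
        lidar_tok = lidar_feat.flatten(2).transpose(1, 2) + self.modality_embed[1]
        tokens = torch.cat([img_tok, lidar_tok], dim=1)   # (B, 2*H*W, C)
        return self.fusion(tokens).mean(dim=1)            # fused driving feature

# Example: 8x8 feature maps with 64 channels from each sensor branch.
fusion = AttentionSensorFusion()
feat = fusion(torch.randn(2, 64, 8, 8), torch.randn(2, 64, 8, 8))  # -> (2, 64)
```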
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information on this site and is not responsible for any consequences of its use.