IDE-Net: Interactive Driving Event and Pattern Extraction from Human
Data
- URL: http://arxiv.org/abs/2011.02403v1
- Date: Wed, 4 Nov 2020 16:56:12 GMT
- Title: IDE-Net: Interactive Driving Event and Pattern Extraction from Human
Data
- Authors: Xiaosong Jia, Liting Sun, Masayoshi Tomizuka, Wei Zhan
- Abstract summary: We propose the Interactive Driving event and pattern Extraction Network (IDE-Net) to automatically extract interaction events and patterns.
IDE-Net is a deep learning framework to automatically extract events and patterns directly from vehicle trajectories.
We find three interpretable patterns of interactions, bringing insights for driver behavior representation, modeling and comprehension.
- Score: 35.473428772961235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous vehicles (AVs) need to share the road with multiple, heterogeneous
road users in a variety of driving scenarios. It is overwhelming and
unnecessary to carefully interact with all observed agents, and AVs need to
determine whether and when to interact with each surrounding agent. In order to
facilitate the design and testing of prediction and planning modules of AVs,
in-depth understanding of interactive behavior is expected with proper
representation, and events in behavior data need to be extracted and
categorized automatically. Answering what the essential patterns of
interaction are is also crucial, in addition to answering whether and
when. Thus, learning to extract interactive driving events and
patterns from human data for tackling the whether-when-what tasks is of
critical importance for AVs. There is, however, no clear definition and
taxonomy of interactive behavior, and most of the existing works are based on
either manual labelling or hand-crafted rules and features. In this paper, we
propose the Interactive Driving event and pattern Extraction Network (IDE-Net),
which is a deep learning framework to automatically extract interaction events
and patterns directly from vehicle trajectories. In IDE-Net, we leverage the
power of multi-task learning and propose three auxiliary tasks to assist the
pattern extraction in an unsupervised fashion. We also design a unique
spatial-temporal block to encode the trajectory data. Experimental results on
the INTERACTION dataset verified the effectiveness of such designs in terms of
better generalizability and effective pattern extraction. We find three
interpretable patterns of interactions, bringing insights for driver behavior
representation, modeling and comprehension. Both objective and subjective
evaluation metrics are adopted in our analysis of the learned patterns.
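The abstract describes a spatial-temporal block that encodes pairwise vehicle trajectories before pattern extraction. The following is a minimal illustrative sketch of that general idea, not the paper's implementation: the function name, shapes, and features are assumptions chosen for clarity.

```python
import numpy as np

def spatial_temporal_features(traj_a, traj_b):
    """Hypothetical sketch: encode a pair of vehicle trajectories,
    each of shape (T, 2), into simple interaction features --
    per-timestep gap (spatial) plus its rate of change (temporal)."""
    traj_a = np.asarray(traj_a, dtype=float)
    traj_b = np.asarray(traj_b, dtype=float)
    gap = np.linalg.norm(traj_a - traj_b, axis=1)   # spatial: distance per frame
    closing = np.diff(gap)                          # temporal: closing speed
    return {
        "min_gap": gap.min(),
        "mean_closing_speed": closing.mean(),
        "ever_closing": bool((closing < 0).any()),  # crude interaction flag
    }

# Two vehicles converging toward the same point:
a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(4.0, 0.0), (3.0, 0.0), (2.5, 0.0)]
feats = spatial_temporal_features(a, b)
```

IDE-Net's actual block is a learned deep network over raw trajectories; this sketch only illustrates why both a spatial view (distance per frame) and a temporal view (how that distance evolves) are needed to characterize an interaction.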
Related papers
- An interactive enhanced driving dataset for autonomous driving [17.420156557113465]
This paper proposes the Interactive Enhanced Driving dataset (IEDD).
We develop a scalable pipeline to mine million-level interactive segments from naturalistic driving data.
The IEDD-VQA dataset is constructed by generating synthetic Bird's Eye View (BEV) videos.
arXiv Detail & Related papers (2026-02-24T05:57:18Z) - Pedestrian Crossing Intention Prediction Using Multimodal Fusion Network [3.878105750489656]
Pedestrian crossing intention prediction is essential for the deployment of autonomous vehicles (AVs) in urban environments.
This paper proposes a multimodal fusion network that leverages seven modality features from both visual and motion branches.
Experiments on the JAAD dataset validate the effectiveness of the proposed network, achieving superior performance compared to the baseline methods.
arXiv Detail & Related papers (2025-11-25T07:18:12Z) - Large Language Models for Pedestrian Safety: An Application to Predicting Driver Yielding Behavior at Unsignalized Intersections [5.913801021011149]
Large language models (LLMs) are suited for extracting patterns from heterogeneous traffic data, enabling accurate modeling of driver-pedestrian interactions.
This paper benchmarks state-of-the-art LLMs against traditional classifiers, finding that GPT-4o consistently achieves the highest accuracy and recall, while Deepseek-V3 excels in precision.
arXiv Detail & Related papers (2025-09-24T00:25:19Z) - ILNet: Trajectory Prediction with Inverse Learning Attention for Enhancing Intention Capture [4.190790144182306]
It is acknowledged that human drivers dynamically adjust initial driving decisions based on assumptions about the intentions of surrounding vehicles.
Motivated by human driving behaviors, this paper proposes ILNet, a multi-agent trajectory prediction method with an Inverse Learning (IL) attention mechanism and a Dynamic Anchor Selection (DAS) module.
Experimental results show that ILNet achieves state-of-the-art performance on the INTERACTION and Argoverse motion forecasting datasets.
arXiv Detail & Related papers (2025-07-09T04:18:01Z) - BIDA: A Bi-level Interaction Decision-making Algorithm for Autonomous Vehicles in Dynamic Traffic Scenarios [5.193590097161461]
We design a bi-level interaction decision-making algorithm (BIDA) that integrates interactive Monte Carlo tree search (MCTS) with deep reinforcement learning (DRL).
Specifically, we adopt three types of DRL algorithms to construct a reliable value network and policy network, which guide the online deduction process of interactive MCTS.
Experimental evaluations demonstrate that BIDA not only enhances interactive deduction and reduces computational costs, but also outperforms other recent benchmarks.
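The coupling described above, a learned value network guiding an online tree search, can be sketched in miniature. This is an illustrative toy, not BIDA itself: the "value network" below is a hand-written heuristic stand-in, the state is a scalar (position, velocity) pair, and the search enumerates actions exhaustively instead of sampling as MCTS would.

```python
import math

ACTIONS = [-1.0, 0.0, 1.0]          # decelerate / keep / accelerate (m/s^2)

def step(state, action, dt=1.0):
    """Toy longitudinal dynamics: update velocity, then position."""
    pos, vel = state
    vel = max(0.0, vel + action * dt)
    return (pos + vel * dt, vel)

def value_net(state, goal=10.0):
    """Stand-in for a trained DRL value network:
    negative distance to a goal position."""
    pos, _ = state
    return -abs(goal - pos)

def search(state, depth=3):
    """Depth-limited search with value-network leaf evaluation.
    Real MCTS samples promising branches; the value bootstrap at
    the leaves is the shared idea."""
    if depth == 0:
        return value_net(state), None
    best_v, best_a = -math.inf, None
    for a in ACTIONS:
        v, _ = search(step(state, a), depth - 1)
        if v > best_v:
            best_v, best_a = v, a
    return best_v, best_a

# From rest-ish (pos=0, vel=1), accelerating every step gets closest to goal=10.
value, action = search((0.0, 1.0))
```

The design point is that the network replaces expensive full-depth rollouts: the search only expands a few steps and trusts the learned value at the frontier, which is how such hybrids cut computational cost.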
arXiv Detail & Related papers (2025-06-19T19:03:40Z) - Interaction Dataset of Autonomous Vehicles with Traffic Lights and Signs [11.127555705122283]
This paper presents the development of a comprehensive dataset capturing interactions between Autonomous Vehicles (AVs) and traffic control devices, specifically traffic lights and stop signs.
Our work addresses a critical gap in the existing literature by providing real-world trajectory data on how AVs navigate these traffic control devices.
We propose a methodology for identifying and extracting relevant interaction trajectory data from the Motion dataset, incorporating over 37,000 instances with traffic lights and 44,000 with stop signs.
arXiv Detail & Related papers (2025-01-21T22:59:50Z) - DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.
DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z) - GraphAD: Interaction Scene Graph for End-to-end Autonomous Driving [16.245949174447574]
We propose the Interaction Scene Graph (ISG) as a unified method to model the interactions among the ego-vehicle, road agents, and map elements.
We evaluate the proposed method for end-to-end autonomous driving on the nuScenes dataset.
arXiv Detail & Related papers (2024-03-28T02:22:28Z) - Interactive Autonomous Navigation with Internal State Inference and
Interactivity Estimation [58.21683603243387]
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into the standard Deep Learning framework.
These auxiliary tasks provide additional supervision signals to infer the behavior patterns of other interactive agents.
Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
arXiv Detail & Related papers (2023-11-27T18:57:42Z) - Pixel State Value Network for Combined Prediction and Planning in
Interactive Environments [9.117828575880303]
This work proposes a deep learning methodology to combine prediction and planning.
A conditional GAN with the U-Net architecture is trained to predict two high-resolution image sequences.
Results demonstrate intuitive behavior in complex situations, such as lane changes amidst conflicting objectives.
arXiv Detail & Related papers (2023-10-11T17:57:13Z) - RSG-Net: Towards Rich Sematic Relationship Prediction for Intelligent
Vehicle in Complex Environments [72.04891523115535]
We propose RSG-Net (Road Scene Graph Net): a graph convolutional network designed to predict potential semantic relationships from object proposals.
The experimental results indicate that this network, trained on Road Scene Graph dataset, could efficiently predict potential semantic relationships among objects around the ego-vehicle.
arXiv Detail & Related papers (2022-07-16T12:40:17Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - SCOUT: Socially-COnsistent and UndersTandable Graph Attention Network
for Trajectory Prediction of Vehicles and VRUs [0.0]
SCOUT is a novel Attention-based Graph Neural Network that uses a flexible and generic representation of the scene as a graph.
We explore three different attention mechanisms and test our scheme with both bird-eye-view and on-vehicle urban data.
We evaluate our model's flexibility and transferability by testing it under completely new scenarios on RounD dataset.
arXiv Detail & Related papers (2021-02-12T06:29:28Z) - Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data
Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z) - Pedestrian Behavior Prediction via Multitask Learning and Categorical
Interaction Modeling [13.936894582450734]
We propose a multitask learning framework that simultaneously predicts trajectories and actions of pedestrians by relying on multimodal data.
We show that our model achieves state-of-the-art performance and improves trajectory and action prediction by up to 22% and 6% respectively.
arXiv Detail & Related papers (2020-12-06T15:57:11Z) - Interaction-Based Trajectory Prediction Over a Hybrid Traffic Graph [4.574413934477815]
We propose to use a hybrid graph whose nodes represent both the traffic actors as well as the static and dynamic traffic elements present in the scene.
The different modes of temporal interaction (e.g., stopping and going) among actors and traffic elements are explicitly modeled by graph edges.
We show that our proposed model, TrafficGraphNet, achieves state-of-the-art trajectory prediction accuracy while maintaining a high level of interpretability.
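A hybrid graph with typed interaction edges, as described above, can be represented with a small data structure. This sketch is illustrative only: the node kinds, edge-mode strings, and class name are assumptions for the example, not TrafficGraphNet's actual schema.

```python
from collections import defaultdict

class HybridTrafficGraph:
    """Hypothetical sketch: nodes are actors or traffic elements,
    edges carry an explicit temporal-interaction mode."""

    def __init__(self):
        self.nodes = {}                 # id -> kind ("actor", "traffic_light", "lane", ...)
        self.edges = defaultdict(list)  # src id -> [(dst id, interaction_mode)]

    def add_node(self, node_id, kind):
        self.nodes[node_id] = kind

    def add_edge(self, src, dst, mode):
        # mode names the temporal interaction, e.g. "stopping_for", "yielding_to"
        self.edges[src].append((dst, mode))

    def modes_between(self, src, dst):
        return [m for d, m in self.edges[src] if d == dst]

g = HybridTrafficGraph()
g.add_node("car_1", "actor")
g.add_node("light_A", "traffic_light")
g.add_edge("car_1", "light_A", "stopping_for")
```

Making the interaction mode an explicit edge attribute, rather than something the network must infer from node features, is what gives this family of models its interpretability: each predicted trajectory can be traced back to named relations in the graph.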
arXiv Detail & Related papers (2020-09-27T18:20:03Z) - Studying Person-Specific Pointing and Gaze Behavior for Multimodal
Referencing of Outside Objects from a Moving Vehicle [58.720142291102135]
Hand pointing and eye gaze have been extensively investigated in automotive applications for object selection and referencing.
Existing outside-the-vehicle referencing methods focus on a static situation, whereas the situation in a moving vehicle is highly dynamic and subject to safety-critical constraints.
We investigate the specific characteristics of each modality and the interaction between them when used in the task of referencing outside objects.
arXiv Detail & Related papers (2020-09-23T14:56:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.