A POV-based Highway Vehicle Trajectory Dataset and Prediction Architecture
- URL: http://arxiv.org/abs/2303.06202v1
- Date: Fri, 10 Mar 2023 20:38:40 GMT
- Title: A POV-based Highway Vehicle Trajectory Dataset and Prediction Architecture
- Authors: Vinit Katariya, Ghazal Alinezhad Noghre, Armin Danesh Pazho, Hamed Tabkhi
- Abstract summary: We introduce the Carolinas Highway Dataset (CHD), a vehicle trajectory, detection, and tracking dataset.
We also present PishguVe, a novel vehicle trajectory prediction architecture that uses attention-based graph isomorphism and convolutional neural networks.
The results demonstrate that PishguVe outperforms existing algorithms to become the new state-of-the-art (SotA) on bird's-eye, eye-level, and high-angle POV trajectory datasets.
- Score: 2.924868086534434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vehicle trajectory datasets that provide multiple points of view (POVs)
can be valuable for various traffic safety and management applications. Despite
the abundance of trajectory datasets, few offer a comprehensive and diverse
range of driving scenes, capturing multiple viewpoints of various highway
layouts, merging lanes, and configurations. This limits their ability to
capture the nuanced interactions between drivers, vehicles, and the roadway
infrastructure. We introduce the Carolinas Highway Dataset (CHD, available at:
https://github.com/TeCSAR-UNCC/Carolinas_Dataset), a vehicle trajectory,
detection, and tracking dataset. CHD is a collection of 1.6 million frames
captured in highway-based videos from eye-level and high-angle POVs at eight
locations across the Carolinas, with 338,000 vehicle trajectories. The
locations, timing of recordings, and camera angles were carefully selected to
capture various road geometries, traffic patterns, lighting conditions, and
driving behaviors.
We also present PishguVe (code available at:
https://github.com/TeCSAR-UNCC/PishguVe), a novel vehicle trajectory prediction
architecture that uses attention-based graph isomorphism and convolutional
neural networks. The results demonstrate that PishguVe outperforms existing
algorithms to become the new state-of-the-art (SotA) on bird's-eye, eye-level,
and high-angle POV trajectory datasets. Specifically, it achieves a 12.50% and
10.20% improvement in ADE and FDE, respectively, over the current SotA on the
NGSIM dataset. Compared to the best-performing models on CHD, PishguVe achieves
lower ADE and FDE on eye-level data by 14.58% and 27.38%, respectively, and
improves ADE and FDE on high-angle data by 8.3% and 6.9%, respectively.
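For reference, ADE and FDE are the standard displacement metrics quoted above: ADE (Average Displacement Error) averages the Euclidean error over every predicted timestep, while FDE (Final Displacement Error) measures the error at the final timestep only. The sketch below is a minimal NumPy illustration of how these metrics are conventionally computed; the array shapes and the `ade_fde` helper are illustrative assumptions, not the paper's actual evaluation code.

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and Final Displacement Error for a batch of trajectories.

    pred, gt: arrays of shape (num_agents, horizon, 2) holding predicted
    and ground-truth (x, y) positions over the prediction horizon.
    """
    # Per-point Euclidean distance between prediction and ground truth.
    dist = np.linalg.norm(pred - gt, axis=-1)  # shape: (num_agents, horizon)
    ade = dist.mean()                          # average over agents and timesteps
    fde = dist[:, -1].mean()                   # error at the final timestep only
    return ade, fde

# Toy usage with synthetic trajectories (2 agents, 12-step horizon).
rng = np.random.default_rng(0)
gt = rng.normal(size=(2, 12, 2))
pred = gt + rng.normal(scale=0.1, size=gt.shape)
print(ade_fde(pred, gt))
```

Under these definitions, the reported gains are relative reductions in error: e.g., a 12.50% ADE improvement means PishguVe's average displacement error is 12.50% lower than the previous SotA's on the same benchmark.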
Related papers
- ROAD-Waymo: Action Awareness at Scale for Autonomous Driving [17.531603453254434]
ROAD-Waymo is an extensive dataset for the development and benchmarking of techniques for agent, action, location and event detection in road scenes.
Considerably larger and more challenging than any existing dataset (and encompassing multiple cities), it comes with 198k annotated video frames, 54k agent tubes, 3.9M bounding boxes and a total of 12.4M labels.
arXiv Detail & Related papers (2024-11-03T20:46:50Z)
- LMT-Net: Lane Model Transformer Network for Automated HD Mapping from Sparse Vehicle Observations [11.395749549636868]
Lane Model Transformer Network (LMT-Net) is an encoder-decoder neural network architecture that performs polyline encoding and predicts lane pairs and their connectivity.
We evaluate the performance of LMT-Net on an internal dataset that consists of multiple vehicle observations as well as human annotations as Ground Truth (GT).
arXiv Detail & Related papers (2024-09-19T02:14:35Z)
- TAPIR: Tracking Any Point with Per-frame Initialization and Temporal Refinement [64.11385310305612]
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence.
Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations.
The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS.
arXiv Detail & Related papers (2023-06-14T17:07:51Z)
- Traffic Scene Parsing through the TSP6K Dataset [109.69836680564616]
We introduce a specialized traffic monitoring dataset, termed TSP6K, with high-quality pixel-level and instance-level annotations.
The dataset captures more crowded traffic scenes, with several times more traffic participants than existing driving scenes.
We propose a detail refining decoder for scene parsing, which recovers the details of different semantic regions in traffic scenes.
arXiv Detail & Related papers (2023-03-06T02:05:14Z)
- Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting [64.7364925689825]
Argoverse 2 (AV2) is a collection of three datasets for perception and forecasting research in the self-driving domain.
The Lidar dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose.
The Motion Forecasting dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene.
arXiv Detail & Related papers (2023-01-02T00:36:22Z)
- IDD-3D: Indian Driving Dataset for 3D Unstructured Road Scenes [79.18349050238413]
Preparation and training of deployable deep learning architectures require the models to be suited to different traffic scenarios.
An unstructured and complex driving layout found in several developing countries such as India poses a challenge to these models.
We build a new dataset, IDD-3D, which consists of multi-modal data from multiple cameras and LiDAR sensors with 12k annotated driving LiDAR frames.
arXiv Detail & Related papers (2022-10-23T23:03:17Z)
- Spatial-Temporal Deep Embedding for Vehicle Trajectory Reconstruction from High-Angle Video [1.8520147498637294]
We develop a model that imposes parity constraints at both pixel and instance levels to generate instance-aware embeddings for vehicle segmentation on STMap.
The designed model is applied to process all public NGSIM US-101 videos to generate complete vehicle trajectories.
arXiv Detail & Related papers (2022-09-17T22:32:05Z)
- Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions [0.0]
We present a new dataset to enable robust autonomous driving via a novel data collection process.
The dataset includes images and point clouds from cameras and LiDAR sensors, along with high-precision GPS/INS.
We demonstrate the uniqueness of this dataset by analyzing the performance of baselines in amodal segmentation of road and objects.
arXiv Detail & Related papers (2022-08-01T22:55:32Z)
- Partitioned Graph Convolution Using Adversarial and Regression Networks for Road Travel Speed Prediction [0.0]
We propose a framework for predicting road segment travel speed histograms for dataless edges.
The framework achieves 71.5% intersection and 78.5% correlation accuracy in predicting travel speed histograms.
Experiments show that partitioning the dataset into clusters increases the performance of the framework.
arXiv Detail & Related papers (2021-02-26T22:16:48Z)
- Detecting 32 Pedestrian Attributes for Autonomous Vehicles [103.87351701138554]
In this paper, we address the problem of jointly detecting pedestrians and recognizing 32 pedestrian attributes.
We introduce a Multi-Task Learning (MTL) model relying on a composite field framework, which achieves both goals in an efficient way.
We show competitive detection and attribute recognition results, as well as a more stable MTL training.
arXiv Detail & Related papers (2020-12-04T15:10:12Z)
- VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation [74.56282712099274]
This paper introduces VectorNet, a hierarchical graph neural network that exploits the spatial locality of individual road components represented by vectors.
By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps.
We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset.
arXiv Detail & Related papers (2020-05-08T19:07:03Z)