A Survey and Benchmarking of Spatial-Temporal Traffic Data Imputation Models
- URL: http://arxiv.org/abs/2412.04733v2
- Date: Fri, 17 Oct 2025 07:37:32 GMT
- Title: A Survey and Benchmarking of Spatial-Temporal Traffic Data Imputation Models
- Authors: Shengnan Guo, Tonglong Wei, Yiheng Huang, Yan Lin, Zekai Shen, Yujuan Dong, Junliang Lin, Youfang Lin, Huaiyu Wan
- Abstract summary: Traffic data imputation is a critical preprocessing step in intelligent transportation systems. Despite substantial progress in imputation models, model selection and development for practical applications remain challenging. This paper proposes practice-oriented taxonomies for traffic data missing patterns and imputation models, systematically cataloging real-world traffic data loss scenarios.
- Score: 26.91571883554539
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traffic data imputation is a critical preprocessing step in intelligent transportation systems, underpinning the reliability of downstream transportation services. Despite substantial progress in imputation models, model selection and development for practical applications remain challenging due to three key gaps: 1) the absence of a model taxonomy for traffic data imputation that traces technological development and highlights the distinct features of each model; 2) the lack of a unified benchmarking pipeline for fair and reproducible model evaluation across standardized traffic datasets; and 3) insufficient in-depth analysis that jointly compares models across multiple dimensions, including effectiveness, computational efficiency, and robustness. To this end, this paper proposes practice-oriented taxonomies for traffic data missing patterns and imputation models, systematically cataloging real-world traffic data loss scenarios and analyzing the characteristics of existing models. We further introduce a unified benchmarking pipeline that comprehensively evaluates 11 representative models across various missing patterns and rates, assessing overall performance, performance under challenging scenarios, and computational efficiency, and providing visualizations. This work aims to provide a holistic perspective on traffic data imputation and to serve as a practical guideline for model selection and application in intelligent transportation systems.
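The idea of evaluating imputation across missing patterns and rates can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic data, the two pattern names ("random" and "block"), the per-sensor mean baseline, and all function names are hypothetical stand-ins for the paper's datasets, taxonomy, and 11 models.

```python
import numpy as np

def random_mask(shape, rate, rng):
    """Observed-mask (True = observed) with entries dropped independently."""
    return rng.random(shape) >= rate

def block_mask(shape, rate, block_len, rng):
    """Observed-mask where each sensor loses contiguous runs of time steps.

    Blocks may overlap, so the realized missing rate can fall below `rate`.
    """
    n_steps, n_sensors = shape
    mask = np.ones(shape, dtype=bool)
    n_blocks = int(round(rate * n_steps / block_len))
    for sensor in range(n_sensors):
        starts = rng.integers(0, n_steps - block_len + 1, size=n_blocks)
        for start in starts:
            mask[start:start + block_len, sensor] = False
    return mask

def mean_impute(x, mask):
    """Hypothetical baseline: fill gaps with each sensor's observed mean."""
    observed_only = np.where(mask, x, np.nan)
    sensor_mean = np.nanmean(observed_only, axis=0)
    return np.where(mask, x, sensor_mean)

def masked_mae(x_true, x_hat, mask):
    """Mean absolute error computed only on the artificially removed entries."""
    missing = ~mask
    return float(np.abs(x_true - x_hat)[missing].mean())

def run_benchmark(seed=0):
    """Score the baseline on every (pattern, rate) pair of a synthetic series."""
    rng = np.random.default_rng(seed)
    n_steps, n_sensors = 288, 8  # assumed: one day at 5-minute resolution
    t = np.arange(n_steps)
    daily = 50 + 10 * np.sin(2 * np.pi * t / n_steps)
    x = daily[:, None] + rng.normal(0.0, 1.0, size=(n_steps, n_sensors))

    results = {}
    for rate in (0.2, 0.4, 0.6):
        masks = {
            "random": random_mask(x.shape, rate, rng),
            "block": block_mask(x.shape, rate, 12, rng),
        }
        for pattern, mask in masks.items():
            results[(pattern, rate)] = masked_mae(x, mean_impute(x, mask), mask)
    return results

for (pattern, rate), mae in run_benchmark().items():
    print(f"{pattern:>6} missing, rate {rate:.1f}: MAE = {mae:.3f}")
```

Scoring only the removed entries keeps the comparison fair across rates, since a model gets no credit for values it was given as input.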
Related papers
- A Survey on Efficient Vision-Language-Action Models [153.11669266922993]
Vision-Language-Action models (VLAs) represent a significant frontier in embodied intelligence, aiming to bridge digital knowledge with physical-world interaction. Motivated by the urgent need to address these challenges, this survey presents the first comprehensive review of Efficient Vision-Language-Action models.
arXiv Detail & Related papers (2025-10-27T17:57:33Z)
- Evaluating Generative Vehicle Trajectory Models for Traffic Intersection Dynamics [8.484294935626224]
Deep Generative models of traffic dynamics at signalized intersections can help traffic authorities better understand the efficiency and safety aspects. At present, models are evaluated on computational metrics that primarily look at trajectory reconstruction errors. We provide a comprehensive analytics tool to train, run, and evaluate models with metrics that give better insights into model performance from a traffic engineering point of view.
arXiv Detail & Related papers (2025-06-10T16:36:42Z)
- Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [55.914891182214475]
We introduce neural network reprogrammability as a unifying framework for model adaptation. We present a taxonomy that categorizes such information manipulation approaches across four key dimensions. We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z)
- A Large-scale Benchmark on Geological Fault Delineation Models: Domain Shift, Training Dynamics, Generalizability, Evaluation and Inferential Behavior [12.23379456993682]
We present the first large-scale benchmarking study designed to provide answers and guidelines for domain shift strategies in seismic interpretation. Our benchmark encompasses over 200 models trained and evaluated on three heterogeneous datasets. Our analysis highlights the fragility of current fine-tuning practices, the emergence of catastrophic forgetting, and the challenges of interpreting performance in a systematic manner.
arXiv Detail & Related papers (2025-05-13T13:56:43Z)
- Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [79.2162092822111]
We systematically evaluate reinforcement learning (RL) and control-based methods on a suite of navigation tasks. We employ a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and employ it for planning. Our results show that model-free RL benefits most from large amounts of high-quality data, whereas model-based planning generalizes better to unseen layouts.
arXiv Detail & Related papers (2025-02-20T18:39:41Z)
- Modeling IoT Traffic Patterns: Insights from a Statistical Analysis of an MTC Dataset [1.2289361708127877]
Internet-of-Things (IoT) is rapidly expanding, connecting numerous devices and becoming integral to our daily lives.
Effective IoT traffic management requires modeling and predicting intricate machine-type communication (MTC) dynamics.
We perform a comprehensive statistical analysis of the MTC traffic utilizing goodness-of-fit tests, including well-established tests such as Kolmogorov-Smirnov, Anderson-Darling, chi-squared, and root mean square error.
arXiv Detail & Related papers (2024-09-03T14:24:18Z)
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data [9.57464542357693]
This paper demonstrates that model-centric evaluations are biased, as real-world modeling pipelines often require dataset-specific preprocessing and feature engineering.
We select 10 relevant datasets from Kaggle competitions and implement expert-level preprocessing pipelines for each dataset.
After dataset-specific feature engineering, model rankings change considerably, performance differences decrease, and the importance of model selection reduces.
arXiv Detail & Related papers (2024-07-02T09:54:39Z)
- A Hybrid Model for Traffic Incident Detection based on Generative Adversarial Networks and Transformer Model [0.0]
Traffic incident detection plays an indispensable role in intelligent transportation systems.
Previous research has identified that the effectiveness of detection is significantly influenced by challenges related to acquiring large datasets.
A hybrid model combining transformer and generative adversarial networks (GANs) is proposed to address these challenges.
arXiv Detail & Related papers (2024-03-02T09:28:04Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15% points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
- JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds [79.00975648564483]
Trajectory forecasting models, employed in fields such as robotics, autonomous vehicles, and navigation, face challenges in real-world scenarios.
This dataset provides comprehensive data, including the locations of all agents, scene images, and point clouds, all from the robot's perspective.
The objective is to predict the future positions of agents relative to the robot using raw sensory input data.
arXiv Detail & Related papers (2023-11-05T18:59:31Z)
- Reinforcement Learning with Human Feedback for Realistic Traffic Simulation [53.85002640149283]
A key element of effective simulation is the incorporation of realistic traffic models that align with human knowledge.
This study identifies two main challenges: capturing the nuances of human preferences on realism and the unification of diverse traffic simulation models.
arXiv Detail & Related papers (2023-09-01T19:29:53Z)
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDMs) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve (DEIM)" for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
- SimSCOOD: Systematic Analysis of Out-of-Distribution Generalization in Fine-tuned Source Code Models [58.78043959556283]
We study the behaviors of models under different fine-tuning methodologies, including full fine-tuning and Low-Rank Adaptation (LoRA) fine-tuning methods.
Our analysis uncovers that LoRA fine-tuning consistently exhibits significantly better OOD generalization performance than full fine-tuning across various scenarios.
arXiv Detail & Related papers (2022-10-10T16:07:24Z)
- A Latent Feature Analysis-based Approach for Spatio-Temporal Traffic Data Recovery [3.84562917529518]
Missing data is an inevitable and common problem in data-driven intelligent transportation systems (ITS).
This paper proposes a spatio-temporal traffic data completion method based on hidden feature analysis.
The results show that the model can accurately estimate the continuous missing data.
arXiv Detail & Related papers (2022-08-16T13:21:46Z)
- Leveraging the structure of dynamical systems for data-driven modeling [111.45324708884813]
We consider the impact of the training set and its structure on the quality of the long-term prediction.
We show how an informed design of the training set, based on invariants of the system and the structure of the underlying attractor, significantly improves the resulting models.
arXiv Detail & Related papers (2021-12-15T20:09:20Z)
- Dynamic Spatiotemporal Graph Convolutional Neural Networks for Traffic Data Imputation with Complex Missing Patterns [3.9318191265352196]
We propose a novel deep learning framework called Dynamic Spatiotemporal Graph Convolutional Neural Networks (DSTGCN) to impute missing traffic data.
We introduce a graph structure estimation technique to model the dynamic spatial dependencies from real-time traffic information and road network structure.
Our proposed model outperforms existing deep learning models in all kinds of missing scenarios and the graph structure estimation technique contributes to the model performance.
arXiv Detail & Related papers (2021-09-17T05:47:17Z)
- Training Deep Normalizing Flow Models in Highly Incomplete Data Scenarios with Prior Regularization [13.985534521589257]
We propose a novel framework to facilitate the learning of data distributions in high paucity scenarios.
The proposed framework naturally stems from posing the process of learning from incomplete data as a joint optimization task.
arXiv Detail & Related papers (2021-04-03T20:57:57Z)
- How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
arXiv Detail & Related papers (2021-02-17T18:25:30Z)
- Multi-intersection Traffic Optimisation: A Benchmark Dataset and a Strong Baseline [85.9210953301628]
Control of traffic signals is fundamental and critical to alleviate traffic congestion in urban areas.
Because of the high complexity of modelling the problem, experimental settings of current works are often inconsistent.
We propose a novel and strong baseline model based on deep reinforcement learning with the encoder-decoder structure.
arXiv Detail & Related papers (2021-01-24T03:55:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.