Use ADAS Data to Predict Near-Miss Events: A Group-Based Zero-Inflated Poisson Approach
- URL: http://arxiv.org/abs/2509.02614v1
- Date: Sun, 31 Aug 2025 04:13:32 GMT
- Title: Use ADAS Data to Predict Near-Miss Events: A Group-Based Zero-Inflated Poisson Approach
- Authors: Xinbo Zhang, Montserrat Guillen, Lishuai Li, Xin Li, Youhua Frank Chen
- Abstract summary: We analyze driving behavior big data, which captures how people drive and powers applications such as risk evaluation, insurance pricing, and targeted intervention. We show that traditional statistical models underfit the dataset. We propose a set of zero-inflated Poisson frameworks that learn latent behavior groups and fit offset-based count models via EM to yield calibrated, interpretable weekly risk predictions.
- Score: 7.242131400289439
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Driving behavior big data leverages multi-sensor telematics to understand how people drive and powers applications such as risk evaluation, insurance pricing, and targeted intervention. Usage-based insurance (UBI) built on these data has become mainstream. Telematics-captured near-miss events (NMEs) provide a timely alternative to claim-based risk, but weekly NMEs are sparse, highly zero-inflated, and behaviorally heterogeneous even after exposure normalization. Analyzing multi-sensor telematics and ADAS warnings, we show that the traditional statistical models underfit the dataset. We address these challenges by proposing a set of zero-inflated Poisson (ZIP) frameworks that learn latent behavior groups and fit offset-based count models via EM to yield calibrated, interpretable weekly risk predictions. Using a naturalistic dataset from a fleet of 354 commercial drivers over a year, during which the drivers completed 287,511 trips and logged 8,142,896 km in total, our results show consistent improvements over baselines and prior telematics models, with lower AIC/BIC values in-sample and better calibration out-of-sample. We also conducted sensitivity analyses on the EM-based grouping for the number of clusters, finding that the gains were robust and interpretable. Practically, this supports context-aware ratemaking on a weekly basis and fairer premiums by recognizing heterogeneous driving styles.
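To make the "offset-based count models via EM" idea concrete, here is a minimal sketch of EM for a single zero-inflated Poisson model with an exposure offset. It is not the paper's method: it assumes one behavior group, no covariates, and a constant per-unit-exposure event rate, so the M-step has a closed form. The names `fit_zip_em` and `sample_poisson`, the simulated data, and the parameter values are all hypothetical and chosen only for illustration.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw one Poisson variate with Knuth's method (adequate for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def fit_zip_em(counts, exposures, n_iter=300, tol=1e-10):
    """EM for an intercept-only ZIP with offset (illustrative assumption):
    y_i = 0 with probability pi, else y_i ~ Poisson(rate * exposure_i)."""
    n = len(counts)
    total_events = sum(counts)
    zero_share = sum(1 for y in counts if y == 0) / n
    pi = 0.5 * zero_share                  # crude starting value
    rate = total_events / sum(exposures)   # plain Poisson rate as a start
    for _ in range(n_iter):
        # E-step: posterior probability that each zero is a structural zero.
        resp = [
            pi / (pi + (1.0 - pi) * math.exp(-rate * e)) if y == 0 else 0.0
            for y, e in zip(counts, exposures)
        ]
        # M-step: closed-form updates for the mixing weight and the rate.
        new_pi = sum(resp) / n
        new_rate = total_events / sum((1.0 - z) * e for z, e in zip(resp, exposures))
        converged = abs(new_pi - pi) + abs(new_rate - rate) < tol
        pi, rate = new_pi, new_rate
        if converged:
            break
    return pi, rate


# Simulated weekly data (hypothetical values, not from the paper).
rng = random.Random(0)
TRUE_PI, TRUE_RATE = 0.4, 0.5
exposures = [rng.uniform(1.0, 5.0) for _ in range(5000)]  # e.g. distance driven per week
counts = [
    0 if rng.random() < TRUE_PI else sample_poisson(rng, TRUE_RATE * e)
    for e in exposures
]

pi_hat, rate_hat = fit_zip_em(counts, exposures)
print(f"estimated zero-inflation pi: {pi_hat:.3f}, rate per unit exposure: {rate_hat:.3f}")
```

With covariates, the rate update would become a weighted Poisson regression with `log(exposure)` as the offset, and a group-based variant would presumably add a further latent group indicator per driver; both are omitted here to keep the sketch short.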
Related papers
- STAR : Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction [78.0692157478247]
We propose STAR, a framework that bridges data-driven STatistical expectations with knowledge-driven Agentic Reasoning. We show that STAR consistently outperforms all baselines on both score-based and rank-based metrics.
arXiv Detail & Related papers (2026-02-12T16:30:07Z) - EVEREST: An Evidential, Tail-Aware Transformer for Rare-Event Time-Series Forecasting [4.551615447454767]
EVEREST is a transformer-based architecture for probabilistic rare-event forecasting. It delivers calibrated predictions and tail-aware risk estimation. It is applicable to high-stakes domains such as industrial monitoring, weather, and satellite diagnostics.
arXiv Detail & Related papers (2026-01-26T23:15:20Z) - Model-Based Policy Adaptation for Closed-Loop End-to-End Autonomous Driving [54.46325690390831]
We propose Model-based Policy Adaptation (MPA), a general framework that enhances the robustness and safety of pretrained E2E driving agents during deployment. MPA first generates diverse counterfactual trajectories using a geometry-consistent simulation engine. MPA trains a diffusion-based policy adapter to refine the base policy's predictions and a multi-step Q value model to evaluate long-term outcomes.
arXiv Detail & Related papers (2025-11-26T17:01:41Z) - A Realistic Evaluation of Cross-Frequency Transfer Learning and Foundation Forecasting Models [32.56983347493999]
Cross-frequency transfer learning (CFTL) has emerged as a popular framework for curating large-scale time series datasets to pre-train foundation forecasting models (FFMs). Although CFTL has shown promise, current benchmarking practices fall short of accurately assessing its performance. This shortcoming stems from many factors: an over-reliance on small-scale evaluation datasets; inadequate treatment of sample size when computing summary statistics; reporting of suboptimal statistical models; and failing to account for non-negligible risks of overlap between pre-training and test datasets.
arXiv Detail & Related papers (2025-09-23T18:19:50Z) - Enhancing Crash Frequency Modeling Based on Augmented Multi-Type Data by Hybrid VAE-Diffusion-Based Generative Neural Networks [13.402051372401822]
A key challenge in crash frequency modelling is the prevalence of excessive zero observations. We propose a hybrid VAE-Diffusion neural network, designed to reduce zero observations. We assess the synthetic data quality generated by this model through metrics like similarity, accuracy, diversity, and structural consistency.
arXiv Detail & Related papers (2025-01-17T07:53:27Z) - Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide training examples for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We demonstrate that in this setting, the generalized cross validation estimator (GCV) fails to correctly predict the out-of-sample risk. We further extend our analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting.
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking [65.24988062003096]
We present NAVSIM, a framework for benchmarking vision-based driving policies.
Our simulation is non-reactive, i.e., the evaluated policy and environment do not influence each other.
NAVSIM enabled a new competition held at CVPR 2024, where 143 teams submitted 463 entries, resulting in several new insights.
arXiv Detail & Related papers (2024-06-21T17:59:02Z) - Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
arXiv Detail & Related papers (2024-06-16T03:10:16Z) - A Bayesian Approach for Prioritising Driving Behaviour Investigations in Telematic Auto Insurance Policies [0.6249768559720121]
We make use of trip GPS and accelerometer data, augmented by geospatial information, to train an imperfect classifier for delivery driving on a per-trip basis.
A posterior probability is converted to a priority score, which was used to select the most valuable candidates for manual investigation.
The approach has achieved a significant improvement in efficiency of human resource allocation compared to manual searching.
arXiv Detail & Related papers (2024-04-22T15:26:24Z) - Robust Survival Analysis with Adversarial Regularization [6.001304967469112]
Survival Analysis (SA) models the time until an event occurs.
Recent work shows that Neural Networks (NNs) can capture complex relationships in SA.
We leverage NN verification advances to create algorithms for robust, fully-parametric survival models.
arXiv Detail & Related papers (2023-12-26T12:18:31Z) - Post-Contextual-Bandit Inference [57.88785630755165]
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking.
They can both improve outcomes for study participants and increase the chance of identifying good or even best policies.
To support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies.
arXiv Detail & Related papers (2021-06-01T12:01:51Z) - Data Augmentation of IMU Signals and Evaluation via a Semi-Supervised Classification of Driving Behavior [4.640835690336653]
We present a semi-supervised learning solution to classify portions of trips according to whether drivers are driving aggressively or normally.
Our results show that, by utilizing RCGAN-generated labeled data, the classification of the drivers is improved in 79% of the cases.
arXiv Detail & Related papers (2020-06-16T15:49:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.