Heterogeneous Ensemble Learning for Enhanced Crash Forecasts -- A Frequentist and Machine Learning based Stacking Framework
- URL: http://arxiv.org/abs/2207.10721v1
- Date: Thu, 21 Jul 2022 19:15:53 GMT
- Title: Heterogeneous Ensemble Learning for Enhanced Crash Forecasts -- A Frequentist and Machine Learning based Stacking Framework
- Authors: Numan Ahmad, Behram Wali, Asad J. Khattak
- Abstract summary: In this study, we apply one of the key HEM methods, Stacking, to model crash frequency on five-lane undivided segments (5T) of urban and suburban arterials.
The prediction performance of Stacking is compared with parametric statistical models (Poisson and negative binomial) and three state-of-the-art machine learning techniques (decision tree, random forest, and gradient boosting).
- Score: 0.803552105641624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A variety of statistical and machine learning methods are used to model crash
frequency on specific roadways with machine learning methods generally having a
higher prediction accuracy. Recently, heterogeneous ensemble methods (HEM),
including stacking, have emerged as more accurate and robust intelligent
techniques and are often used to solve pattern recognition problems by
providing more reliable and accurate predictions. In this study, we apply one
of the key HEM methods, Stacking, to model crash frequency on five-lane
undivided segments (5T) of urban and suburban arterials. The prediction
performance of Stacking is compared with parametric statistical models (Poisson
and negative binomial) and three state-of-the-art machine learning techniques
(decision tree, random forest, and gradient boosting), each of which is termed
a base learner. By employing an optimal weight scheme to combine individual
base learners through stacking, the biased predictions that individual base
learners can produce due to differences in specifications and prediction
accuracies are avoided. Crash, traffic, and roadway inventory data from 2013
to 2017 were collected and integrated. The data are split into
training, validation, and testing datasets. Estimation results of statistical
models reveal that, among other factors, crashes increase with the density
(number per mile) of different driveway types. Comparison of out-of-sample
predictions of various models confirms the superiority of Stacking over the
alternative methods considered. From a practical standpoint, stacking can
enhance prediction accuracy (compared to using only one base learner with a
particular specification). When applied systemically, stacking can help
identify more appropriate countermeasures.
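
To make the stacking framework concrete, here is a minimal sketch in Python. The placeholder data, base-learner settings, and the non-negative linear meta-learner standing in for the paper's optimal-weight scheme are all assumptions, not the authors' implementation:

```python
# Minimal stacking sketch (illustrative; dataset, tuning, and the meta-learner
# standing in for the paper's optimal-weight scheme are assumptions).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import LinearRegression, PoissonRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Placeholder data standing in for segment-level crash counts and
# traffic/roadway inventory features.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
y = np.rint(np.abs(y) / 50.0)  # coerce to non-negative counts for the Poisson GLM

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

base_learners = [
    ("poisson", PoissonRegressor(max_iter=500)),      # parametric count model
    ("tree", DecisionTreeRegressor(max_depth=6)),     # decision tree
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("gb", GradientBoostingRegressor(random_state=0)),
]

# The meta-learner combines out-of-fold base predictions; non-negative
# coefficients loosely mimic an optimal-weight combination of base learners.
stack = StackingRegressor(estimators=base_learners,
                          final_estimator=LinearRegression(positive=True),
                          cv=5)
stack.fit(X_tr, y_tr)
print("held-out R^2:", stack.score(X_te, y_te))
```

Here the meta-learner's non-negative coefficients play the role of the combination weights; the paper's actual weighting scheme may be estimated differently.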
Related papers
- Awareness of uncertainty in classification using a multivariate model and multi-views [1.3048920509133808]
The proposed model regularizes uncertain predictions and is trained to produce both predictions and their uncertainty estimates.
Given the multi-view predictions together with their uncertainties and confidences, we propose several methods to compute the final predictions.
The methodology was tested on the CIFAR-10 dataset with clean and noisy labels.
arXiv Detail & Related papers (2024-04-16T06:40:51Z)
- Blending gradient boosted trees and neural networks for point and probabilistic forecasting of hierarchical time series [0.0]
We describe a methodology for blending machine learning models from the gradient boosted trees and neural networks families.
These principles were successfully applied in the recent M5 Competition on both the Accuracy and Uncertainty tracks.
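As a loose illustration of such blending, one can tune a single convex weight between the two model families on a validation split; the actual competition pipelines were far more elaborate, and all names below are hypothetical:

```python
# Toy convex blend of a gradient-boosted model and a neural network; the
# weight is tuned on a validation split. (Illustrative only; the actual M5
# pipelines were substantially more elaborate.)
import numpy as np

def blend_weight(pred_a, pred_b, y_val, grid=np.linspace(0.0, 1.0, 101)):
    """Return the convex weight w minimizing validation MSE of
    w * pred_a + (1 - w) * pred_b."""
    errors = [np.mean((w * pred_a + (1.0 - w) * pred_b - y_val) ** 2) for w in grid]
    return float(grid[int(np.argmin(errors))])

# Hypothetical validation predictions from the two fitted models.
pred_gbt = np.array([10.0, 12.0, 7.5])
pred_nn = np.array([11.0, 11.5, 8.0])
y_val = np.array([10.5, 12.5, 7.8])

w = blend_weight(pred_gbt, pred_nn, y_val)
blended = w * pred_gbt + (1.0 - w) * pred_nn
```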
arXiv Detail & Related papers (2023-10-19T09:42:02Z)
- Context-Aware Ensemble Learning for Time Series [11.716677452529114]
We introduce a new approach using a meta-learner that effectively combines the base model predictions using a superset of features, namely the union of the base models' feature vectors, rather than the predictions themselves.
Our model does not feed the base models' predictions into a machine learning algorithm, but chooses the best possible combination at each time step based on the state of the problem.
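A rough sketch of that idea, with all names and the classifier choice hypothetical rather than the paper's architecture:

```python
# Loose sketch of a context-aware meta-learner: it reads the union of the
# base models' feature vectors and outputs per-model weights for combining
# their predictions. (All names here are hypothetical, not the paper's API.)
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_meta_learner(base_feats, base_preds, y):
    """base_feats: list of (n_steps, d_i) feature matrices, one per base model.
    base_preds: (n_steps, n_models) base-model predictions. y: targets."""
    context = np.hstack(base_feats)  # union of the base models' features
    best = np.argmin(np.abs(base_preds - y[:, None]), axis=1)  # best model per step
    return LogisticRegression(max_iter=1000).fit(context, best)

def combine(meta, base_feats, base_preds):
    proba = meta.predict_proba(np.hstack(base_feats))
    weights = np.zeros_like(base_preds)
    weights[:, meta.classes_] = proba  # align columns with model indices
    return (weights * base_preds).sum(axis=1)
```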
arXiv Detail & Related papers (2022-11-30T10:36:13Z)
- Uncertainty Estimation for Language Reward Models [5.33024001730262]
Language models can learn a range of capabilities from unsupervised training on text corpora.
It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons.
We seek to address these problems via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning.
arXiv Detail & Related papers (2022-03-14T20:13:21Z)
- Engineering the Neural Automatic Passenger Counter [0.0]
We explore and exploit various aspects of machine learning to increase reliability, performance, and counting quality.
We show how aggregation techniques such as ensemble quantiles can reduce bias, and we give an idea of the overall spread of the results.
arXiv Detail & Related papers (2022-03-02T14:56:11Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
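A compact sketch of the thresholding step as summarized here; the score functions and calibration details in the paper may differ:

```python
# Compact sketch of the ATC idea as summarized above (paraphrased; see the
# paper for the exact score functions and calibration details).
import numpy as np

def atc_predict_accuracy(src_conf, src_correct, tgt_conf):
    """Learn a confidence threshold on labeled source data, then predict
    target accuracy as the fraction of unlabeled target examples whose
    confidence clears that threshold."""
    src_acc = src_correct.mean()
    t = np.quantile(src_conf, 1.0 - src_acc)  # so P(conf >= t) ~ source accuracy
    return float((tgt_conf >= t).mean())

# Hypothetical confidences and 0/1 correctness indicators.
src_conf = np.array([0.9, 0.8, 0.6, 0.95, 0.4])
src_correct = np.array([1, 1, 0, 1, 0])
tgt_conf = np.array([0.85, 0.5, 0.7, 0.3])
print(atc_predict_accuracy(src_conf, src_correct, tgt_conf))
```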
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- Flexible Model Aggregation for Quantile Regression [92.63075261170302]
Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions.
We investigate methods for aggregating any number of conditional quantile models.
All of the models we consider in this paper can be fit using modern deep learning toolkits.
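One simple instance of such aggregation, assuming stacked quantile predictions and equal model weights (the paper studies a broader family of weightings):

```python
# One simple way to aggregate conditional quantile models: average each
# model's quantile predictions, then sort so quantiles do not cross.
# (Equal weighting is just one member of the family the paper studies.)
import numpy as np

def aggregate_quantiles(preds):
    """preds: (n_models, n_samples, n_quantiles) quantile predictions."""
    avg = preds.mean(axis=0)       # equal-weight average across models
    return np.sort(avg, axis=-1)   # enforce monotone (non-crossing) quantiles
```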
arXiv Detail & Related papers (2021-02-26T23:21:16Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators of the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
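A schematic of a cross-fit doubly-robust (AIPW) estimator, with random forests as stand-in nuisance learners; this is an assumption-laden sketch, not the paper's exact simulation setup:

```python
# Schematic doubly-robust (AIPW) estimator of the ACE with two-fold
# cross-fitting: nuisance models are fit on one fold and evaluated on the
# other. (Illustrative; the paper's simulation compares several variants.)
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

def crossfit_aipw(X, a, y, n_splits=2, seed=0):
    psi = np.zeros(len(y))
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        # Propensity model P(A=1 | X) and outcome models E[Y | A=a, X].
        ps = RandomForestClassifier(random_state=seed).fit(X[train], a[train])
        m1 = RandomForestRegressor(random_state=seed).fit(
            X[train][a[train] == 1], y[train][a[train] == 1])
        m0 = RandomForestRegressor(random_state=seed).fit(
            X[train][a[train] == 0], y[train][a[train] == 0])
        e = np.clip(ps.predict_proba(X[test])[:, 1], 0.01, 0.99)
        mu1, mu0 = m1.predict(X[test]), m0.predict(X[test])
        psi[test] = (a[test] * (y[test] - mu1) / e + mu1) \
                  - ((1 - a[test]) * (y[test] - mu0) / (1 - e) + mu0)
    return psi.mean()  # estimated average causal effect E[Y(1) - Y(0)]
```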
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence of each query sample in order to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)