Overcoming Imbalanced Safety Data Using Extended Accident Triangle
- URL: http://arxiv.org/abs/2408.07094v1
- Date: Mon, 12 Aug 2024 00:36:17 GMT
- Title: Overcoming Imbalanced Safety Data Using Extended Accident Triangle
- Authors: Kailai Sun, Tianxiang Lan, Yang Miang Goh, Yueng-Hsiang Huang
- Abstract summary: Existing safety analytics studies have made remarkable progress, but suffer from imbalanced datasets.
We extend the theory of accident triangle to claim that the importance of data samples should be based on characteristics such as injury severity, accident frequency, and accident type.
We find robust improvements among different machine learning algorithms.
- Score: 1.1249583407496222
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is growing interest in using safety analytics and machine learning to support the prevention of workplace incidents, especially in high-risk industries like construction and trucking. Although existing safety analytics studies have made remarkable progress, they suffer from imbalanced datasets, a common problem in safety analytics, resulting in prediction inaccuracies. This can lead to management problems, e.g., incorrect resource allocation and improper interventions. To overcome the imbalanced data problem, we extend the theory of the accident triangle to claim that the importance of data samples should be based on characteristics such as injury severity, accident frequency, and accident type. Thus, three oversampling methods are proposed based on assigning different weights to samples in the minority class. We find robust improvements among different machine learning algorithms. Because open-source safety datasets are scarce, we are sharing three imbalanced datasets, including a 9-year nationwide construction accident record dataset, and the corresponding code.
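The abstract describes oversampling in which minority-class samples carry different weights. The sketch below illustrates that general idea only; it is not the authors' method, whose three weighting schemes are not specified on this page. The function name `weighted_oversample`, the binary-label setup, and the toy severity weights are all assumptions made for illustration.

```python
import random
from collections import Counter


def weighted_oversample(samples, labels, weights, minority_label, seed=0):
    """Duplicate minority samples, drawn in proportion to `weights`,
    until the minority class matches the size of the rest of the data.

    Hypothetical helper for a binary problem; not taken from the paper.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    n_majority = len(labels) - len(minority_idx)
    n_extra = max(0, n_majority - len(minority_idx))
    if not minority_idx or n_extra == 0:
        return list(samples), list(labels)
    # Samples with a larger weight (e.g. more severe injuries) are drawn more often.
    drawn = rng.choices(
        minority_idx,
        weights=[weights[i] for i in minority_idx],
        k=n_extra,
    )
    new_samples = list(samples) + [samples[i] for i in drawn]
    new_labels = list(labels) + [minority_label] * n_extra
    return new_samples, new_labels


# Toy data (assumed): label 1 = accident (minority), 0 = no accident.
samples = ["a", "b", "c", "d", "e", "f", "g", "h"]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
severity_weight = [0, 0, 0, 0, 0, 0, 1.0, 3.0]  # "h" is treated as more severe

X, y = weighted_oversample(samples, labels, severity_weight, minority_label=1)
print(Counter(y))
```

Drawing with replacement keeps the original records untouched and only adds duplicates, so the weights influence how often each minority sample is repeated rather than changing any feature values.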
Related papers
- Crash Severity Risk Modeling Strategies under Data Imbalance [7.9613232032536745]
This study investigates crash severity risk modeling strategies for work zones involving large vehicles when there is crash data imbalance between low-severity (LS) and high-severity (HS) crashes.
We utilized crash data involving large vehicles in South Carolina work zones for the period between 2014 and 2018, which included 4 times more LS crashes than HS crashes.
The findings of this study highlight a disparity between LS and HS predictions, with less-accurate prediction of HS crashes compared to LS crashes due to class imbalance and feature overlaps between LS and HS crashes.
arXiv Detail & Related papers (2024-12-03T02:28:35Z)
- Ultra-imbalanced classification guided by statistical information [24.969543903532664]
We take a population-level approach to imbalanced learning by proposing a new formulation called ultra-imbalanced classification (UIC).
Under UIC, loss functions behave differently even if an infinite number of training samples is available.
A novel learning objective termed Tunable Boosting Loss is developed which is provably resistant against data imbalance under UIC.
arXiv Detail & Related papers (2024-09-06T08:07:09Z)
- Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
arXiv Detail & Related papers (2024-06-16T03:10:16Z)
- A Generative Deep Learning Approach for Crash Severity Modeling with Imbalanced Data [6.169163527464771]
This study proposes a crash data generation method based on Conditional Tabular GAN.
A crash severity model is employed to estimate the performance of classification and interpretation.
The results indicate that using synthetic data generated by CTGAN-RU for crash severity modeling outperforms original data or synthetic data generated by other resampling methods.
arXiv Detail & Related papers (2024-04-02T16:07:27Z)
- Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
arXiv Detail & Related papers (2024-02-07T21:58:40Z)
- Neural Collapse Terminus: A Unified Solution for Class Incremental Learning and Its Variants [166.916517335816]
In this paper, we offer a unified solution to the misalignment dilemma in the three tasks.
We propose neural collapse terminus that is a fixed structure with the maximal equiangular inter-class separation for the whole label space.
Our method holds the neural collapse optimality in an incremental fashion regardless of data imbalance or data scarcity.
arXiv Detail & Related papers (2023-08-03T13:09:59Z)
- Holistic Robust Data-Driven Decisions [0.0]
Practical overfitting typically cannot be attributed to a single cause; it arises from several factors simultaneously.
We consider here three overfitting sources: (i) statistical error as a result of working with finite sample data, (ii) data noise, which occurs when the data points are measured only with finite precision, and finally, (iii) data misspecification in which a small fraction of all data may be wholly corrupted.
We design a novel data-driven formulation that guarantees such holistic protection and is computationally viable.
arXiv Detail & Related papers (2022-07-19T21:28:51Z)
- Predicting Seriousness of Injury in a Traffic Accident: A New Imbalanced Dataset and Benchmark [62.997667081978825]
The paper introduces a new dataset to assess the performance of machine learning algorithms in the prediction of the seriousness of injury in a traffic accident.
The dataset is created by aggregating publicly available datasets from the UK Department for Transport.
arXiv Detail & Related papers (2022-05-20T21:15:26Z)
- Prediction of motor insurance claims occurrence as an imbalanced machine learning problem [0.0]
The insurance industry, with its large datasets, is a natural place to use big data solutions.
The main goal of this work is to present and apply various methods of dealing with an imbalanced dataset.
In addition, the above techniques are used to compare the results of machine learning algorithms in the context of claim occurrence prediction in car insurance.
arXiv Detail & Related papers (2022-04-12T22:41:47Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.