Related papers: Prediction of motor insurance claims occurrence as an imbalanced machine learning problem

Prediction of motor insurance claims occurrence as an imbalanced machine learning problem

URL: http://arxiv.org/abs/2204.06109v1
Date: Tue, 12 Apr 2022 22:41:47 GMT
Title: Prediction of motor insurance claims occurrence as an imbalanced machine learning problem
Authors: Sebastian Baran, Przemys{\l}aw Rola
Abstract summary: Insurance industry, with its large datasets, is a natural place to use big data solutions. The main goal of this work is to present and apply various methods of dealing with an imbalanced dataset. In addition, the above techniques are used to compare the results of machine learning algorithms in the context of claim occurrence prediction in car insurance.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The insurance industry, with its large datasets, is a natural place to use big data solutions. However it must be stressed, that significant number of applications for machine learning in insurance industry, like fraud detection or claim prediction, deals with the problem of machine learning on an imbalanced data set. This is due to the fact that frauds or claims are rare events when compared with the entire population of drivers. The problem of imbalanced learning is often hard to overcome. Therefore, the main goal of this work is to present and apply various methods of dealing with an imbalanced dataset in the context of claim occurrence prediction in car insurance. In addition, the above techniques are used to compare the results of machine learning algorithms in the context of claim occurrence prediction in car insurance. Our study covers the following techniques: logistic-regression, decision tree, random forest, xgBoost, feed-forward network. The problem is the classification one.

Related papers

Overtake Detection in Trucks Using CAN Bus Signals: A Comparative Study of Machine Learning Methods [51.28632782308621]
We focus on overtake detection using Controller Area Network (CAN) bus data collected from five in-service trucks provided by the Volvo Group.<n>We evaluate three common classifiers for vehicle manoeuvre detection, Artificial Neural Networks (ANN), Random Forest (RF), and Support Vector Machines (SVM)<n>Our pertruck analysis also reveals that classification accuracy, especially for overtakes, depends on the amount of training data per vehicle.
arXiv Detail & Related papers (2025-07-01T09:20:41Z)
Overcoming Imbalanced Safety Data Using Extended Accident Triangle [1.1249583407496222]
Existing safety analytics studies have made remarkable progress, but suffer from imbalanced datasets. We extend the theory of accident triangle to claim that the importance of data samples should be based on characteristics such as injury severity, accident frequency, and accident type. We find robust improvements among different machine learning algorithms.
arXiv Detail & Related papers (2024-08-12T00:36:17Z)
Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports. We further formulate the crash event feature learning as a novel text reasoning problem and further fine-tune various large language models (LLMs) to predict detailed accident outcomes. Our experiments results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
arXiv Detail & Related papers (2024-06-16T03:10:16Z)
An engine to simulate insurance fraud network data [1.3812010983144802]
We develop a simulation machine that is engineered to create synthetic data with a network structure. We can specify the total number of policyholders and parties, the desired level of imbalance and the (effect size of the) features in the fraud generating model. The simulation engine enables researchers and practitioners to examine several methodological challenges as well as to test their (development strategy of) insurance fraud detection models.
arXiv Detail & Related papers (2023-08-21T13:14:00Z)
Prediction-Powered Inference [68.97619568620709]
Prediction-powered inference is a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system. The framework yields simple algorithms for computing provably valid confidence intervals for quantities such as means, quantiles, and linear and logistic regression coefficients. Prediction-powered inference could enable researchers to draw valid and more data-efficient conclusions using machine learning.
arXiv Detail & Related papers (2023-01-23T18:59:28Z)
Applying Machine Learning to Life Insurance: some knowledge sharing to master it [0.0]
This paper reviews traditional actuarial methodologies for survival modeling and extends them with Machine Learning techniques. It points out differences with regular machine learning models and emphasizes importance of specific implementations to face censored data. Various open-source Machine Learning algorithms have been adjusted to adapt the specificities of life insurance data, namely censoring and truncation.
arXiv Detail & Related papers (2022-09-05T17:09:03Z)
Predicting Seriousness of Injury in a Traffic Accident: A New Imbalanced Dataset and Benchmark [62.997667081978825]
The paper introduces a new dataset to assess the performance of machine learning algorithms in the prediction of the seriousness of injury in a traffic accident. The dataset is created by aggregating publicly available datasets from the UK Department for Transport.
arXiv Detail & Related papers (2022-05-20T21:15:26Z)
Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data. We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class. For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
A model for traffic incident prediction using emergency braking data [77.34726150561087]
We address the fundamental problem of data scarcity in road traffic accident prediction by training our model on emergency braking events instead of accidents. We present a prototype implementing a traffic incident prediction model for Germany based on emergency braking data from Mercedes-Benz vehicles.
arXiv Detail & Related papers (2021-02-12T18:17:12Z)
PrognoseNet: A Generative Probabilistic Framework for Multimodal Position Prediction given Context Information [2.5302126831371226]
We propose an approach which reformulates the prediction problem as a classification task, allowing for powerful tools. A smart choice of the latent variable allows for the reformulation of the log-likelihood function as a combination of a classification problem and a much simplified regression problem. The proposed approach can easily incorporate context information and does not require any preprocessing of the data.
arXiv Detail & Related papers (2020-10-02T06:13:41Z)
Fairness Constraints in Semi-supervised Learning [56.48626493765908]
We develop a framework for fair semi-supervised learning, which is formulated as an optimization problem. We theoretically analyze the source of discrimination in semi-supervised learning via bias, variance and noise decomposition. Our method is able to achieve fair semi-supervised learning, and reach a better trade-off between accuracy and fairness than fair supervised learning.
arXiv Detail & Related papers (2020-09-14T04:25:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.