Prediction of motor insurance claims occurrence as an imbalanced machine
learning problem
- URL: http://arxiv.org/abs/2204.06109v1
- Date: Tue, 12 Apr 2022 22:41:47 GMT
- Title: Prediction of motor insurance claims occurrence as an imbalanced machine
learning problem
- Authors: Sebastian Baran, Przemysław Rola
- Abstract summary: The insurance industry, with its large datasets, is a natural place to use big data solutions.
The main goal of this work is to present and apply various methods of dealing with an imbalanced dataset.
In addition, the above techniques are used to compare the results of machine learning algorithms in the context of claim occurrence prediction in car insurance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The insurance industry, with its large datasets, is a natural place to use
big data solutions. However, it must be stressed that a significant number of
machine learning applications in the insurance industry, such as fraud detection
or claim prediction, involve learning from an
imbalanced dataset. This is because frauds and claims are rare
events when compared with the entire population of drivers. The problem of
imbalanced learning is often hard to overcome. Therefore, the main goal of this
work is to present and apply various methods of dealing with an imbalanced
dataset in the context of claim occurrence prediction in car insurance. In
addition, these techniques are used to compare the results of machine
learning algorithms on the same task. Our study covers the following
techniques: logistic regression, decision tree, random forest, XGBoost, and
feed-forward network. The problem is a classification problem.
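The abstract does not say which imbalance-handling methods the authors apply, so the following is only a minimal sketch of one common option, random oversampling, together with the reason plain accuracy is misleading when claims are rare. The 95/5 class split, the toy feature rows, and the helper names `random_oversample` and `precision_recall` are assumptions made for illustration; none of them come from the paper.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class is as large as the largest one. Illustrative helper only."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive (claim) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Synthetic toy data (assumption): 95 policies without a claim, 5 with one.
labels = [0] * 95 + [1] * 5
rows = [[i] for i in range(100)]

# A model that always predicts "no claim" scores high accuracy
# while detecting none of the claims.
always_no_claim = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_no_claim)) / len(labels)
precision, recall = precision_recall(labels, always_no_claim)

# Rebalance the training data so both classes are equally represented.
balanced_rows, balanced_labels = random_oversample(rows, labels)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
print("class counts after oversampling:",
      balanced_labels.count(0), balanced_labels.count(1))
```

In practice, oversampling would be applied to the training split only, so that the evaluation data keeps the original class ratio. Class weighting and undersampling are other widely used alternatives.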
Related papers
- Overcoming Imbalanced Safety Data Using Extended Accident Triangle [1.1249583407496222]
Existing safety analytics studies have made remarkable progress, but suffer from imbalanced datasets.
We extend the accident triangle theory to claim that the importance of data samples should be based on characteristics such as injury severity, accident frequency, and accident type.
We find robust improvements among different machine learning algorithms.
arXiv Detail & Related papers (2024-08-12T00:36:17Z) - Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
arXiv Detail & Related papers (2024-06-16T03:10:16Z) - An engine to simulate insurance fraud network data [1.3812010983144802]
We develop a simulation machine that is engineered to create synthetic data with a network structure.
We can specify the total number of policyholders and parties, the desired level of imbalance and the (effect size of the) features in the fraud generating model.
The simulation engine enables researchers and practitioners to examine several methodological challenges as well as to test their (development strategy of) insurance fraud detection models.
arXiv Detail & Related papers (2023-08-21T13:14:00Z) - Prediction-Powered Inference [68.97619568620709]
Prediction-powered inference is a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system.
The framework yields simple algorithms for computing provably valid confidence intervals for quantities such as means, quantiles, and linear and logistic regression coefficients.
Prediction-powered inference could enable researchers to draw valid and more data-efficient conclusions using machine learning.
arXiv Detail & Related papers (2023-01-23T18:59:28Z) - Applying Machine Learning to Life Insurance: some knowledge sharing to
master it [0.0]
This paper reviews traditional actuarial methodologies for survival modeling and extends them with Machine Learning techniques.
It points out differences from regular machine learning models and emphasizes the importance of specific implementations for handling censored data.
Various open-source Machine Learning algorithms have been adapted to the specificities of life insurance data, namely censoring and truncation.
arXiv Detail & Related papers (2022-09-05T17:09:03Z) - Predicting Seriousness of Injury in a Traffic Accident: A New Imbalanced
Dataset and Benchmark [62.997667081978825]
The paper introduces a new dataset to assess the performance of machine learning algorithms in the prediction of the seriousness of injury in a traffic accident.
The dataset is created by aggregating publicly available datasets from the UK Department for Transport.
arXiv Detail & Related papers (2022-05-20T21:15:26Z) - Risk Minimization from Adaptively Collected Data: Guarantees for
Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z) - A model for traffic incident prediction using emergency braking data [77.34726150561087]
We address the fundamental problem of data scarcity in road traffic accident prediction by training our model on emergency braking events instead of accidents.
We present a prototype implementing a traffic incident prediction model for Germany based on emergency braking data from Mercedes-Benz vehicles.
arXiv Detail & Related papers (2021-02-12T18:17:12Z) - PrognoseNet: A Generative Probabilistic Framework for Multimodal
Position Prediction given Context Information [2.5302126831371226]
We propose an approach which reformulates the prediction problem as a classification task, allowing for powerful tools.
A smart choice of the latent variable allows for the reformulation of the log-likelihood function as a combination of a classification problem and a much simplified regression problem.
The proposed approach can easily incorporate context information and does not require any preprocessing of the data.
arXiv Detail & Related papers (2020-10-02T06:13:41Z) - Fairness Constraints in Semi-supervised Learning [56.48626493765908]
We develop a framework for fair semi-supervised learning, which is formulated as an optimization problem.
We theoretically analyze the source of discrimination in semi-supervised learning via bias, variance and noise decomposition.
Our method is able to achieve fair semi-supervised learning, and reach a better trade-off between accuracy and fairness than fair supervised learning.
arXiv Detail & Related papers (2020-09-14T04:25:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.