Enhanced Gradient Boosting for Zero-Inflated Insurance Claims and Comparative Analysis of CatBoost, XGBoost, and LightGBM
- URL: http://arxiv.org/abs/2307.07771v3
- Date: Tue, 18 Jun 2024 12:09:18 GMT
- Title: Enhanced Gradient Boosting for Zero-Inflated Insurance Claims and Comparative Analysis of CatBoost, XGBoost, and LightGBM
- Authors: Banghee So
- Abstract summary: Based on predictive performance, CatBoost is found to be the most suitable library for developing auto claim frequency models.
We propose a new zero-inflated Poisson boosted tree model, with variation in the assumption about the relationship between the inflation probability $p$ and the distribution mean $\mu$.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The property and casualty (P&C) insurance industry faces challenges in developing claim predictive models due to the highly right-skewed distribution of positive claims with excess zeros. To address this, actuarial science researchers have employed "zero-inflated" models that combine a traditional count model and a binary model. This paper investigates the use of boosting algorithms to process insurance claim data, including zero-inflated telematics data, to construct claim frequency models. Three popular gradient boosting libraries - XGBoost, LightGBM, and CatBoost - are evaluated and compared to determine the most suitable library for training insurance claim data and fitting actuarial frequency models. Through a comprehensive analysis of two distinct datasets, it is determined that CatBoost is the best for developing auto claim frequency models based on predictive performance. Furthermore, we propose a new zero-inflated Poisson boosted tree model, with variation in the assumption about the relationship between inflation probability $p$ and distribution mean $\mu$, and find that it outperforms others depending on data characteristics. This model enables us to take advantage of particular CatBoost tools, which makes it easier and more convenient to investigate the effects and interactions of various risk features on the frequency model when using telematics data.
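To make the two-part structure concrete, the sketch below shows one way a zero-inflated frequency model can be assembled from stock CatBoost estimators: a binary classifier approximates the inflation probability $p$ and a Poisson-loss regressor approximates the count mean $\mu$. It is a minimal illustration of the general decomposition under assumed column names, not the authors' implementation of the proposed ZIP boosted tree.

```python
# Minimal sketch of the two-part idea behind a zero-inflated frequency model,
# assembled from off-the-shelf CatBoost estimators. This is NOT the paper's
# ZIP boosted tree: the zero indicator below is only a crude stand-in for the
# latent inflation component, and all column names are hypothetical.
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier, CatBoostRegressor

def fit_two_part_frequency(df: pd.DataFrame, features):
    X, y = df[features], df["claim_count"].to_numpy()

    # Binary part: approximate the inflation probability p via P(zero claims).
    zero_clf = CatBoostClassifier(loss_function="Logloss", verbose=False)
    zero_clf.fit(X, (y == 0).astype(int))

    # Count part: a Poisson loss is available natively in CatBoost.
    count_reg = CatBoostRegressor(loss_function="Poisson", verbose=False)
    count_reg.fit(X, y)
    return zero_clf, count_reg

def predict_frequency(zero_clf, count_reg, X_new: pd.DataFrame) -> np.ndarray:
    # Under a zero-inflated Poisson, E[N] = (1 - p) * mu.
    p = zero_clf.predict_proba(X_new)[:, 1]
    mu = count_reg.predict(X_new, prediction_type="Exponent")  # raw score -> mean scale
    return (1.0 - p) * mu
```

In the paper's model the binary and count parts are linked through explicit assumptions on how $p$ relates to $\mu$; the sketch keeps them independent for brevity.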
Related papers
- Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM), which can be viewed as a gradient boosting algorithm combined with score matching.
We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.
Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z) - Learning Augmentation Policies from A Model Zoo for Time Series Forecasting [58.66211334969299]
We introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning.
By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance.
arXiv Detail & Related papers (2024-09-10T07:34:19Z) - Zero-Inflated Tweedie Boosted Trees with CatBoost for Insurance Loss Analytics [0.8287206589886881]
We modify the Tweedie regression model to address its limitations in modeling aggregate claims for various types of insurance.
Our recommended approach involves a refined modeling of the zero-claim process, together with the integration of boosting methods.
Our modeling results reveal a marked improvement in model performance, showcasing its potential to deliver more accurate predictions.
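For context on this related entry, CatBoost exposes a Tweedie objective directly; the snippet below is a hedged sketch of a plain Tweedie-loss fit on zero-heavy aggregate claims using synthetic data. The refined zero-claim treatment proposed in the cited paper is not reproduced here, and the variance-power value is illustrative.

```python
# Hedged sketch: a baseline Tweedie-loss CatBoost fit on aggregate claims with
# many exact zeros. Data are synthetic and the variance power is illustrative.
import numpy as np
from catboost import CatBoostRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                      # illustrative risk features
freq = rng.poisson(lam=0.2, size=1000)              # claim counts, mostly zero
sev = rng.gamma(shape=2.0, scale=500.0, size=1000)  # average severity per policy
y = freq * sev                                      # aggregate loss, zero-inflated

model = CatBoostRegressor(
    loss_function="Tweedie:variance_power=1.5",  # 1 < p < 2: compound Poisson-gamma
    iterations=200,
    verbose=False,
)
model.fit(X, y)
pred = model.predict(X, prediction_type="Exponent")  # back to the mean scale
```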
arXiv Detail & Related papers (2024-06-23T20:03:55Z) - Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals [91.59906995214209]
We propose a new evaluation method, the Counterfactual Attentiveness Test (CAT).
CAT uses counterfactuals by replacing part of the input with its counterpart from a different example, expecting an attentive model to change its prediction.
We show that GPT3 becomes less attentive with an increased number of demonstrations, while its accuracy on the test data improves.
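As a rough illustration of the swap-based test described above, the helper below replaces one part of each paired input with the corresponding part from another example and reports how often a model's prediction flips; `model_predict` and the pair structure are hypothetical placeholders, not the paper's protocol.

```python
# Hedged sketch of the counterfactual-swap idea: replace one part of a paired
# input (e.g., the premise in NLI) with the corresponding part from a different
# example and measure how often the prediction changes.
import random

def attentiveness_rate(examples, model_predict, seed=0):
    """examples: list of (part_a, part_b) pairs; model_predict: any callable
    mapping (part_a, part_b) to a label. Returns the fraction of flips."""
    rng = random.Random(seed)
    flips = 0
    for i, (part_a, part_b) in enumerate(examples):
        j = rng.choice([k for k in range(len(examples)) if k != i])
        swapped_a = examples[j][0]                  # counterpart from another example
        original = model_predict(part_a, part_b)
        counterfactual = model_predict(swapped_a, part_b)
        flips += int(original != counterfactual)    # an attentive model should flip
    return flips / len(examples)
```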
arXiv Detail & Related papers (2023-11-16T06:27:35Z) - Quantile Extreme Gradient Boosting for Uncertainty Quantification [1.7685947618629572]
Extreme Gradient Boosting (XGBoost) is one of the most popular machine learning (ML) methods.
We propose enhancements to XGBoost whereby a modified quantile regression is used as the objective function to estimate uncertainty (QXGBoost).
Our proposed method achieves comparable or better performance than the uncertainty estimates generated by regular and quantile light gradient boosting.
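For orientation, the snippet below builds a prediction interval from plain pinball-loss quantile fits using XGBoost's built-in quantile objective (available in recent releases, which is an assumption on versioning); the smoothed/modified objective that defines QXGBoost in the cited paper is not reproduced.

```python
# Hedged sketch: a 90% prediction interval from two plain quantile fits.
# Assumes an XGBoost release that ships objective="reg:quantileerror" (2.0+).
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=2000)

def fit_quantile(alpha):
    # Pinball loss at level alpha; one model per quantile for simplicity.
    model = XGBRegressor(objective="reg:quantileerror", quantile_alpha=alpha,
                         n_estimators=200, max_depth=3)
    model.fit(X, y)
    return model

lower, upper = fit_quantile(0.05), fit_quantile(0.95)
interval = np.column_stack([lower.predict(X), upper.predict(X)])  # 90% band
```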
arXiv Detail & Related papers (2023-04-23T19:46:19Z) - Bayesian CART models for insurance claims frequency [0.0]
Classification and regression trees (CARTs) and their ensembles have gained popularity in the actuarial literature.
We introduce Bayesian CART models for insurance pricing, with a particular focus on claims frequency modelling.
Some simulations and real insurance data will be discussed to illustrate the applicability of these models.
arXiv Detail & Related papers (2023-03-03T13:48:35Z) - Adaptive LASSO estimation for functional hidden dynamic geostatistical model [69.10717733870575]
We propose a novel model selection algorithm based on a penalized maximum likelihood estimator (PMLE) for functional hidden dynamic geostatistical models (f-HD).
The algorithm is based on iterative optimisation and uses an adaptive least absolute shrinkage and selection operator (GMSOLAS) penalty function, wherein the weights are obtained from the unpenalised f-HD maximum-likelihood estimators.
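For orientation, the generic adaptive-LASSO penalised likelihood with data-driven weights has the textbook form
$$\hat{\beta} = \arg\min_{\beta}\Big\{-\ell(\beta) + \lambda \sum_{j}\hat{w}_j\,|\beta_j|\Big\}, \qquad \hat{w}_j = |\tilde{\beta}_j|^{-\gamma},\ \gamma > 0,$$
where $\ell$ is the log-likelihood and $\tilde{\beta}$ is an unpenalised (here, f-HD maximum-likelihood) estimate; the GMSOLAS penalty in the cited paper is a variant of this idea, and the display above is the standard form rather than the paper's exact notation.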
arXiv Detail & Related papers (2022-08-10T19:17:45Z) - Learning Summary Statistics for Bayesian Inference with Autoencoders [58.720142291102135]
We use the inner dimension of deep neural network-based autoencoders as summary statistics.
To create an incentive for the encoder to encode all the parameter-related information but not the noise, we give the decoder access to explicit or implicit information that has been used to generate the training data.
arXiv Detail & Related papers (2022-01-28T12:00:31Z) - X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim to improve data efficiency for both classification and regression setups in deep learning.
To combine the strengths of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z) - Synthetic Dataset Generation of Driver Telematics [0.0]
This article describes techniques employed in the production of a synthetic dataset of driver telematics emulated from a similar real insurance dataset.
It follows a three-stage process using machine learning algorithms.
The resulting dataset is evaluated by comparing the synthetic and real datasets when Poisson and gamma regression models are fitted to the respective data.
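As a hedged sketch of this evaluation idea, the helper below fits a Poisson frequency GLM and a gamma severity GLM with statsmodels; comparing synthetic to real data then amounts to contrasting the two sets of fitted coefficients. The column names and the real_df/synth_df frames are hypothetical placeholders, not taken from the paper.

```python
# Hedged sketch: fit the same GLMs to real and synthetic data and compare fits.
import statsmodels.api as sm

def fit_frequency_severity(df, features):
    X = sm.add_constant(df[features])
    # Claim frequency: Poisson GLM (log link is the statsmodels default).
    freq = sm.GLM(df["claim_count"], X, family=sm.families.Poisson()).fit()
    # Claim severity on positive claims: gamma GLM with a log link.
    pos = df[df["claim_amount"] > 0]
    Xp = sm.add_constant(pos[features])
    sev = sm.GLM(pos["claim_amount"], Xp,
                 family=sm.families.Gamma(link=sm.families.links.Log())).fit()
    return freq.params, sev.params

# Comparison amounts to contrasting fit_frequency_severity(real_df, features)
# with fit_frequency_severity(synth_df, features).
```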
arXiv Detail & Related papers (2021-01-30T15:52:56Z) - When stakes are high: balancing accuracy and transparency with Model-Agnostic Interpretable Data-driven suRRogates [0.0]
Highly regulated industries, like banking and insurance, ask for transparent decision-making algorithms.
We present a procedure to develop a Model-Agnostic Interpretable Data-driven suRRogate (maidrr).
Knowledge is extracted from a black box via partial dependence effects.
This results in a segmentation of the feature space with automatic variable selection.
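As a loose illustration of the partial-dependence extraction step (not the maidrr algorithm itself), the sketch below pulls a PD effect from a generic black-box model with scikit-learn and crudely groups grid points with similar effects; the clustering step is a stand-in for the paper's actual segmentation procedure, and the data are synthetic.

```python
# Hedged sketch: extract a partial dependence (PD) effect from a black box and
# coarsely segment one feature by grouping grid points with similar PD values.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=0)
black_box = GradientBoostingRegressor(random_state=0).fit(X, y)

pd_result = partial_dependence(black_box, X, features=[0], grid_resolution=20)
effect = pd_result["average"].ravel()            # PD effect of feature 0 on the grid

# Group grid points whose PD effects are similar into a handful of segments.
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    effect.reshape(-1, 1)
)
```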
arXiv Detail & Related papers (2020-07-14T08:10:05Z)