Zero-Inflated Tweedie Boosted Trees with CatBoost for Insurance Loss Analytics
- URL: http://arxiv.org/abs/2406.16206v2
- Date: Mon, 28 Oct 2024 10:53:53 GMT
- Title: Zero-Inflated Tweedie Boosted Trees with CatBoost for Insurance Loss Analytics
- Authors: Banghee So, Emiliano A. Valdez
- Abstract summary: We modify the Tweedie regression model to address its limitations in modeling aggregate claims for various types of insurance.
Our recommended approach involves a refined modeling of the zero-claim process, together with the integration of boosting methods.
Our modeling results reveal a marked improvement in model performance, showcasing its potential to deliver more accurate predictions.
- Abstract: In this paper, we explore advanced modifications to the Tweedie regression model in order to address its limitations in modeling aggregate claims for various types of insurance such as automobile, health, and liability. Traditional Tweedie models, while effective in capturing the probability and magnitude of claims, usually fall short in accurately representing the large incidence of zero claims. Our recommended approach involves a refined modeling of the zero-claim process, together with the integration of boosting methods that leverage an iterative process to enhance predictive accuracy. Despite the inherent slowdown such iteration imposes on learning algorithms, several efficient implementations that also support precise parameter tuning, such as XGBoost, LightGBM, and CatBoost, have emerged. Among these, we chose CatBoost, an efficient boosting approach that effectively handles categorical and other special types of data. The core contribution of our paper is the combination of a separate model for zero claims with tree-based boosting ensemble methods within a CatBoost framework, under the assumption that the inflated probability of zero is a function of the mean parameter. The efficacy of our enhanced Tweedie model is demonstrated on an insurance telematics dataset, which presents the additional complexity of compositional feature variables. Our modeling results reveal a marked improvement in model performance, showcasing its potential to deliver more accurate predictions suitable for insurance claim analytics.
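The two-part structure described in the abstract can be sketched with CatBoost's built-in losses. The following is a minimal illustration on synthetic data: a classifier for the zero-claim indicator plus a Tweedie regressor for aggregate claims, combined multiplicatively. The feature matrix, the claim simulation, and the variance power of 1.5 are all assumptions; the paper's specific coupling of the zero-inflation probability to the Tweedie mean parameter is not reproduced here.

```python
# A minimal sketch of a two-part zero-inflated Tweedie model with CatBoost.
# Synthetic data and parameters are illustrative; the paper additionally ties
# the zero-inflation probability to the Tweedie mean, which this independent
# two-part fit does not reproduce.
import numpy as np
from catboost import CatBoostClassifier, CatBoostRegressor

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 5))                            # illustrative risk features
is_zero = rng.random(n) < 0.8                          # ~80% zero claims
y = np.where(is_zero, 0.0, rng.gamma(2.0, 500.0, n))   # positive claim amounts

# Part 1: classify whether a policy produces any claim at all.
clf = CatBoostClassifier(loss_function="Logloss", iterations=200, verbose=False)
clf.fit(X, (y > 0).astype(int))

# Part 2: Tweedie regression on aggregate claims (CatBoost supports a
# Tweedie loss with a fixed variance power in (1, 2)).
reg = CatBoostRegressor(
    loss_function="Tweedie:variance_power=1.5", iterations=200, verbose=False
)
reg.fit(X, y)

# Combined prediction: P(claim) times the Tweedie mean. This is a rough
# simplification, since the Tweedie part is fit on all policies here.
expected_loss = clf.predict_proba(X)[:, 1] * reg.predict(X)
print(expected_loss[:5])
```

Fitting the two parts independently keeps the sketch simple; per the abstract, the paper instead ties the extra zero mass directly to the fitted mean parameter.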
Related papers
- FPBoost: Fully Parametric Gradient Boosting for Survival Analysis [4.09225917049674]
We propose a novel paradigm for survival model design based on the weighted sum of individual fully parametric hazard contributions.
The proposed model, which we call FPBoost, is the first algorithm to directly optimize the survival likelihood via gradient boosting.
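Under the summary above, the FPBoost hazard can be sketched as a weighted sum of fully parametric heads; the weights $w_k(x)$ and head parameters $\theta_k(x)$ are illustrative notation assumed here, not taken from the paper.

```latex
% Sketch: survival hazard as a weighted sum of K fully parametric heads,
% with weights and head parameters produced by the boosted model.
h(t \mid x) \;=\; \sum_{k=1}^{K} w_k(x)\, h_k\!\big(t;\, \theta_k(x)\big),
\qquad w_k(x) \ge 0
```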
arXiv Detail & Related papers (2024-09-20T09:57:17Z)
- Learn from the Past: A Proxy Guided Adversarial Defense Framework with Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods, relying on direct iterative updates for the target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy guided defense framework, LAST (Learn from the Past).
arXiv Detail & Related papers (2023-10-19T13:13:41Z)
- Enhanced Gradient Boosting for Zero-Inflated Insurance Claims and Comparative Analysis of CatBoost, XGBoost, and LightGBM [0.0]
Based on predictive performance, CatBoost is the best of the three libraries for developing auto claim frequency models.
We propose a new zero-inflated Poisson boosted tree model, with varying assumptions about the relationship between the inflation probability $p$ and the distribution mean $\mu$.
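As a sketch of the zero-inflated Poisson setup referenced above (standard ZIP notation, assumed here rather than quoted from the paper):

```latex
% Zero-inflated Poisson: an extra point mass at zero mixed with Poisson(mu).
\Pr(Y = 0) = p + (1 - p)\,e^{-\mu}, \qquad
\Pr(Y = k) = (1 - p)\,\frac{\mu^{k} e^{-\mu}}{k!}, \quad k = 1, 2, \dots
% The paper varies the assumed link between p and mu, e.g. p = f(mu).
```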
arXiv Detail & Related papers (2023-07-15T10:54:46Z)
- Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the PR-divergences.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z)
- Adaptive LASSO estimation for functional hidden dynamic geostatistical model [69.10717733870575]
We propose a novel model selection algorithm based on a penalized maximum likelihood estimator (PMLE) for functional hidden dynamic geostatistical models (f-HD).
The algorithm is based on iterative optimisation and uses an adaptive least absolute shrinkage and selection operator (GMSOLAS) penalty function, wherein the weights are obtained from the unpenalised f-HD maximum likelihood estimators.
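A sketch of the adaptive LASSO penalty with MLE-based weights, in standard notation assumed here rather than quoted from the paper:

```latex
% Adaptive LASSO penalty: weights from the unpenalised MLE shrink large,
% well-identified coefficients less than small ones.
\mathrm{pen}_{\lambda}(\beta) = \lambda \sum_{j=1}^{J} \hat{w}_j\,\lvert\beta_j\rvert,
\qquad \hat{w}_j = \frac{1}{\lvert\hat{\beta}_j^{\,\mathrm{MLE}}\rvert}
```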
arXiv Detail & Related papers (2022-08-10T19:17:45Z)
- AdaCat: Adaptive Categorical Discretization for Autoregressive Models [84.85102013917606]
We propose an efficient, expressive, multimodal parameterization called Adaptive Categorical Discretization (AdaCat).
AdaCat discretizes each dimension of an autoregressive model adaptively, which allows the model to allocate density to fine intervals of interest.
arXiv Detail & Related papers (2022-08-03T17:53:46Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
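A minimal sketch of column-wise iterative imputation in the spirit of the framework above. It uses scikit-learn's IterativeImputer as an analogous off-the-shelf tool, not the HyperImpute library or its API; the data and estimator choice are illustrative.

```python
# Column-wise iterative imputation sketch (analogous technique, not HyperImpute).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[rng.random(X.shape) < 0.2] = np.nan  # knock out ~20% of entries

# Each column is modeled from the others; rounds repeat up to max_iter.
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    max_iter=10,
    random_state=0,
)
X_filled = imputer.fit_transform(X)
print(np.isnan(X_filled).any())  # False: all gaps filled
```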
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Explainable AI Integrated Feature Selection for Landslide Susceptibility Mapping using TreeSHAP [0.0]
Early prediction of landslide susceptibility using a data-driven approach is a pressing need.
We employed state-of-the-art machine learning algorithms, including XGBoost, LR, KNN, SVM, and AdaBoost, for landslide susceptibility prediction.
An optimized XGBoost, combined with a 40% reduction in features, outperformed all other classifiers on popular evaluation metrics.
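A minimal sketch of TreeSHAP-based feature selection as described above: rank features by mean absolute SHAP value from a fitted XGBoost model, keep the top 60% (a 40% reduction), and retrain. The dataset and the exact threshold are illustrative assumptions.

```python
# TreeSHAP feature-selection sketch: rank by mean |SHAP|, drop 40%, retrain.
import numpy as np
import shap
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

model = xgb.XGBClassifier(n_estimators=100, random_state=0).fit(X, y)

# TreeSHAP attributes each prediction exactly to the input features.
shap_values = shap.TreeExplainer(model).shap_values(X)
importance = np.abs(shap_values).mean(axis=0)

keep = np.argsort(importance)[::-1][: int(0.6 * X.shape[1])]  # top 60%
model_reduced = xgb.XGBClassifier(n_estimators=100, random_state=0).fit(X[:, keep], y)
```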
arXiv Detail & Related papers (2022-01-10T09:17:21Z)
- Accelerated Componentwise Gradient Boosting using Efficient Data Representation and Momentum-based Optimization [1.3159777131162964]
Componentwise boosting (CWB) builds on additive models as base learners to ensure interpretability.
One downside of CWB is its computational complexity in terms of memory and runtime.
We propose two techniques to overcome these issues without losing the properties of CWB.
arXiv Detail & Related papers (2021-10-07T14:49:52Z)
- Gaussian Process Boosting [13.162429430481982]
We introduce a novel way to combine boosting with Gaussian process and mixed effects models.
We obtain increased prediction accuracy compared to existing approaches on simulated and real-world data sets.
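A sketch of the combined model form in standard mixed-effects notation, assumed here rather than quoted from the paper: $F$ is the tree ensemble fit by boosting and $b$ is the Gaussian-process random effect.

```latex
% Sketch: boosted fixed-effects function plus a Gaussian-process random effect.
y = F(X) + b(X) + \epsilon, \qquad
b \sim \mathcal{GP}\big(0,\, k(\cdot, \cdot)\big), \qquad
\epsilon \sim \mathcal{N}(0, \sigma^{2} I)
```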
arXiv Detail & Related papers (2020-04-06T13:19:54Z)
- CatBoostLSS -- An extension of CatBoost to probabilistic forecasting [91.3755431537592]
We propose a new framework that predicts the entire conditional distribution of a univariate response variable.
CatBoostLSS models all moments of a parametric distribution instead of the conditional mean only.
We present both a simulation study and real-world examples that demonstrate the benefits of our approach.
arXiv Detail & Related papers (2020-01-04T15:42:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.