Machine Learning For An Explainable Cost Prediction of Medical Insurance
- URL: http://arxiv.org/abs/2311.14139v1
- Date: Thu, 23 Nov 2023 18:13:34 GMT
- Title: Machine Learning For An Explainable Cost Prediction of Medical Insurance
- Authors: Ugochukwu Orji and Elochukwu Ukwandu
- Abstract summary: Three regression-based ensemble ML models were deployed to predict medical insurance costs.
The models were evaluated using four performance evaluation metrics, including R-squared, Mean Absolute Error, Root Mean Squared Error, and Mean Absolute Percentage Error.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Predictive modeling in healthcare continues to be an active actuarial
research topic as more insurance companies aim to maximize the potential of
Machine Learning approaches to increase their productivity and efficiency. In
this paper, the authors deployed three regression-based ensemble ML models that
combine variations of decision trees through Extreme Gradient Boosting,
Gradient-boosting Machine, and Random Forest) methods in predicting medical
insurance costs. Explainable Artificial Intelligence methods SHapley Additive
exPlanations and Individual Conditional Expectation plots were deployed to
discover and explain the key determinant factors that influence medical
insurance premium prices in the dataset. The dataset used comprised 986 records
and is publicly available in the KAGGLE repository. The models were evaluated
using four performance evaluation metrics, including R-squared, Mean Absolute
Error, Root Mean Squared Error, and Mean Absolute Percentage Error. The results
show that all models produced impressive outcomes; however, the XGBoost model
achieved a better overall performance although it also expanded more
computational resources, while the RF model recorded a lesser prediction error
and consumed far fewer computing resources than the XGBoost model. Furthermore,
we compared the outcome of both XAi methods in identifying the key determinant
features that influenced the PremiumPrices for each model and whereas both XAi
methods produced similar outcomes, we found that the ICE plots showed in more
detail the interactions between each variable than the SHAP analysis which
seemed to be more high-level. It is the aim of the authors that the
contributions of this study will help policymakers, insurers, and potential
medical insurance buyers in their decision-making process for selecting the
right policies that meet their specific needs.
Related papers
- Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search [59.75749613951193]
We propose Data Influence-oriented Tree Search (DITS) to guide both tree search and data selection.
By leveraging influence scores, we effectively identify the most impactful data for system improvement.
We derive influence score estimation methods tailored for non-differentiable metrics.
arXiv Detail & Related papers (2025-02-02T23:20:16Z) - Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework [77.45983464131977]
We focus on how likely it is that a RAG model's prediction is incorrect, resulting in uncontrollable risks in real-world applications.
Our research identifies two critical latent factors affecting RAG's confidence in its predictions.
We develop a counterfactual prompting framework that induces the models to alter these factors and analyzes the effect on their answers.
arXiv Detail & Related papers (2024-09-24T14:52:14Z) - Two new feature selection methods based on learn-heuristic techniques for breast cancer prediction: A comprehensive analysis [6.796017024594715]
We suggest two novel feature selection (FS) methods based upon an imperialist competitive algorithm (ICA) and a bat algorithm (BA)
This study aims to enhance diagnostic models' efficiency and present a comprehensive analysis to help clinical physicians make much more precise and reliable decisions than before.
arXiv Detail & Related papers (2024-07-19T19:07:53Z) - Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z) - A comparative study on feature selection for a risk prediction model for
colorectal cancer [0.0]
This work is focused on colorectal cancer, assessing several feature ranking algorithms in terms of performance for a set of risk prediction models.
A visual approach proposed in this work allows to see that the Neural Network-based wrapper ranking is the most unstable while the Random Forest is the most stable.
arXiv Detail & Related papers (2024-02-07T22:14:14Z) - MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data
Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z) - Explainable AI for Malnutrition Risk Prediction from m-Health and
Clinical Data [3.093890460224435]
This paper presents a novel AI framework for early and explainable malnutrition risk detection based on heterogeneous m-health data.
We performed an extensive model evaluation including both subject-independent and personalised predictions.
We also investigated several benchmark XAI methods to extract global model explanations.
arXiv Detail & Related papers (2023-05-31T08:07:35Z) - Performance Evaluation of Regression Models in Predicting the Cost of
Medical Insurance [0.0]
Three (3) Regression Models in Machine Learning namely Linear Regression, Gradient Boosting, and Support Vector Machine were used.
The performance will be evaluated using the metrics RMSE (Root Mean Square), r2 (R Square), and K-Fold Cross-validation.
arXiv Detail & Related papers (2023-04-25T06:33:49Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood
Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task has applications in safety-sensitive applications such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Characterizing Fairness Over the Set of Good Models Under Selective
Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced
Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD)
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to $19%$ over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.