Performance Evaluation of Regression Models in Predicting the Cost of
Medical Insurance
- URL: http://arxiv.org/abs/2304.12605v1
- Date: Tue, 25 Apr 2023 06:33:49 GMT
- Title: Performance Evaluation of Regression Models in Predicting the Cost of
Medical Insurance
- Authors: Jonelle Angelo S. Cenita, Paul Richie F. Asuncion, Jayson M.
Victoriano
- Abstract summary: Three regression models in machine learning, namely Linear Regression, Gradient Boosting, and Support Vector Machine, were used.
Performance was evaluated using the metrics RMSE (Root Mean Square Error), R² (R-squared), and K-fold cross-validation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The study aimed to evaluate the performance of regression models in
predicting the cost of medical insurance. Three regression models in machine
learning, namely Linear Regression, Gradient Boosting, and Support Vector
Machine, were used. Performance was evaluated using the metrics RMSE (Root
Mean Square Error), R² (R-squared), and K-fold cross-validation. The study
also sought to pinpoint the feature most important in predicting the cost of
medical insurance. The study is anchored on the knowledge discovery in
databases (KDD) process, which refers to the overall process of discovering
useful knowledge from data. The performance evaluation results reveal that,
among the three regression models, Gradient Boosting achieved the highest R²
(0.892) and the lowest RMSE (1336.594). Furthermore, the 10-fold
cross-validation weighted means do not differ significantly from the R²
results of the three regression models. In addition, exploratory data analysis
(EDA) using box plots of descriptive statistics observed that, for the charges
and smoker features, the median of one group lies outside the box of the other
group, indicating a difference between the two groups. The study concludes
that Gradient Boosting performs best among the three regression models, that
K-fold cross-validation shows all three models to be good, and that EDA using
box plots indicates the highest charges are attributable to the smoker
feature.
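The evaluation setup described in the abstract can be sketched in scikit-learn. This is a minimal sketch on synthetic stand-in data, not the study's actual medical-insurance dataset; the feature construction and random values below are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                          # stand-in features
y = 3 * X[:, 0] + X[:, 1] ** 2 + rng.normal(size=300)  # stand-in charges

models = {
    "Linear Regression": LinearRegression(),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "Support Vector Machine": SVR(),
}
results = {}
for name, model in models.items():
    pred = model.fit(X, y).predict(X)
    rmse = mean_squared_error(y, pred) ** 0.5   # Root Mean Square Error
    r2 = r2_score(y, pred)                      # R-squared
    # 10-fold cross-validated R², as in the study's K-fold evaluation
    cv_r2 = cross_val_score(model, X, y, cv=10, scoring="r2").mean()
    results[name] = (rmse, r2, cv_r2)
```

Comparing the in-sample R² with the 10-fold cross-validated mean, as the abstract does, gives a rough check that a model's fit is not an artifact of overfitting.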
Related papers
- DUPRE: Data Utility Prediction for Efficient Data Valuation [49.60564885180563]
Cooperative game theory-based data valuation, such as Data Shapley, requires evaluating the data utility and retraining the ML model for multiple data subsets.
Our framework, DUPRE, takes an alternative yet complementary approach that reduces the cost per subset evaluation by predicting data utilities instead of evaluating them by model retraining.
Specifically, given the evaluated data utilities of some data subsets, DUPRE fits a Gaussian process (GP) regression model to predict the utility of every other data subset.
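The summary above describes DUPRE only at a high level; as a rough illustration of the underlying idea (not DUPRE's actual design), a GP regression over a few evaluated subset utilities might look like the following sketch. The subset encoding (subset size as the only feature) and the utility values are made up.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical: each evaluated data subset is encoded as a feature
# vector (here, just its size); utilities of a few subsets are known.
subset_features = np.array([[10.0], [20.0], [40.0], [80.0]])
evaluated_utility = np.array([0.55, 0.68, 0.80, 0.88])  # made-up values

gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=30.0), normalize_y=True
).fit(subset_features, evaluated_utility)

# Predict the utility of an unevaluated subset instead of
# retraining the model on it.
mean, std = gp.predict(np.array([[60.0]]), return_std=True)
```

The GP's predictive standard deviation also signals which subsets are worth evaluating by actual retraining, which is the kind of cost/accuracy trade-off the paper targets.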
arXiv Detail & Related papers (2025-02-22T08:53:39Z)
- Demographic Predictability in 3D CT Foundation Embeddings [0.0]
Self-supervised foundation models have been successfully extended to encode 3D computed tomography (CT) images.
We evaluate whether these embeddings capture demographic information, such as age, sex, or race.
arXiv Detail & Related papers (2024-11-28T04:26:39Z)
- Distribution Learning for Molecular Regression [10.96062816455682]
Distributional Mixture of Experts (DMoE) is a model-independent, and data-independent method for regression.
We evaluate the performance of DMoE on different molecular property prediction datasets.
arXiv Detail & Related papers (2024-07-30T00:21:51Z)
- ZeroShape: Regression-based Zero-shot Shape Reconstruction [56.652766763775226]
We study the problem of single-image zero-shot 3D shape reconstruction.
Recent works learn zero-shot shape reconstruction through generative modeling of 3D assets.
We show that ZeroShape achieves superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2023-12-21T01:56:34Z)
- Machine Learning For An Explainable Cost Prediction of Medical Insurance [0.0]
Three regression-based ensemble ML models were deployed to predict medical insurance costs.
The models were evaluated using four performance evaluation metrics, including R-squared, Mean Absolute Error, Root Mean Squared Error, and Mean Absolute Percentage Error.
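The four metrics named above can be computed directly with scikit-learn. The true and predicted insurance charges below are made-up illustrative numbers, not results from either paper.

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_absolute_percentage_error, r2_score)

# Illustrative predictions vs. true insurance charges (made-up values).
y_true = np.array([1200.0, 8400.0, 30100.0, 4500.0])
y_pred = np.array([1500.0, 8000.0, 28000.0, 5000.0])

r2 = r2_score(y_true, y_pred)                          # R-squared
mae = mean_absolute_error(y_true, y_pred)              # MAE
rmse = mean_squared_error(y_true, y_pred) ** 0.5       # RMSE
mape = mean_absolute_percentage_error(y_true, y_pred)  # MAPE
```

RMSE penalizes large errors more heavily than MAE, while MAPE expresses error relative to the true charge, which is why studies often report several of these metrics together.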
arXiv Detail & Related papers (2023-11-23T18:13:34Z)
- The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z)
- Reveal to Revise: An Explainable AI Life Cycle for Iterative Bias Correction of Deep Models [11.879170124003252]
State-of-the-art machine learning models often learn spurious correlations embedded in the training data.
This poses risks when deploying these models for high-stake decision-making.
We propose Reveal to Revise (R2R) to identify, mitigate, and (re-)evaluate spurious model behavior.
arXiv Detail & Related papers (2023-03-22T15:23:09Z)
- Rethinking Missing Data: Aleatoric Uncertainty-Aware Recommendation [59.500347564280204]
We propose a new Aleatoric Uncertainty-aware Recommendation (AUR) framework.
AUR consists of a new uncertainty estimator along with a normal recommender model.
As the chance of mislabeling reflects the potential of a pair, AUR makes recommendations according to the uncertainty.
arXiv Detail & Related papers (2022-09-22T04:32:51Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To take the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- Human Pose Regression with Residual Log-likelihood Estimation [48.30425850653223]
We propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the underlying output distribution.
RLE learns the change of the distribution instead of the unreferenced underlying distribution to facilitate the training process.
Compared to the conventional regression paradigm, regression with RLE brings a 12.4 mAP improvement on MSCOCO without any test-time overhead.
arXiv Detail & Related papers (2021-07-23T15:06:31Z)
- Semiparametric count data regression for self-reported mental health [0.3553493344868413]
We design a semiparametric estimation and inference framework for count data regression.
The data-generating process is defined by simultaneously transforming and rounding (STAR) a latent Gaussian regression model.
STAR is deployed to study the factors associated with self-reported mental health and demonstrates substantial improvements in goodness-of-fit compared to existing count data regression models.
arXiv Detail & Related papers (2021-06-16T20:38:13Z)
- Enhanced Doubly Robust Learning for Debiasing Post-click Conversion Rate Estimation [29.27760413892272]
Post-click conversion, as a strong signal indicating the user preference, is salutary for building recommender systems.
Currently, most existing methods utilize counterfactual learning to debias recommender systems.
We propose a novel double learning approach for the MRDR estimator, which can convert the error imputation into the general CVR estimation.
arXiv Detail & Related papers (2021-05-28T06:59:49Z)
- Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing and analyzing regression errors in the NLP model updates.
We formulate the regression-free model updates into a constrained optimization problem.
We empirically analyze how model ensemble reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.