Performance Evaluation of Regression Models in Predicting the Cost of
Medical Insurance
- URL: http://arxiv.org/abs/2304.12605v1
- Date: Tue, 25 Apr 2023 06:33:49 GMT
- Title: Performance Evaluation of Regression Models in Predicting the Cost of
Medical Insurance
- Authors: Jonelle Angelo S. Cenita, Paul Richie F. Asuncion, Jayson M.
Victoriano
- Abstract summary: Three regression models in machine learning, namely Linear Regression, Gradient Boosting, and Support Vector Machine, were used.
Performance was evaluated using the metrics RMSE (Root Mean Square Error), R² (R-squared), and K-fold cross-validation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The study aimed to evaluate the performance of regression models in
predicting the cost of medical insurance. Three regression models in machine
learning, namely Linear Regression, Gradient Boosting, and Support Vector
Machine, were used. Performance was evaluated using the metrics RMSE (Root
Mean Square Error), R² (R-squared), and K-fold cross-validation. The study
also sought to pinpoint the feature most important in predicting the cost of
medical insurance. The study is anchored on the knowledge discovery in
databases (KDD) process, which refers to the overall process of discovering
useful knowledge from data. The performance evaluation results reveal that,
among the three regression models, Gradient Boosting achieved the highest R²
(0.892) and the lowest RMSE (1336.594). Furthermore, the 10-fold
cross-validation weighted means do not differ significantly from the R²
results of the three regression models. In addition, exploratory data analysis
(EDA) using box plots of descriptive statistics observed that, for the charges
and smoker features, the median of one group lies outside the box of the other
group, indicating a difference between the two groups. The study concludes
that Gradient Boosting performs best among the three regression models, that
K-fold cross-validation shows all three models to be good, and that EDA using
box plots indicates the highest charges are attributable to the smoker
feature.
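The evaluation setup described in the abstract can be sketched in scikit-learn. This is a minimal sketch on synthetic stand-in data, not the study's actual medical-insurance dataset; the feature construction and random values below are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                          # stand-in features
y = 3 * X[:, 0] + X[:, 1] ** 2 + rng.normal(size=300)  # stand-in charges

models = {
    "Linear Regression": LinearRegression(),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "Support Vector Machine": SVR(),
}
results = {}
for name, model in models.items():
    pred = model.fit(X, y).predict(X)
    rmse = mean_squared_error(y, pred) ** 0.5   # Root Mean Square Error
    r2 = r2_score(y, pred)                      # R-squared
    # 10-fold cross-validated R², as in the study's K-fold evaluation
    cv_r2 = cross_val_score(model, X, y, cv=10, scoring="r2").mean()
    results[name] = (rmse, r2, cv_r2)
```

Comparing the in-sample R² with the 10-fold cross-validated mean, as the abstract does, gives a rough check that a model's fit is not an artifact of overfitting.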
Related papers
- DUPRE: Data Utility Prediction for Efficient Data Valuation [49.60564885180563]
Cooperative game theory-based data valuation, such as Data Shapley, requires evaluating the data utility and retraining the ML model for multiple data subsets.
Our framework, DUPRE, takes an alternative yet complementary approach that reduces the cost per subset evaluation by predicting data utilities instead of evaluating them by model retraining.
Specifically, given the evaluated data utilities of some data subsets, DUPRE fits a Gaussian process (GP) regression model to predict the utility of every other data subset.
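The summary above describes DUPRE only at a high level; as a rough illustration of the underlying idea (not DUPRE's actual design), a GP regression over a few evaluated subset utilities might look like the following sketch. The subset encoding (subset size as the only feature) and the utility values are made up.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical: each evaluated data subset is encoded as a feature
# vector (here, just its size); utilities of a few subsets are known.
subset_features = np.array([[10.0], [20.0], [40.0], [80.0]])
evaluated_utility = np.array([0.55, 0.68, 0.80, 0.88])  # made-up values

gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=30.0), normalize_y=True
).fit(subset_features, evaluated_utility)

# Predict the utility of an unevaluated subset instead of
# retraining the model on it.
mean, std = gp.predict(np.array([[60.0]]), return_std=True)
```

The GP's predictive standard deviation also signals which subsets are worth evaluating by actual retraining, which is the kind of cost/accuracy trade-off the paper targets.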
arXiv Detail & Related papers (2025-02-22T08:53:39Z)
- Demographic Predictability in 3D CT Foundation Embeddings [0.0]
Self-supervised foundation models have been successfully extended to encode 3D computed tomography (CT) images.
We evaluate whether these embeddings capture demographic information, such as age, sex, or race.
arXiv Detail & Related papers (2024-11-28T04:26:39Z)
- Distribution Learning for Molecular Regression [10.96062816455682]
Distributional Mixture of Experts (DMoE) is a model-independent, and data-independent method for regression.
We evaluate the performance of DMoE on different molecular property prediction datasets.
arXiv Detail & Related papers (2024-07-30T00:21:51Z)
- ZeroShape: Regression-based Zero-shot Shape Reconstruction [56.652766763775226]
We study the problem of single-image zero-shot 3D shape reconstruction.
Recent works learn zero-shot shape reconstruction through generative modeling of 3D assets.
We show that ZeroShape achieves superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2023-12-21T01:56:34Z)
- Machine Learning For An Explainable Cost Prediction of Medical Insurance [0.0]
Three regression-based ensemble ML models were deployed to predict medical insurance costs.
The models were evaluated using four performance evaluation metrics, including R-squared, Mean Absolute Error, Root Mean Squared Error, and Mean Absolute Percentage Error.
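The four metrics named above can be computed directly with scikit-learn. The true and predicted insurance charges below are made-up illustrative numbers, not results from either paper.

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_absolute_percentage_error, r2_score)

# Illustrative predictions vs. true insurance charges (made-up values).
y_true = np.array([1200.0, 8400.0, 30100.0, 4500.0])
y_pred = np.array([1500.0, 8000.0, 28000.0, 5000.0])

r2 = r2_score(y_true, y_pred)                          # R-squared
mae = mean_absolute_error(y_true, y_pred)              # MAE
rmse = mean_squared_error(y_true, y_pred) ** 0.5       # RMSE
mape = mean_absolute_percentage_error(y_true, y_pred)  # MAPE
```

RMSE penalizes large errors more heavily than MAE, while MAPE expresses error relative to the true charge, which is why studies often report several of these metrics together.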
arXiv Detail & Related papers (2023-11-23T18:13:34Z)
- The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z)
- Reveal to Revise: An Explainable AI Life Cycle for Iterative Bias Correction of Deep Models [11.879170124003252]
State-of-the-art machine learning models often learn spurious correlations embedded in the training data.
This poses risks when deploying these models for high-stake decision-making.
We propose Reveal to Revise (R2R) to identify, mitigate, and (re-)evaluate spurious model behavior.
arXiv Detail & Related papers (2023-03-22T15:23:09Z)
- Rethinking Missing Data: Aleatoric Uncertainty-Aware Recommendation [59.500347564280204]
We propose a new Aleatoric Uncertainty-aware Recommendation (AUR) framework.
AUR consists of a new uncertainty estimator along with a normal recommender model.
As the chance of mislabeling reflects the potential of a pair, AUR makes recommendations according to the uncertainty.
arXiv Detail & Related papers (2022-09-22T04:32:51Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To take the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- Human Pose Regression with Residual Log-likelihood Estimation [48.30425850653223]
We propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the underlying output distribution.
RLE learns the change of the distribution instead of the unreferenced underlying distribution to facilitate the training process.
Compared to the conventional regression paradigm, regression with RLE brings a 12.4 mAP improvement on MSCOCO without any test-time overhead.
arXiv Detail & Related papers (2021-07-23T15:06:31Z)
- Semiparametric count data regression for self-reported mental health [0.3553493344868413]
We design a semiparametric estimation and inference framework for count data regression.
The data-generating process is defined by simultaneously transforming and rounding (STAR) a latent Gaussian regression model.
STAR is deployed to study the factors associated with self-reported mental health and demonstrates substantial improvements in goodness-of-fit compared to existing count data regression models.
arXiv Detail & Related papers (2021-06-16T20:38:13Z)
- Enhanced Doubly Robust Learning for Debiasing Post-click Conversion Rate Estimation [29.27760413892272]
Post-click conversion, as a strong signal indicating the user preference, is salutary for building recommender systems.
Currently, most existing methods utilize counterfactual learning to debias recommender systems.
We propose a novel double learning approach for the MRDR estimator, which can convert the error imputation into the general CVR estimation.
arXiv Detail & Related papers (2021-05-28T06:59:49Z)
- Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing and analyzing regression errors in the NLP model updates.
We formulate the regression-free model updates into a constrained optimization problem.
We empirically analyze how model ensemble reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.