Comparison of Machine Learning Classification Algorithms and Application to the Framingham Heart Study
- URL: http://arxiv.org/abs/2402.15005v1
- Date: Thu, 22 Feb 2024 22:49:35 GMT
- Authors: Nabil Kahouadji
- Abstract summary: The use of machine learning algorithms in healthcare can amplify social injustices and health inequities.
This research pertains to some generalizability impediments that occur during the development and the post-deployment of machine learning classification algorithms.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of machine learning algorithms in healthcare can amplify social
injustices and health inequities. While the exacerbation of biases can occur
and compound during the problem selection, data collection, and outcome
definition, this research pertains to some generalizability impediments that
occur during the development and the post-deployment of machine learning
classification algorithms. Using the Framingham coronary heart disease data as
a case study, we show how to effectively select a probability cutoff to convert
a regression model for a dichotomous variable into a classifier. We then
compare the sampling distribution of the predictive performance of eight
machine learning classification algorithms under four training/testing
scenarios to test their generalizability and their potential to perpetuate
biases. We show that both Extreme Gradient Boosting and Support Vector
Machines are flawed when trained on an unbalanced dataset. We introduce the
double discriminant scoring of type I and show that it is the most generalizable,
as it consistently outperforms the other classification algorithms regardless
of the training/testing scenario. Finally, we introduce a methodology to
extract an optimal variable hierarchy for a classification algorithm, and
illustrate it on the overall, male and female Framingham coronary heart disease
data.
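The abstract does not state which criterion the paper uses to choose the probability cutoff that turns a fitted regression model into a classifier. The sketch below assumes one common choice: maximizing Youden's J statistic (sensitivity + specificity - 1) over candidate cutoffs. The function names `confusion_counts` and `youden_cutoff` and the toy probabilities are hypothetical illustrations, not the paper's method or data.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_prob: Sequence[float], cutoff: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) when class 1 is predicted for prob >= cutoff."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_cutoff(y_true: Sequence[int], y_prob: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: try each distinct predicted probability as a cutoff
    and keep the one maximizing Youden's J (ties keep the smallest cutoff).
    Assumption: Youden's J is the selection criterion; the paper's may differ."""
    best_cutoff, best_j = 0.5, float("-inf")
    for cutoff in sorted(set(y_prob)):
        tp, fp, tn, fn = confusion_counts(y_true, y_prob, cutoff)
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy labels and predicted probabilities (e.g. outputs of a fitted logistic regression).
labels = [0, 0, 0, 1, 1, 1]
probs = [0.10, 0.20, 0.30, 0.60, 0.80, 0.90]
print(youden_cutoff(labels, probs))  # (0.6, 1.0)
```

The observed probabilities suffice as candidates, because the confusion matrix changes only when the cutoff crosses one of them. Other criteria, such as a cost-weighted error or equal sensitivity and specificity, plug into the same loop.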
Related papers
- The Impact of Ontology on the Prediction of Cardiovascular Disease Compared to Machine Learning Algorithms [0.0]
This paper compares and reviews the most prominent machine learning algorithms, as well as ontology-based machine learning classification.
The findings are assessed using performance measures generated from the confusion matrix, such as F-Measure, Accuracy, Recall, and Precision.
(2024-05-30T18:40:27Z)
- Multi-task Explainable Skin Lesion Classification [54.76511683427566]
We propose a few-shot-based approach for skin lesions that generalizes well with little labelled data.
The proposed approach comprises a fusion of a segmentation network that acts as an attention module and a classification network.
(2023-10-11T05:49:47Z)
- Anomaly Detection using Ensemble Classification and Evidence Theory [62.997667081978825]
We present a novel approach for novelty detection using ensemble classification and evidence theory.
A pool selection strategy is presented to build a solid ensemble classifier.
We use uncertainty for the anomaly detection approach.
(2022-12-23T00:50:41Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
(2022-11-21T18:47:11Z)
- Machine Learning-Based Classification Algorithms for the Prediction of Coronary Heart Diseases [0.0]
The study created and tested several machine-learning-based classification models.
The results show that logistic regression (LR) produced the highest performance score on the original dataset.
In conclusion, this study suggests that LR on a well-processed and standardized dataset can predict coronary heart disease with greater accuracy than the other algorithms.
(2021-12-02T18:52:56Z)
- Does Your Dermatology Classifier Know What It Doesn't Know? Detecting the Long-Tail of Unseen Conditions [18.351120611713586]
We develop and rigorously evaluate a deep learning based system that can accurately classify skin conditions.
We frame this task as an out-of-distribution (OOD) detection problem.
Our novel approach, hierarchical outlier detection (HOD), assigns multiple abstention classes to each training class and jointly performs a coarse classification of inliers vs. outliers.
(2021-04-08T15:15:22Z)
- Minimax Active Learning [61.729667575374606]
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples.
We develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner.
(2020-12-18T19:03:40Z)
- Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first asymptotically precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
(2020-11-16T05:17:29Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
(2020-06-22T21:12:31Z)
- Analysing Risk of Coronary Heart Disease through Discriminative Neural Networks [18.124078832445967]
In critical applications like diagnostics, class imbalance cannot be overlooked.
We show how this class imbalance can be handled by neural networks using a discriminative model and contrastive loss.
(2020-06-17T06:30:00Z)
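Several of the summaries above evaluate classifiers with metrics derived from the confusion matrix: accuracy, precision, recall, and F-measure. Class imbalance also comes up repeatedly, both in the abstract and in the coronary heart disease entries. Below is a minimal sketch of those metric definitions. It assumes binary labels with 1 as the positive class and uses the balanced F1 form of the F-measure. The function name `classification_metrics` and all counts are hypothetical toy values, not results from any of these papers.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts.

    Assumption: a ratio with a zero denominator is reported as 0.0, which is
    one convention among several, not a standard.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy counts where every metric is informative.
print(classification_metrics(tp=30, fp=10, tn=50, fn=10))

# Toy unbalanced data (15 positives, 85 negatives) scored by a classifier that
# always predicts the negative class: accuracy is 0.85 but recall is 0.0.
print(classification_metrics(tp=0, fp=0, tn=85, fn=15))
```

The second call shows why accuracy alone can hide a classifier that never detects the minority class on unbalanced data. Recall and F1 expose that failure directly.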