Metrics to guide development of machine learning algorithms for malaria
diagnosis
- URL: http://arxiv.org/abs/2209.06947v2
- Date: Tue, 4 Jul 2023 01:23:29 GMT
- Title: Metrics to guide development of machine learning algorithms for malaria
diagnosis
- Authors: Charles B. Delahunt, Noni Gachuhi, Matthew P. Horning
- Abstract summary: Automated malaria diagnosis is a difficult but high-value target for machine learning (ML).
Current ML efforts largely neglect crucial use case constraints and are thus not clinically useful.
- Score: 1.2891210250935143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated malaria diagnosis is a difficult but high-value target for machine
learning (ML), and effective algorithms could save many thousands of children's
lives. However, current ML efforts largely neglect crucial use case constraints
and are thus not clinically useful. Two factors in particular are crucial to
developing algorithms translatable to clinical field settings: (i) Clear
understanding of the clinical needs that ML solutions must accommodate; and
(ii) task-relevant metrics for guiding and evaluating ML models. Neglect of
these factors has seriously hampered past ML work on malaria, because the
resulting algorithms do not align with clinical needs.
In this paper we address these two issues in the context of automated malaria
diagnosis via microscopy on Giemsa-stained blood films. First, we describe why
domain expertise is crucial to effectively apply ML to malaria, and list
technical documents and other resources that provide this domain knowledge.
Second, we detail performance metrics tailored to the clinical requirements of
malaria diagnosis, to guide development of ML models and evaluate model
performance through the lens of clinical needs (versus a generic ML lens). We
highlight the importance of a patient-level perspective, interpatient
variability, false positive rates, limit of detection, and different types of
error. We also discuss reasons why ROC curves, AUC, and F1, as commonly used in
ML work, are poorly suited to this context. These findings also apply to other
diseases involving parasite loads, including neglected tropical diseases (NTDs)
such as schistosomiasis.
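The patient-level perspective the abstract emphasizes can be illustrated with a minimal sketch (the function names and the simple hit-count decision rule below are illustrative assumptions, not taken from the paper): per-object detection scores on a blood film are first aggregated into a single per-patient call, and sensitivity and false positive rate are then computed over patients rather than over individual objects.

```python
def patient_level_diagnosis(object_scores, threshold, min_hits):
    """Call a patient positive if at least `min_hits` candidate objects
    score above `threshold` (a toy patient-level decision rule)."""
    return sum(s >= threshold for s in object_scores) >= min_hits


def patient_level_rates(patients, threshold, min_hits):
    """Compute patient-level sensitivity and false positive rate.

    `patients` is a list of (object_scores, is_infected) pairs, one
    entry per patient, where `object_scores` are the model's scores
    for candidate parasite objects on that patient's blood film.
    """
    tp = fp = pos = neg = 0
    for scores, infected in patients:
        call = patient_level_diagnosis(scores, threshold, min_hits)
        if infected:
            pos += 1
            tp += int(call)
        else:
            neg += 1
            fp += int(call)
    # Guard against empty classes so the sketch never divides by zero.
    sensitivity = tp / max(pos, 1)
    false_positive_rate = fp / max(neg, 1)
    return sensitivity, false_positive_rate
```

Note how the same object-level model can yield very different patient-level numbers depending on `threshold` and `min_hits`; tuning those against a clinically mandated patient-level false positive rate (rather than an object-level ROC curve) is the kind of use-case-driven evaluation the paper argues for.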
Related papers
- Assessing and Enhancing Large Language Models in Rare Disease Question-answering [64.32570472692187]
We introduce a rare disease question-answering (ReDis-QA) dataset to evaluate the performance of Large Language Models (LLMs) in diagnosing rare diseases.
We collected 1360 high-quality question-answer pairs within the ReDis-QA dataset, covering 205 rare diseases.
We then benchmarked several open-source LLMs, revealing that diagnosing rare diseases remains a significant challenge for these models.
Experimental results demonstrate that ReCOP can effectively improve the accuracy of LLMs on the ReDis-QA dataset by an average of 8%.
arXiv Detail & Related papers (2024-08-15T21:09:09Z) - Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs).
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z) - The Significance of Machine Learning in Clinical Disease Diagnosis: A
Review [0.0]
This research investigates the capacity of machine learning algorithms to improve the transmission of heart rate data in time series healthcare metrics.
The factors under consideration include the algorithm utilized, the types of diseases targeted, the data types employed, the applications, and the evaluation metrics.
arXiv Detail & Related papers (2023-10-25T20:28:22Z) - Mixed-Integer Projections for Automated Data Correction of EMRs Improve
Predictions of Sepsis among Hospitalized Patients [7.639610349097473]
We introduce an innovative projections-based method that seamlessly integrates clinical expertise as domain constraints.
We measure the distance of corrected data from the constraints defining a healthy range of patient data, resulting in a unique predictive metric we term "trust-scores".
We show an AUROC of 0.865 and a precision of 0.922, which surpass conventional ML models without such projections.
arXiv Detail & Related papers (2023-08-21T15:14:49Z) - Artificial Intelligence for Dementia Research Methods Optimization [0.49050354212898845]
We present an overview of machine learning algorithms most frequently used in dementia research.
We discuss issues of replicability and interpretability and how these impact the clinical applicability of dementia research.
We give examples of how state-of-the-art methods, such as transfer learning, multi-task learning, and reinforcement learning, may be applied to overcome these issues.
arXiv Detail & Related papers (2023-03-02T08:50:25Z) - Detecting Shortcut Learning for Fair Medical AI using Shortcut Testing [62.9062883851246]
Machine learning holds great promise for improving healthcare, but it is critical to ensure that its use will not propagate or amplify health disparities.
One potential driver of algorithmic unfairness, shortcut learning, arises when ML models base predictions on improper correlations in the training data.
Using multi-task learning, we propose the first method to assess and mitigate shortcut learning as a part of the fairness assessment of clinical ML systems.
arXiv Detail & Related papers (2022-07-21T09:35:38Z) - Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence
Classification [109.81283748940696]
We introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio.
We show that, for specific embedding methods, some simulation-based approaches are more robust (and accurate) than others against certain adversarial perturbations of the input sequences.
arXiv Detail & Related papers (2022-07-18T19:16:56Z) - Benchmarking Heterogeneous Treatment Effect Models through the Lens of
Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv Detail & Related papers (2022-06-16T17:59:05Z) - Rethinking Machine Learning Model Evaluation in Pathology [3.0575251867964153]
We propose a set of practical guidelines for Machine Learning evaluation in pathology.
The paper includes measures for setting up the evaluation framework, effectively dealing with variability in labels, and a recommended suite of tests.
We hope that the proposed framework will bridge the gap between ML researchers and domain experts, leading to wider adoption of ML techniques in pathology.
arXiv Detail & Related papers (2022-04-11T15:49:12Z) - VBridge: Connecting the Dots Between Features, Explanations, and Data
for Healthcare Models [85.4333256782337]
VBridge is a visual analytics tool that seamlessly incorporates machine learning explanations into clinicians' decision-making workflow.
We identified three key challenges, including clinicians' unfamiliarity with ML features, lack of contextual information, and the need for cohort-level evidence.
We demonstrated the effectiveness of VBridge through two case studies and expert interviews with four clinicians.
arXiv Detail & Related papers (2021-08-04T17:34:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.