Statistical Tests for Replacing Human Decision Makers with Algorithms
- URL: http://arxiv.org/abs/2306.11689v1
- Date: Tue, 20 Jun 2023 17:09:04 GMT
- Title: Statistical Tests for Replacing Human Decision Makers with Algorithms
- Authors: Kai Feng, Han Hong, Ke Tang, Jingyuan Wang
- Abstract summary: The performance of each human decision maker is first benchmarked against machine predictions.
We then replace the decisions made by a subset of decision makers with the recommendation from the proposed artificial intelligence algorithm.
We find that our algorithm on a test dataset results in a higher overall true positive rate and a lower false positive rate than the diagnoses made by doctors only.
- Score: 32.877314377522524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a statistical framework with which artificial
intelligence can improve human decision making. The performance of each human
decision maker is first benchmarked against machine predictions; we then
replace the decisions made by a subset of the decision makers with the
recommendation from the proposed artificial intelligence algorithm. Using a
large nationwide dataset of pregnancy outcomes and doctor diagnoses from
prepregnancy checkups of reproductive age couples, we experimented with both a
heuristic frequentist approach and a Bayesian posterior loss function approach
with an application to abnormal birth detection. We find that our algorithm on
a test dataset results in a higher overall true positive rate and a lower false
positive rate than the diagnoses made by doctors only. We also find that the
diagnoses of doctors from rural areas are more frequently replaceable,
suggesting that artificial intelligence assisted decision making tends to
improve precision more in less developed regions.
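The benchmark-then-replace idea in the abstract can be illustrated with a small sketch. This is not the paper's procedure: the frequentist and Bayesian tests it describes account for sampling uncertainty, while the code below compares only point estimates of the true positive rate (TPR) and false positive rate (FPR) for each decision maker. The record layout, the 0.5 score threshold, the function names, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    pos = sum(y_true)
    neg = len(y_true) - pos
    tpr = tp / pos if pos else 0.0
    fpr = fp / neg if neg else 0.0
    return tpr, fpr


def replaceable_doctors(records, threshold=0.5):
    """Flag doctors whom the thresholded machine score dominates.

    records: iterable of (doctor_id, outcome, doctor_decision, machine_score).
    A doctor is flagged when, on that doctor's own cases, the machine has
    TPR >= and FPR <= the doctor's, with at least one strict inequality.
    This is a naive point-estimate rule (an assumption), not a statistical test.
    """
    by_doc = defaultdict(list)
    for doc, y, d, s in records:
        by_doc[doc].append((y, d, int(s >= threshold)))
    flagged = set()
    for doc, rows in by_doc.items():
        y = [r[0] for r in rows]
        d_tpr, d_fpr = rates(y, [r[1] for r in rows])
        m_tpr, m_fpr = rates(y, [r[2] for r in rows])
        if m_tpr >= d_tpr and m_fpr <= d_fpr and (m_tpr > d_tpr or m_fpr < d_fpr):
            flagged.add(doc)
    return flagged


def combined_decisions(records, flagged, threshold=0.5):
    """Use the machine decision for flagged doctors, the doctor's otherwise."""
    return [int(s >= threshold) if doc in flagged else d
            for doc, _, d, s in records]


# Hypothetical toy data: (doctor_id, outcome, doctor_decision, machine_score)
records = [
    ("A", 1, 1, 0.9), ("A", 0, 0, 0.2), ("A", 1, 1, 0.7), ("A", 0, 0, 0.1),
    ("B", 1, 0, 0.8), ("B", 0, 1, 0.3), ("B", 1, 1, 0.6), ("B", 0, 0, 0.4),
]
y_all = [r[1] for r in records]
doctor_only = [r[2] for r in records]

flagged = replaceable_doctors(records)
combined = combined_decisions(records, flagged)

print("flagged:", flagged)
print("doctors only (TPR, FPR):", rates(y_all, doctor_only))
print("combined     (TPR, FPR):", rates(y_all, combined))
```

In this toy data only doctor "B" is flagged, because doctor "A" already matches the machine on every case. A real application would evaluate the combined decisions on held-out data and would replace the point-estimate rule with a test that controls for estimation error.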
Related papers
- Two new feature selection methods based on learn-heuristic techniques for breast cancer prediction: A comprehensive analysis [6.796017024594715]
We suggest two novel feature selection (FS) methods based upon an imperialist competitive algorithm (ICA) and a bat algorithm (BA).
This study aims to enhance diagnostic models' efficiency and present a comprehensive analysis to help clinical physicians make much more precise and reliable decisions than before.
(2024-07-19T19:07:53Z)
- A Survey of Artificial Intelligence in Gait-Based Neurodegenerative Disease Diagnosis [51.07114445705692]
Neurodegenerative diseases (NDs) traditionally require extensive healthcare resources and human effort for medical diagnosis and monitoring.
As a crucial disease-related motor symptom, human gait can be exploited to characterize different NDs.
The current advances in artificial intelligence (AI) models enable automatic gait analysis for ND identification and classification.
(2024-05-21T06:44:40Z)
- The Limits of Fair Medical Imaging AI In The Wild [43.97266228706059]
We investigate the extent to which medical AI utilizes demographic encodings.
We confirm that medical imaging AI leverages demographic shortcuts in disease classification.
We find that models with less encoding of demographic attributes are often most "globally optimal".
(2023-12-11T18:59:50Z)
- Auditing for Human Expertise [12.967730957018688]
We develop a statistical framework under which we can pose this question as a natural hypothesis test.
We propose a simple procedure which tests whether expert predictions are statistically independent from the outcomes of interest.
A rejection of our test thus suggests that human experts may add value to any algorithm trained on the available data.
(2023-06-02T16:15:24Z)
- An Improved Model Ensembled of Different Hyper-parameter Tuned Machine Learning Algorithms for Fetal Health Prediction [1.332560004325655]
We propose a robust ensemble model called ensemble of tuned Support Vector Machine and ExtraTrees (ETSE) for predicting fetal health.
Our proposed ETSE model outperformed the other models with 100% precision, 100% recall, 100% F1-score, and 99.66% accuracy.
(2023-05-26T16:40:44Z)
- Artificial Intelligence Model for Tumoral Clinical Decision Support Systems [0.0]
Comparative diagnosis in brain tumor evaluation makes it possible to use the available information of a medical center to compare similar cases when a new patient is evaluated.
By leveraging Artificial Intelligence models, the proposed system is capable of retrieving the most similar cases of brain tumors for a given query.
(2023-01-09T22:15:18Z)
- Post-hoc loss-calibration for Bayesian neural networks [25.05373000435213]
We develop methods for correcting approximate posterior predictive distributions encouraging them to prefer high-utility decisions.
In contrast to previous work, our approach is agnostic to the choice of the approximate inference algorithm.
(2021-06-13T13:53:27Z)
- The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation [81.72197368690031]
We present a new benchmarking suite designed specifically for medical sequential decision making.
The Medkit-Learn(ing) Environment is a publicly available Python package providing simple and easy access to high-fidelity synthetic medical data.
(2021-06-08T10:38:09Z)
- Predicting Parkinson's Disease with Multimodal Irregularly Collected Longitudinal Smartphone Data [75.23250968928578]
Parkinson's Disease is a neurological disorder that is prevalent in elderly people.
Traditional ways to diagnose the disease rely on in-person subjective clinical evaluations of the quality of a set of activity tests.
We propose a novel time-series based approach to predicting Parkinson's Disease with raw activity test data collected by smartphones in the wild.
(2020-09-25T01:50:15Z)
- A Case for Humans-in-the-Loop: Decisions in the Presence of Erroneous Algorithmic Scores [85.12096045419686]
We study the adoption of an algorithmic tool used to assist child maltreatment hotline screening decisions.
We first show that humans do alter their behavior when the tool is deployed.
We show that humans are less likely to adhere to the machine's recommendation when the score displayed is an incorrect estimate of risk.
(2020-02-19T07:27:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.