Does AI help humans make better decisions? A statistical evaluation framework for experimental and observational studies
- URL: http://arxiv.org/abs/2403.12108v3
- Date: Fri, 11 Oct 2024 23:05:37 GMT
- Title: Does AI help humans make better decisions? A statistical evaluation framework for experimental and observational studies
- Authors: Eli Ben-Michael, D. James Greiner, Melody Huang, Kosuke Imai, Zhichao Jiang, Sooahn Shin,
- Abstract summary: We show how to compare the performance of three alternative decision-making systems--human-alone, human-with-AI, and AI-alone.
We find that the risk assessment recommendations do not improve the classification accuracy of a judge's decision to impose cash bail.
- Score: 0.43981305860983716
- License:
- Abstract: The use of Artificial Intelligence (AI), or more generally data-driven algorithms, has become ubiquitous in today's society. Yet, in many cases and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions compared to a human-alone or AI-alone system. We introduce a new methodological framework to empirically answer this question with a minimal set of assumptions. We measure a decision maker's ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded and unconfounded treatment assignment, where the provision of AI-generated recommendations is assumed to be randomized across cases with humans making final decisions. Under this study design, we show how to compare the performance of three alternative decision-making systems--human-alone, human-with-AI, and AI-alone. Importantly, the AI-alone system includes any individualized treatment assignment, including those that are not used in the original study. We also show when AI recommendations should be provided to a human-decision maker, and when one should follow such recommendations. We apply the proposed methodology to our own randomized controlled trial evaluating a pretrial risk assessment instrument. We find that the risk assessment recommendations do not improve the classification accuracy of a judge's decision to impose cash bail. Furthermore, we find that replacing a human judge with algorithms--the risk assessment score and a large language model in particular--leads to a worse classification performance.
Related papers
- Combining AI Control Systems and Human Decision Support via Robustness and Criticality [53.10194953873209]
We extend a methodology for adversarial explanations (AE) to state-of-the-art reinforcement learning frameworks.
We show that the learned AI control system demonstrates robustness against adversarial tampering.
In a training / learning framework, this technology can improve both the AI's decisions and explanations through human interaction.
arXiv Detail & Related papers (2024-07-03T15:38:57Z) - Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z) - Learning to Make Adherence-Aware Advice [8.419688203654948]
This paper presents a sequential decision-making model that takes into account the human's adherence level.
We provide learning algorithms that learn the optimal advice policy and make advice only at critical time stamps.
arXiv Detail & Related papers (2023-10-01T23:15:55Z) - Using AI Uncertainty Quantification to Improve Human Decision-Making [14.878886078377562]
AI Uncertainty Quantification (UQ) has the potential to improve human decision-making beyond AI predictions alone.
We evaluated the impact on human decision-making for instance-level UQ, using a strict scoring rule, in two online behavioral experiments.
arXiv Detail & Related papers (2023-09-19T18:01:25Z) - Human-Centric Multimodal Machine Learning: Recent Advances and Testbed
on AI-based Recruitment [66.91538273487379]
There is a certain consensus about the need to develop AI applications with a Human-Centric approach.
Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes.
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z) - Algorithmic Assistance with Recommendation-Dependent Preferences [2.864550757598007]
We consider the effect and design of algorithmic recommendations when they affect choices.
We show that recommendation-dependent preferences create inefficiencies where the decision-maker is overly responsive to the recommendation.
arXiv Detail & Related papers (2022-08-16T09:24:47Z) - Randomized Classifiers vs Human Decision-Makers: Trustworthy AI May Have
to Act Randomly and Society Seems to Accept This [0.8889304968879161]
We feel that akin to human decisions, judgments of artificial agents should necessarily be grounded in some moral principles.
Yet a decision-maker can only make truly ethical (based on any ethical theory) and fair (according to any notion of fairness) decisions if full information on all the relevant factors on which the decision is based are available at the time of decision-making.
arXiv Detail & Related papers (2021-11-15T05:39:02Z) - Indecision Modeling [50.00689136829134]
It is important that AI systems act in ways which align with human values.
People are often indecisive, and especially so when their decision has moral implications.
arXiv Detail & Related papers (2020-12-15T18:32:37Z) - A Case for Humans-in-the-Loop: Decisions in the Presence of Erroneous
Algorithmic Scores [85.12096045419686]
We study the adoption of an algorithmic tool used to assist child maltreatment hotline screening decisions.
We first show that humans do alter their behavior when the tool is deployed.
We show that humans are less likely to adhere to the machine's recommendation when the score displayed is an incorrect estimate of risk.
arXiv Detail & Related papers (2020-02-19T07:27:32Z) - Effect of Confidence and Explanation on Accuracy and Trust Calibration
in AI-Assisted Decision Making [53.62514158534574]
We study whether features that reveal case-specific model information can calibrate trust and improve the joint performance of the human and AI.
We show that confidence score can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve AI-assisted decision making.
arXiv Detail & Related papers (2020-01-07T15:33:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.