Can Domain Experts Rely on AI Appropriately? A Case Study on AI-Assisted Prostate Cancer MRI Diagnosis
- URL: http://arxiv.org/abs/2502.03482v1
- Date: Mon, 03 Feb 2025 18:59:38 GMT
- Title: Can Domain Experts Rely on AI Appropriately? A Case Study on AI-Assisted Prostate Cancer MRI Diagnosis
- Authors: Chacha Chen, Han Liu, Jiamin Yang, Benjamin M. Mervak, Bora Kalaycioglu, Grace Lee, Emre Cakmakli, Matteo Bonatti, Sridhar Pudu, Osman Kahraman, Gul Gizem Pamuk, Aytekin Oto, Aritrick Chatterjee, Chenhao Tan,
- Abstract summary: We conduct an in-depth collaboration with radiologists in prostate cancer diagnosis based on MRI images.
We develop an interface and conduct two experiments to study how AI assistance and performance feedback shape the decision making of domain experts.
- Score: 19.73932120146401
- License:
- Abstract: Despite the growing interest in human-AI decision making, experimental studies with domain experts remain rare, largely due to the complexity of working with domain experts and the challenges in setting up realistic experiments. In this work, we conduct an in-depth collaboration with radiologists in prostate cancer diagnosis based on MRI images. Building on existing tools for teaching prostate cancer diagnosis, we develop an interface and conduct two experiments to study how AI assistance and performance feedback shape the decision making of domain experts. In Study 1, clinicians were asked to provide an initial diagnosis (human), then view the AI's prediction, and subsequently finalize their decision (human-AI team). In Study 2 (after a memory wash-out period), the same participants first received aggregated performance statistics from Study 1, specifically their own performance, the AI's performance, and their human-AI team performance, and then directly viewed the AI's prediction before making their diagnosis (i.e., no independent initial diagnosis). These two workflows represent realistic ways that clinical AI tools might be used in practice, where the second study simulates a scenario where doctors can adjust their reliance and trust on AI based on prior performance feedback. Our findings show that, while human-AI teams consistently outperform humans alone, they still underperform the AI due to under-reliance, similar to prior studies with crowdworkers. Providing clinicians with performance feedback did not significantly improve the performance of human-AI teams, although showing AI decisions in advance nudges people to follow AI more. Meanwhile, we observe that the ensemble of human-AI teams can outperform AI alone, suggesting promising directions for human-AI collaboration.
Related papers
- Raising the Stakes: Performance Pressure Improves AI-Assisted Decision Making [57.53469908423318]
We show the effects of performance pressure on AI advice reliance when laypeople complete a common AI-assisted task.
We find that when the stakes are high, people use AI advice more appropriately than when stakes are lower, regardless of the presence of an AI explanation.
arXiv Detail & Related papers (2024-10-21T22:39:52Z) - Improving Health Professionals' Onboarding with AI and XAI for Trustworthy Human-AI Collaborative Decision Making [3.2381492754749632]
We present the findings of semi-structured interviews with health professionals and students majoring in medicine and health.
For the interviews, we built upon human-AI interaction guidelines to create materials of an AI system for stroke rehabilitation assessment.
Our findings reveal that beyond presenting traditional performance metrics on AI, participants desired benchmark information.
arXiv Detail & Related papers (2024-05-26T04:30:17Z) - Explainable AI Enhances Glaucoma Referrals, Yet the Human-AI Team Still Falls Short of the AI Alone [6.740852152639975]
We investigate how various AI explanations help providers distinguish between patients needing immediate or non-urgent specialist referrals.
We built explainable AI algorithms to predict glaucoma surgery needs from routine eyecare data as a proxy for identifying high-risk patients.
We incorporated intrinsic and post-hoc explainability and conducted an online study with optometrists to assess human-AI team performance.
arXiv Detail & Related papers (2024-05-24T03:01:20Z) - Understanding the Effect of Counterfactual Explanations on Trust and
Reliance on AI for Human-AI Collaborative Clinical Decision Making [5.381004207943597]
We conducted an experiment with seven therapists and ten laypersons on the task of assessing post-stroke survivors' quality of motion.
We analyzed their performance, agreement level on the task, and reliance on AI without and with two types of AI explanations.
Our work discusses the potential of counterfactual explanations to better estimate the accuracy of an AI model and reduce over-reliance on wrong' AI outputs.
arXiv Detail & Related papers (2023-08-08T16:23:46Z) - BO-Muse: A human expert and AI teaming framework for accelerated
experimental design [58.61002520273518]
Our algorithm lets the human expert take the lead in the experimental process.
We show that our algorithm converges sub-linearly, at a rate faster than the AI or human alone.
arXiv Detail & Related papers (2023-03-03T02:56:05Z) - The Role of AI in Drug Discovery: Challenges, Opportunities, and
Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed.
The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z) - Advancing Human-AI Complementarity: The Impact of User Expertise and
Algorithmic Tuning on Joint Decision Making [10.890854857970488]
Many factors can impact success of Human-AI teams, including a user's domain expertise, mental models of an AI system, trust in recommendations, and more.
Our study examined user performance in a non-trivial blood vessel labeling task where participants indicated whether a given blood vessel was flowing or stalled.
Our results show that while recommendations from an AI-Assistant can aid user decision making, factors such as users' baseline performance relative to the AI and complementary tuning of AI error types significantly impact overall team performance.
arXiv Detail & Related papers (2022-08-16T21:39:58Z) - Who Goes First? Influences of Human-AI Workflow on Decision Making in
Clinical Imaging [24.911186503082465]
This study explores the effects of providing AI assistance at the start of a diagnostic session in radiology versus after the radiologist has made a provisional decision.
We found that participants who are asked to register provisional responses in advance of reviewing AI inferences are less likely to agree with the AI regardless of whether the advice is accurate and, in instances of disagreement with the AI, are less likely to seek the second opinion of a colleague.
arXiv Detail & Related papers (2022-05-19T16:59:25Z) - Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) [58.981120701284816]
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations.
It will allow examining and testing of AI system predictions to establish a basis for trust in the systems' decision making.
arXiv Detail & Related papers (2022-01-26T18:53:09Z) - Is the Most Accurate AI the Best Teammate? Optimizing AI for Teamwork [54.309495231017344]
We argue that AI systems should be trained in a human-centered manner, directly optimized for team performance.
We study this proposal for a specific type of human-AI teaming, where the human overseer chooses to either accept the AI recommendation or solve the task themselves.
Our experiments with linear and non-linear models on real-world, high-stakes datasets show that the most accuracy AI may not lead to highest team performance.
arXiv Detail & Related papers (2020-04-27T19:06:28Z) - Effect of Confidence and Explanation on Accuracy and Trust Calibration
in AI-Assisted Decision Making [53.62514158534574]
We study whether features that reveal case-specific model information can calibrate trust and improve the joint performance of the human and AI.
We show that confidence score can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve AI-assisted decision making.
arXiv Detail & Related papers (2020-01-07T15:33:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.