To Err Is AI! Debugging as an Intervention to Facilitate Appropriate Reliance on AI Systems
- URL: http://arxiv.org/abs/2409.14377v1
- Date: Sun, 22 Sep 2024 09:43:27 GMT
- Title: To Err Is AI! Debugging as an Intervention to Facilitate Appropriate Reliance on AI Systems
- Authors: Gaole He, Abri Bharos, Ujwal Gadiraju
- Abstract summary: The vision for optimal human-AI collaboration requires 'appropriate reliance' of humans on AI systems.
In practice, the performance disparity of machine learning models on out-of-distribution data makes dataset-specific performance feedback unreliable.
- Score: 11.690126756498223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Powerful predictive AI systems have demonstrated great potential in augmenting human decision making. Recent empirical work has argued that the vision for optimal human-AI collaboration requires 'appropriate reliance' of humans on AI systems. However, accurately estimating the trustworthiness of AI advice at the instance level is quite challenging, especially in the absence of performance feedback pertaining to the AI system. In practice, the performance disparity of machine learning models on out-of-distribution data makes the dataset-specific performance feedback unreliable in human-AI collaboration. Inspired by existing literature on critical thinking and a critical mindset, we propose the use of debugging an AI system as an intervention to foster appropriate reliance. In this paper, we explore whether a critical evaluation of AI performance within a debugging setting can better calibrate users' assessment of an AI system and lead to more appropriate reliance. Through a quantitative empirical study (N = 234), we found that our proposed debugging intervention does not work as expected in facilitating appropriate reliance. Instead, we observe a decrease in reliance on the AI system after the intervention -- potentially resulting from an early exposure to the AI system's weakness. We explore the dynamics of user confidence and user estimation of AI trustworthiness across groups with different performance levels to help explain how inappropriate reliance patterns occur. Our findings have important implications for designing effective interventions to facilitate appropriate reliance and better human-AI collaboration.
Related papers
- Raising the Stakes: Performance Pressure Improves AI-Assisted Decision Making [57.53469908423318]
We show the effects of performance pressure on AI advice reliance when laypeople complete a common AI-assisted task.
We find that when the stakes are high, people use AI advice more appropriately than when stakes are lower, regardless of the presence of an AI explanation.
arXiv Detail & Related papers (2024-10-21T22:39:52Z)
- Negotiating the Shared Agency between Humans & AI in the Recommender System [1.4249472316161877]
Concerns about user agency have arisen due to the inherent opacity of algorithms (information asymmetry) and the one-way nature of their output (power asymmetry).
We seek to understand how types of agency impact user perception and experience, and bring empirical evidence to refine the guidelines and designs for human-AI interactive systems.
arXiv Detail & Related papers (2024-03-23T19:23:08Z)
- Beyond Recommender: An Exploratory Study of the Effects of Different AI Roles in AI-Assisted Decision Making [48.179458030691286]
We examine three AI roles: Recommender, Analyzer, and Devil's Advocate.
Our results show each role's distinct strengths and limitations in task performance, reliance appropriateness, and user experience.
These insights offer valuable implications for designing AI assistants with adaptive functional roles according to different situations.
arXiv Detail & Related papers (2024-03-04T07:32:28Z)
- Testing autonomous vehicles and AI: perspectives and challenges from cybersecurity, transparency, robustness and fairness [53.91018508439669]
The study explores the complexities of integrating Artificial Intelligence into Autonomous Vehicles (AVs).
It examines the challenges introduced by AI components and the impact on testing procedures.
The paper identifies significant challenges and suggests future directions for research and development of AI in AV technology.
arXiv Detail & Related papers (2024-02-21T08:29:42Z)
- Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how lack of AI fairness can lead to deepening of biases over time.
We discuss how biased models can lead to more negative real-world outcomes for certain groups.
If the issues persist, they could be reinforced by interactions with other risks and have severe implications on society in the form of social unrest.
arXiv Detail & Related papers (2023-04-16T11:22:59Z)
- Knowing About Knowing: An Illusion of Human Competence Can Hinder Appropriate Reliance on AI Systems [13.484359389266864]
This paper addresses whether the Dunning-Kruger Effect (DKE) can hinder appropriate reliance on AI systems.
DKE is a metacognitive bias due to which less-competent individuals overestimate their own skill and performance.
We found that participants who overestimate their performance tend to exhibit under-reliance on AI systems.
arXiv Detail & Related papers (2023-01-25T14:26:10Z)
- Advancing Human-AI Complementarity: The Impact of User Expertise and Algorithmic Tuning on Joint Decision Making [10.890854857970488]
Many factors can impact the success of human-AI teams, including a user's domain expertise, mental models of the AI system, trust in its recommendations, and more.
Our study examined user performance in a non-trivial blood vessel labeling task where participants indicated whether a given blood vessel was flowing or stalled.
Our results show that while recommendations from an AI-Assistant can aid user decision making, factors such as users' baseline performance relative to the AI and complementary tuning of AI error types significantly impact overall team performance.
arXiv Detail & Related papers (2022-08-16T21:39:58Z)
- Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) [58.981120701284816]
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations.
It will allow examining and testing of AI system predictions to establish a basis for trust in the systems' decision making.
arXiv Detail & Related papers (2022-01-26T18:53:09Z)
- Will We Trust What We Don't Understand? Impact of Model Interpretability and Outcome Feedback on Trust in AI [0.0]
We analyze the impact of interpretability and outcome feedback on trust in AI and on human performance in AI-assisted prediction tasks.
We find that interpretability led to no robust improvements in trust, while outcome feedback had a significantly greater and more reliable effect.
arXiv Detail & Related papers (2021-11-16T04:35:34Z)
- Effect of Confidence and Explanation on Accuracy and Trust Calibration in AI-Assisted Decision Making [53.62514158534574]
We study whether features that reveal case-specific model information can calibrate trust and improve the joint performance of the human and AI.
We show that confidence score can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve AI-assisted decision making.
arXiv Detail & Related papers (2020-01-07T15:33:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.