How to Measure Human-AI Prediction Accuracy in Explainable AI Systems
- URL: http://arxiv.org/abs/2409.00069v1
- Date: Fri, 23 Aug 2024 19:52:37 GMT
- Title: How to Measure Human-AI Prediction Accuracy in Explainable AI Systems
- Authors: Sujay Koujalgi, Andrew Anderson, Iyadunni Adenuga, Shikha Soneji, Rupika Dikkala, Teresita Guzman Nader, Leo Soccio, Sourav Panda, Rupak Kumar Das, Margaret Burnett, Jonathan Dodge
- Abstract summary: In empirical studies with humans, an obvious approach is to frame the task as binary (i.e., the prediction is either right or wrong).
The crux of the problem is that the binary framing fails to capture the nuances of the different degrees of "wrongness."
We propose three mathematical bases upon which to measure "partial wrongness."
- Score: 1.9401464646154982
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Assessing an AI system's behavior, particularly in Explainable AI Systems, is sometimes done empirically by measuring people's abilities to predict the agent's next move. But how should such measurements be performed? In empirical studies with humans, an obvious approach is to frame the task as binary (i.e., the prediction is either right or wrong), but this does not scale. As output spaces increase, so do floor effects, because the ratio of right answers to wrong answers quickly becomes very small. The crux of the problem is that the binary framing fails to capture the nuances of the different degrees of "wrongness." To address this, we begin by proposing three mathematical bases upon which to measure "partial wrongness." We then use these bases to perform two analyses on sequential decision-making domains: the first is an in-lab study with 86 participants on a size-36 action space; the second is a re-analysis of a prior study on a size-4 action space. Other researchers adopting our operationalization of the prediction task and analysis methodology will improve the rigor of user studies conducted with that task, which is particularly important when the domain features a large output space.
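The abstract does not spell out the paper's three mathematical bases, so the snippet below is only a minimal sketch of the general idea: it contrasts binary right/wrong scoring with one hypothetical distance-based "partial wrongness" score over an integer-indexed action space. Nothing here should be read as the authors' actual operationalization.
```python
# Hypothetical illustration only: a graded ("partial wrongness") score for a
# prediction task, contrasted with the binary right/wrong framing. The
# distance-based score is an assumption used purely to show why grading matters.

def binary_score(predicted: int, actual: int) -> float:
    """Binary framing: 1.0 only if the prediction is exactly right."""
    return 1.0 if predicted == actual else 0.0

def graded_score(predicted: int, actual: int, n_actions: int) -> float:
    """One possible graded alternative: credit decays with the distance between
    the predicted and actual action indices, normalized by the largest possible
    distance in a size-n_actions action space."""
    max_dist = n_actions - 1
    return 1.0 - abs(predicted - actual) / max_dist

if __name__ == "__main__":
    n_actions = 36   # size of the action space in the paper's in-lab study
    actual = 20      # the agent's actual next move (illustrative value)
    for predicted in (20, 21, 35):
        print(predicted,
              binary_score(predicted, actual),
              round(graded_score(predicted, actual, n_actions), 3))
    # Binary scoring treats "one step off" (21) and "fifteen steps off" (35)
    # identically (both 0.0), which drives the floor effect the abstract
    # describes; the graded score distinguishes them (0.971 vs 0.571).
```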
Related papers
- AI-Assisted Decision Making with Human Learning [8.598431584462944]
In many cases, despite the algorithm's superior performance, the final decision remains in human hands.
This paper studies such AI-assisted decision-making settings, where the human learns through repeated interactions with the algorithm.
We observe that the discrepancy between the algorithm's model and the human's model creates a fundamental tradeoff.
arXiv Detail & Related papers (2025-02-18T17:08:21Z)
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs [76.43407125275202]
o1-like models can emulate human-like long-time thinking during inference.
This paper presents the first comprehensive study on the prevalent issue of overthinking in these models.
We propose strategies to mitigate overthinking, streamlining reasoning processes without compromising accuracy.
arXiv Detail & Related papers (2024-12-30T18:55:12Z)
- Addressing and Visualizing Misalignments in Human Task-Solving Trajectories [5.166083532861163]
This study categorizes such misalignments into three types: (1) lack of functions to express intent, (2) inefficient action sequences, and (3) incorrect intentions that cannot solve the task.
We experimentally demonstrate that AI models trained on human task-solving trajectories improve performance in mimicking human reasoning.
arXiv Detail & Related papers (2024-09-21T16:38:22Z)
- The Relative Value of Prediction in Algorithmic Decision Making [0.0]
We ask: What is the relative value of prediction in algorithmic decision making?
We identify simple, sharp conditions determining the relative value of prediction vis-a-vis expanding access.
We illustrate how these theoretical insights may be used to guide the design of algorithmic decision making systems in practice.
arXiv Detail & Related papers (2023-12-13T20:52:45Z)
- What Should I Know? Using Meta-gradient Descent for Predictive Feature Discovery in a Single Stream of Experience [63.75363908696257]
Computational reinforcement learning seeks to construct an agent's perception of the world through predictions of future sensations.
An open challenge in this line of work is determining, from the infinitely many predictions that the agent could possibly make, which predictions might best support decision-making.
We introduce a meta-gradient descent process by which an agent learns 1) what predictions to make, 2) the estimates for its chosen predictions, and 3) how to use those estimates to generate policies that maximize future reward.
arXiv Detail & Related papers (2022-06-13T21:31:06Z)
- Human-Algorithm Collaboration: Achieving Complementarity and Avoiding Unfairness [92.26039686430204]
We show that even in carefully-designed systems, complementary performance can be elusive.
First, we provide a theoretical framework for modeling simple human-algorithm systems.
Next, we use this model to prove conditions where complementarity is impossible.
arXiv Detail & Related papers (2022-02-17T18:44:41Z)
- Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards Individualized and Explainable Robotic Support in Everyday Activities [80.37857025201036]
A key challenge for robotic systems is to figure out the behavior of another agent.
Drawing correct inferences is especially challenging when (confounding) factors are not controlled experimentally.
We propose equipping robots with the necessary tools to conduct observational studies on people.
arXiv Detail & Related papers (2022-01-27T22:15:56Z)
- Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction [63.62263239934777]
We conduct an in-depth study on various pose representations with a focus on their effects on the motion prediction task.
We propose a novel RNN architecture termed AHMR (Attentive Hierarchical Motion Recurrent network) for motion prediction.
Our approach outperforms state-of-the-art methods in short-term prediction and achieves substantially better long-term prediction.
arXiv Detail & Related papers (2021-12-30T10:45:22Z)
- Role of Human-AI Interaction in Selective Prediction [20.11364033416315]
We study the impact of communicating different types of information to humans about the AI system's decision to defer.
We show that it is possible to significantly boost human performance by informing the human of the decision to defer, but not revealing the prediction of the AI.
arXiv Detail & Related papers (2021-12-13T16:03:13Z)
- Development of Human Motion Prediction Strategy using Inception Residual Block [1.0705399532413613]
We propose an Inception Residual Block (IRB) to detect temporal features in human poses.
Our main contribution is a residual connection between the input and the output of the inception block, which maintains continuity between the previously observed pose and the next predicted pose.
With this proposed architecture, the model learns prior knowledge about human poses much better, and we achieve much higher prediction accuracy, as detailed in the paper.
arXiv Detail & Related papers (2021-08-09T12:49:48Z)
- Challenging common interpretability assumptions in feature attribution explanations [0.0]
We empirically evaluate the veracity of three common interpretability assumptions through a large-scale human-subjects experiment.
We find that feature attribution explanations provide marginal utility in our task for a human decision maker.
arXiv Detail & Related papers (2020-12-04T17:57:26Z)
- Fast Uncertainty Quantification for Deep Object Pose Estimation [91.09217713805337]
Deep learning-based object pose estimators are often unreliable and overconfident.
In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation.
arXiv Detail & Related papers (2020-11-16T06:51:55Z)
- Beyond accuracy: quantifying trial-by-trial behaviour of CNNs and humans by measuring error consistency [10.028543085687803]
A central problem in cognitive science and behavioural neuroscience is to ascertain whether two or more decision makers (be they brains or algorithms) use the same strategy.
We introduce trial-by-trial error consistency, a quantitative analysis for measuring whether two decision making systems systematically make errors on the same inputs.
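Because this related paper hinges on quantifying whether two decision makers err on the same trials, a minimal sketch of a kappa-style error-consistency measure may help make the idea concrete: the observed overlap of correct/incorrect responses is corrected for the overlap expected from the two accuracies alone. The function and synthetic data below are illustrative assumptions, not code from that paper.
```python
import numpy as np

def error_consistency(correct_a, correct_b) -> float:
    """Kappa-style error consistency between two decision makers.

    correct_a, correct_b: boolean arrays with one entry per trial,
    True where that decision maker answered correctly.
    """
    correct_a = np.asarray(correct_a, dtype=bool)
    correct_b = np.asarray(correct_b, dtype=bool)
    # Observed overlap: fraction of trials where both are right or both are wrong.
    c_obs = np.mean(correct_a == correct_b)
    # Overlap expected by chance from the two accuracies alone.
    p_a, p_b = correct_a.mean(), correct_b.mean()
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)
    # 0 means no more overlap than chance; 1 means identical error patterns.
    return float((c_obs - c_exp) / (1 - c_exp))

# Two synthetic observers with ~80% accuracy but independent errors:
rng = np.random.default_rng(0)
a = rng.random(1000) < 0.8
b = rng.random(1000) < 0.8
print(round(error_consistency(a, b), 3))   # near 0: errors are not systematic
print(round(error_consistency(a, a), 3))   # 1.0: identical error pattern
```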
arXiv Detail & Related papers (2020-06-30T12:47:17Z)
- On Adversarial Examples and Stealth Attacks in Artificial Intelligence Systems [62.997667081978825]
We present a formal framework for assessing and analyzing two classes of malevolent action towards generic Artificial Intelligence (AI) systems.
The first class involves adversarial examples and concerns the introduction of small perturbations of the input data that cause misclassification.
The second class, introduced here for the first time and named stealth attacks, involves small perturbations to the AI system itself.
arXiv Detail & Related papers (2020-04-09T10:56:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.