A Bayesian Account of Measures of Interpretability in Human-AI
Interaction
- URL: http://arxiv.org/abs/2011.10920v1
- Date: Sun, 22 Nov 2020 03:28:28 GMT
- Title: A Bayesian Account of Measures of Interpretability in Human-AI
Interaction
- Authors: Sarath Sreedharan, Anagha Kulkarni, Tathagata Chakraborti, David E.
Smith and Subbarao Kambhampati
- Abstract summary: Existing approaches for the design of interpretable agent behavior consider different measures of interpretability in isolation.
We propose a revised model where all these behaviors can be meaningfully modeled together.
We will highlight interesting consequences of this unified model and motivate, through results of a user study, why this revision is necessary.
- Score: 34.99424576619341
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing approaches for the design of interpretable agent behavior consider
different measures of interpretability in isolation. In this paper we posit
that, in the design and deployment of human-aware agents in the real world,
notions of interpretability are just some among many considerations; and the
techniques developed in isolation lack two key properties to be useful when
considered together: they need to be able to deal with 1) their mutually
competing properties; and 2) an open world where the human is not just there to
interpret behavior in one specific form. To this end, we consider three
well-known instances of interpretable behavior studied in existing literature
-- namely, explicability, legibility, and predictability -- and propose a
revised model where all these behaviors can be meaningfully modeled together.
We will highlight interesting consequences of this unified model and motivate,
through results of a user study, why this revision is necessary.
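To make the unified reading concrete, the toy sketch below is a minimal illustration, not the authors' model: the grid-world domain, the goal set, the noisy-rational likelihood, and all names (e.g. action_likelihood, BETA) are assumptions introduced here. An observer maintains a Bayesian belief over candidate goals and updates it after each observed action; explicability, legibility, and predictability are then all read off the same evolving belief, rather than from three separate models.
```python
# A minimal sketch (assumed, not the paper's implementation) of deriving
# explicability, legibility, and predictability from one Bayesian observer model.
import math

GOALS = {"A": (4, 0), "B": (4, 4)}          # goals the observer entertains (illustrative)
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
BETA = 3.0                                   # observer's rationality coefficient (assumed)

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def action_likelihood(state, action, goal):
    """P(action | state, goal): noisy-rational observer model (assumption)."""
    def score(a):
        nxt = (state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1])
        return math.exp(-BETA * manhattan(nxt, goal))
    z = sum(score(a) for a in ACTIONS)
    return score(action) / z

def update_belief(belief, state, action):
    """One Bayesian update of the observer's belief over goals."""
    posterior = {g: belief[g] * action_likelihood(state, action, GOALS[g]) for g in belief}
    z = sum(posterior.values())
    return {g: p / z for g, p in posterior.items()}

# Walk an observed trajectory and read off the three measures.
belief = {"A": 0.5, "B": 0.5}                # observer's prior over goals
state = (0, 0)
trajectory = ["right", "right", "up", "right"]
true_goal = "A"

explicability = 0.0    # average likelihood of each action under the observer's expectations
predictability = []    # probability the observer would have assigned to the next action
for action in trajectory:
    # Predictability: marginalize the next action over the current belief.
    p_next = sum(belief[g] * action_likelihood(state, action, GOALS[g]) for g in belief)
    predictability.append(p_next)
    explicability += p_next / len(trajectory)
    belief = update_belief(belief, state, action)
    state = (state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1])

legibility = belief[true_goal]               # how sharply the belief converged on the true goal

print(f"explicability ~ {explicability:.2f}")
print(f"predictability per step: {[round(p, 2) for p in predictability]}")
print(f"legibility (posterior on true goal): {legibility:.2f}")
```
The design point the sketch tries to convey is the one argued in the abstract: because all three measures are functions of a single evolving belief, their mutual trade-offs can be reasoned about in one model instead of in isolation.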
Related papers
- InterpretCC: Intrinsic User-Centric Interpretability through Global Mixture of Experts [31.738009841932374]
Interpretability for neural networks is a trade-off between three key requirements.
We present InterpretCC, a family of interpretable-by-design neural networks that guarantee human-centric interpretability.
arXiv Detail & Related papers (2024-02-05T11:55:50Z)
- Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach [61.04606493712002]
Susceptibility to misinformation describes the degree of belief in unverifiable claims, which is not directly observable.
Existing susceptibility studies heavily rely on self-reported beliefs.
We propose a computational approach to model users' latent susceptibility levels.
arXiv Detail & Related papers (2023-11-16T07:22:56Z)
- Explaining by Imitating: Understanding Decisions by Interpretable Policy Learning [72.80902932543474]
Understanding human behavior from observed data is critical for transparency and accountability in decision-making.
Consider real-world settings such as healthcare, in which modeling a decision-maker's policy is challenging.
We propose a data-driven representation of decision-making behavior that inheres transparency by design, accommodates partial observability, and operates completely offline.
arXiv Detail & Related papers (2023-10-28T13:06:14Z)
- Inherent Inconsistencies of Feature Importance [6.02357145653815]
Feature importance is a method that assigns scores to the contribution of individual features on prediction outcomes.
This paper presents an axiomatic framework designed to establish coherent relationships among the different contexts of feature importance scores.
arXiv Detail & Related papers (2022-06-16T14:21:51Z)
- Exploring the Trade-off between Plausibility, Change Intensity and Adversarial Power in Counterfactual Explanations using Multi-objective Optimization [73.89239820192894]
We argue that automated counterfactual generation should regard several aspects of the produced adversarial instances.
We present a novel framework for the generation of counterfactual examples.
arXiv Detail & Related papers (2022-05-20T15:02:53Z)
- Cross-model Fairness: Empirical Study of Fairness and Ethics Under Model Multiplicity [10.144058870887061]
We argue that individuals can be harmed when one predictor is chosen ad hoc from a group of equally well performing models.
Our findings suggest that such unfairness can be readily found in real life and it may be difficult to mitigate by technical means alone.
arXiv Detail & Related papers (2022-03-14T14:33:39Z)
- Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards Individualized and Explainable Robotic Support in Everyday Activities [80.37857025201036]
A key challenge for robotic systems is to figure out the behavior of another agent.
Drawing correct inferences is especially challenging when (confounding) factors are not controlled experimentally.
We propose equipping robots with the necessary tools to conduct observational studies on people.
arXiv Detail & Related papers (2022-01-27T22:15:56Z)
- A Unifying Bayesian Formulation of Measures of Interpretability in Human-AI Interaction [25.239891076153025]
We present a unifying Bayesian framework that models a human observer's evolving beliefs about an agent.
We show that the definitions of interpretability measures like explicability, legibility and predictability from the prior literature fall out as special cases of our general framework.
arXiv Detail & Related papers (2021-04-21T20:06:33Z)
- Machine Common Sense [77.34726150561087]
Machine common sense remains a broad, potentially unbounded problem in artificial intelligence (AI).
This article deals with aspects of modeling commonsense reasoning, focusing on the domain of interpersonal interactions.
arXiv Detail & Related papers (2020-06-15T13:59:47Z)