Objective Metrics for Human-Subjects Evaluation in Explainable Reinforcement Learning
- URL: http://arxiv.org/abs/2501.19256v1
- Date: Fri, 31 Jan 2025 16:12:23 GMT
- Title: Objective Metrics for Human-Subjects Evaluation in Explainable Reinforcement Learning
- Authors: Balint Gyevnar, Mark Towers,
- Abstract summary: Explanation is a fundamentally human process. Understanding the goal and audience of the explanation is vital.
Existing work on explainable reinforcement learning (XRL) routinely does not consult humans in their evaluations.
This paper calls on researchers to use objective human metrics for explanation evaluations based on observable and actionable behaviour.
- Score: 0.47355466227925036
- Abstract: Explanation is a fundamentally human process. Understanding the goal and audience of the explanation is vital, yet existing work on explainable reinforcement learning (XRL) routinely does not consult humans in their evaluations. Even when they do, they routinely resort to subjective metrics, such as confidence or understanding, that can only inform researchers of users' opinions, not their practical effectiveness for a given problem. This paper calls on researchers to use objective human metrics for explanation evaluations based on observable and actionable behaviour to build more reproducible, comparable, and epistemically grounded research. To this end, we curate, describe, and compare several objective evaluation methodologies for applying explanations to debugging agent behaviour and supporting human-agent teaming, illustrating our proposed methods using a novel grid-based environment. We discuss how subjective and objective metrics complement each other to provide holistic validation and how future work needs to utilise standardised benchmarks for testing to enable greater comparisons between research.
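The abstract contrasts subjective metrics (opinions such as confidence) with objective metrics grounded in observable behaviour. A rough illustration of that distinction is below. It is not the authors' methodology. The sketch scores a hypothetical user study two ways: by how often participants correctly predict the agent's next action (objective), and by their self-reported confidence (subjective). The `Trial` record, the condition labels `"explanation"` and `"none"`, and the toy numbers are all invented for illustration. They are not data from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    """One hypothetical study trial (field names are illustrative assumptions)."""
    participant: str
    condition: str          # hypothetical labels: "explanation" or "none"
    predicted_action: int   # action the participant predicted the agent would take
    agent_action: int       # action the agent actually took (observable behaviour)
    confidence: float       # self-reported confidence in [0, 1] (a subjective metric)


def objective_accuracy(trials, condition):
    """Fraction of trials in `condition` where the prediction matched the agent's action."""
    hits = [t.predicted_action == t.agent_action for t in trials if t.condition == condition]
    return sum(hits) / len(hits) if hits else float("nan")


def mean_confidence(trials, condition):
    """Mean self-reported confidence in `condition` (an opinion, not a behaviour)."""
    values = [t.confidence for t in trials if t.condition == condition]
    return mean(values) if values else float("nan")


# Invented toy data, chosen only to exercise the functions.
trials = [
    Trial("p1", "explanation", 2, 2, 0.6),
    Trial("p1", "explanation", 1, 3, 0.7),
    Trial("p2", "explanation", 0, 0, 0.5),
    Trial("p2", "explanation", 3, 3, 0.6),
    Trial("p3", "none", 1, 2, 0.9),
    Trial("p3", "none", 0, 0, 0.8),
    Trial("p4", "none", 2, 1, 0.9),
    Trial("p4", "none", 3, 0, 0.8),
]

for cond in ("explanation", "none"):
    print(cond, objective_accuracy(trials, cond), mean_confidence(trials, cond))
```

In this made-up data the two measures disagree: the "none" group reports higher confidence but predicts the agent less accurately. This is the kind of gap the abstract argues subjective metrics alone can hide.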
Related papers
- Learning to Assist Humans without Inferring Rewards [65.28156318196397]
We build upon prior work that studies assistance through the lens of empowerment.
An assistive agent aims to maximize the influence of the human's actions.
We prove that these representations estimate a similar notion of empowerment to that studied by prior work.
Date: 2024-11-04T21:31:04Z
- On Evaluating Explanation Utility for Human-AI Decision Making in NLP [39.58317527488534]
We review existing metrics suitable for application-grounded evaluation.
We demonstrate the importance of reassessing the state of the art to form and study human-AI teams.
Date: 2024-07-03T23:53:27Z
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution regarding the usefulness and potential for misunderstanding in saliency-based explanations.
Date: 2023-12-10T23:13:23Z
- Towards Objective Evaluation of Socially-Situated Conversational Robots: Assessing Human-Likeness through Multimodal User Behaviors [26.003947740875482]
This paper focuses on assessing the human-likeness of the robot as the primary evaluation metric.
Our approach aims to evaluate the robot's human-likeness indirectly, based on observable user behaviors, thus enhancing objectivity.
Date: 2023-08-21T20:21:07Z
- Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems [82.92678837778358]
Preference-based methods have demonstrated substantial success in empirical applications such as InstructGPT.
We show how human bias and uncertainty in feedback modeling can affect the theoretical guarantees of these approaches.
Date: 2023-07-24T17:50:24Z
- Learning and Evaluating Human Preferences for Conversational Head Generation [101.89332968344102]
We propose a novel learning-based evaluation metric named Preference Score (PS) for fitting human preference according to the quantitative evaluations across different dimensions.
PS can serve as a quantitative evaluation without the need for human annotation.
Date: 2023-07-20T07:04:16Z
- ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning [63.77667876176978]
Large language models show improved downstream task interpretability when prompted to generate step-by-step reasoning to justify their final answers.
These reasoning steps greatly improve model interpretability and verification, but objectively studying their correctness is difficult.
We present ROSCOE, a suite of interpretable, unsupervised automatic scores that improve and extend previous text generation evaluation metrics.
Date: 2022-12-15T15:52:39Z
- Counterfactually Evaluating Explanations in Recommender Systems [14.938252589829673]
We propose an offline evaluation method that can be computed without human involvement.
We show that, compared to conventional methods, our method can produce evaluation scores more correlated with the real human judgments.
Date: 2022-03-02T18:55:29Z
- HIVE: Evaluating the Human Interpretability of Visual Explanations [20.060507122989645]
We propose a novel human evaluation framework HIVE (Human Interpretability of Visual Explanations) for diverse interpretability methods in computer vision.
Our results suggest that explanations (regardless of whether they are actually correct) engender human trust, yet are not distinct enough for users to distinguish between correct and incorrect predictions.
Date: 2021-12-06T17:30:47Z
- On the Interaction of Belief Bias and Explanations [4.211128681972148]
We provide an overview of belief bias, its role in human evaluation, and ideas for NLP practitioners on how to account for it.
We show that conclusions about the highest performing methods change when introducing such controls, pointing to the importance of accounting for belief bias in evaluation.
Date: 2021-06-29T12:49:42Z
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation.
Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
Date: 2021-02-20T03:29:20Z
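Several entries above (ENIGMA, the counterfactual recommender evaluation, Preference Score) justify an automatic metric by its correlation with human judgments. The summaries do not say which coefficient each paper used. One common choice is Spearman rank correlation, sketched below in pure Python under that assumption. It computes the Pearson correlation of average ranks, which handles tied scores. The function names and the example scores are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j -> mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-system scores: an automatic metric vs. mean human ratings.
automatic_scores = [0.1, 0.4, 0.35, 0.8]
human_scores = [1, 3, 2, 5]
print(spearman(automatic_scores, human_scores))
```

Rank correlation only checks whether the metric orders systems the same way humans do, not whether the scales match. That is usually what matters when the metric is used to compare methods.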
This list is automatically generated from the titles and abstracts of the papers on this site.