Use-Case-Grounded Simulations for Explanation Evaluation
- URL: http://arxiv.org/abs/2206.02256v1
- Date: Sun, 5 Jun 2022 20:12:19 GMT
- Title: Use-Case-Grounded Simulations for Explanation Evaluation
- Authors: Valerie Chen, Nari Johnson, Nicholay Topin, Gregory Plumb, Ameet
Talwalkar
- Abstract summary: We introduce Use-Case-Grounded Simulated Evaluations (SimEvals).
SimEvals involve training algorithmic agents that take as input the information content that would be presented to each participant in a human subject study.
We run a comprehensive evaluation on three real-world use cases to demonstrate that SimEvals can effectively identify which explanation methods will help humans for each use case.
- Score: 23.584251632331046
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A growing body of research runs human subject evaluations to study whether
providing users with explanations of machine learning models can help them with
practical real-world use cases. However, running user studies is challenging
and costly, and consequently each study typically only evaluates a limited
number of different settings, e.g., studies often only evaluate a few
arbitrarily selected explanation methods. To address these challenges and aid
user study design, we introduce Use-Case-Grounded Simulated Evaluations
(SimEvals). SimEvals involve training algorithmic agents that take as input the
information content (such as model explanations) that would be presented to
each participant in a human subject study, to predict answers to the use case
of interest. The algorithmic agent's test set accuracy provides a measure of
the predictiveness of the information content for the downstream use case. We
run a comprehensive evaluation on three real-world use cases (forward
simulation, model debugging, and counterfactual reasoning) to demonstrate that
SimEvals can effectively identify which explanation methods will help humans
for each use case. These results provide evidence that SimEvals can be used to
efficiently screen an important set of user study design decisions, e.g.
selecting which explanations should be presented to the user, before running a
potentially costly user study.
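To make the recipe concrete, below is a minimal, self-contained sketch of a SimEval for the forward-simulation use case. The toy model, the saliency-style explanation, and the MLP agent are illustrative assumptions rather than the authors' released implementation; the key pattern is that the agent is trained on exactly the information content a participant would see, and its held-out accuracy is the screening signal.

```python
# Minimal SimEval sketch (illustrative assumptions throughout: the hidden
# "model", the saliency-style explanation, and the MLP agent are toy choices,
# not the authors' released code).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n, d = 4000, 10
X = rng.normal(size=(n, d))

# Hidden model the agent must forward-simulate: its prediction depends on a
# weighted sum of the inputs interacting with the sign of the first feature.
w = rng.normal(size=d)
model_pred = ((X * w).sum(axis=1) * np.sign(X[:, 0]) > 0).astype(int)

def simeval(information_content, use_case_labels, seed=0):
    """Train an algorithmic agent on the information content a participant
    would see; its test accuracy measures how predictive that content is
    for the downstream use case."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        information_content, use_case_labels, test_size=0.25, random_state=seed)
    agent = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                          random_state=seed).fit(X_tr, y_tr)
    return agent.score(X_te, y_te)

# Candidate study arms: inputs only vs. inputs plus a saliency-style
# explanation (per-feature attributions w_j * x_j).
print("inputs only      :", simeval(X, model_pred))
print("with explanations:", simeval(np.hstack([X, X * w]), model_pred))
```

In an actual study design, each candidate explanation method would define its own information content, and the resulting arm-by-arm accuracies would be compared before committing to a human subject study.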
Related papers
- A Sim2Real Approach for Identifying Task-Relevant Properties in Interpretable Machine Learning [18.965568482077344]
We introduce a generalizable, cost-effective method for identifying task-relevant explanation properties in silico.
We use our approach to identify relevant proxies for three example tasks and validate our simulation with real user studies.
arXiv Detail & Related papers (2024-05-31T18:08:35Z)
- BASES: Large-scale Web Search User Simulation with Large Language Model based Agents [108.97507653131917]
BASES is a novel user simulation framework built on large language model (LLM) based agents.
Our simulation framework can generate unique user profiles at scale, which subsequently leads to diverse search behaviors.
WARRIORS is a new large-scale dataset encompassing web search user behaviors, including both Chinese and English versions.
arXiv Detail & Related papers (2024-02-27T13:44:09Z)
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution about the usefulness of saliency-based explanations and their potential to be misunderstood.
arXiv Detail & Related papers (2023-12-10T23:13:23Z)
- UMSE: Unified Multi-scenario Summarization Evaluation [52.60867881867428]
Summarization quality evaluation is a non-trivial task in text summarization.
We propose the Unified Multi-scenario Summarization Evaluation Model (UMSE).
UMSE is the first unified summarization evaluation framework that can be applied across three evaluation scenarios.
arXiv Detail & Related papers (2023-05-26T12:54:44Z)
- Designing Optimal Behavioral Experiments Using Machine Learning [8.759299724881219]
We provide a tutorial on leveraging recent advances in Bayesian optimal experimental design (BOED) and machine learning to find optimal experiments for any kind of model.
We consider theories of how people balance exploration and exploitation in multi-armed bandit decision-making tasks.
Compared to experimental designs commonly used in the literature, we show that our optimal designs more efficiently determine which of a set of models best accounts for individual human behavior.
arXiv Detail & Related papers (2023-05-12T18:24:30Z)
- A Case Study on Designing Evaluations of ML Explanations with Simulated User Studies [6.2511886555343805]
We conduct the first SimEvals on a real-world use case to evaluate whether explanations can better support ML-assisted decision-making in e-commerce fraud detection.
Our SimEvals suggest that all considered explainers are equally performant and that none beats a baseline without explanations.
arXiv Detail & Related papers (2023-02-15T03:27:55Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Challenging common interpretability assumptions in feature attribution explanations [0.0]
We empirically evaluate the veracity of three common interpretability assumptions through a large-scale human-subjects experiment.
We find that feature attribution explanations provide only marginal utility for a human decision maker in our task.
arXiv Detail & Related papers (2020-12-04T17:57:26Z)
- A User's Guide to Calibrating Robotics Simulators [54.85241102329546]
This paper proposes a set of benchmarks and a framework for studying algorithms that transfer models and policies learned in simulation to the real world.
We conduct experiments on a wide range of well-known simulated environments to characterize and offer insights into the performance of different algorithms.
Our analysis can be useful for practitioners working in this area and can help them make informed choices about the behavior and main properties of sim-to-real algorithms.
arXiv Detail & Related papers (2020-11-17T22:24:26Z)
- Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? [97.77183117452235]
We carry out human subject tests to isolate the effect of algorithmic explanations on model interpretability.
Clear evidence of method effectiveness is found in very few cases.
Our results provide the first reliable and comprehensive estimates of how explanations influence simulatability.
arXiv Detail & Related papers (2020-05-04T20:35:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.