A Case Study on Designing Evaluations of ML Explanations with Simulated
  User Studies
- URL: http://arxiv.org/abs/2302.07444v2
- Date: Mon, 20 Mar 2023 20:36:48 GMT
- Title: A Case Study on Designing Evaluations of ML Explanations with Simulated
  User Studies
- Authors: Ada Martin, Valerie Chen, Sérgio Jesus, Pedro Saleiro
- Abstract summary: We conduct the first SimEvals on a real-world use case to evaluate whether explanations can better support ML-assisted decision-making in e-commerce fraud detection.
We find that SimEvals suggest that all considered explainers are equally performant, and none beat a baseline without explanations.
- Score: 6.2511886555343805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When conducting user studies to ascertain the usefulness of model
explanations in aiding human decision-making, it is important to use real-world
use cases, data, and users. However, this process can be resource-intensive,
allowing only a limited number of explanation methods to be evaluated.
Simulated user evaluations (SimEvals), which use machine learning models as a
proxy for human users, have been proposed as an intermediate step to select
promising explanation methods. In this work, we conduct the first SimEvals on a
real-world use case to evaluate whether explanations can better support
ML-assisted decision-making in e-commerce fraud detection. We study whether
SimEvals can corroborate findings from a user study conducted in this fraud
detection context. In particular, we find that SimEvals suggest that all
considered explainers are equally performant, and none beat a baseline without
explanations -- this matches the conclusions of the original user study. Such
correspondences between our results and the original user study provide initial
evidence in favor of using SimEvals before running user studies. We also
explore the use of SimEvals as a cheap proxy to explore an alternative user
study set-up. We hope that this work motivates further study of when and how
SimEvals should be used to aid in the design of real-world evaluations.
 
      
Related papers
- Evaluating Contrastive Feedback for Effective User Simulations [2.8089969618577997]
 This study explores whether the underlying principles of contrastive training techniques can be applied beneficially in the area of prompt engineering for user simulations.
The primary objective of this study is to analyze how different modalities of contextual information influence the effectiveness of user simulations.
 arXiv  Detail & Related papers  (2025-05-05T11:02:31Z)
- Minimizing Live Experiments in Recommender Systems: User Simulation to Evaluate Preference Elicitation Policies [14.258382779305483]
 We describe a simulation methodology used to augment (and reduce) the use of live experiments.
We illustrate its deployment for the evaluation of "preference elicitation" algorithms used to onboard new users of the YouTube Music platform.
 arXiv  Detail & Related papers  (2024-09-26T00:08:46Z)
- Towards a Formal Characterization of User Simulation Objectives in Conversational Information Access [15.54070473873364]
 User simulation is a promising approach for automatically training and evaluating conversational information access agents.
We define the distinct objectives for user simulators: training aims to maximize behavioral similarity to real users, while evaluation focuses on the accurate prediction of real-world conversational agent performance.
 arXiv  Detail & Related papers  (2024-06-27T08:46:41Z)
- A Sim2Real Approach for Identifying Task-Relevant Properties in Interpretable Machine Learning [18.965568482077344]
 We introduce a generalizable, cost-effective method for identifying task-relevant explanation properties in silico.
We use our approach to identify relevant proxies for three example tasks and validate our simulation with real user studies.
 arXiv  Detail & Related papers  (2024-05-31T18:08:35Z)
- BASES: Large-scale Web Search User Simulation with Large Language Model
  based Agents [108.97507653131917]
 BASES is a novel user simulation framework with large language models (LLMs).
Our simulation framework can generate unique user profiles at scale, which subsequently leads to diverse search behaviors.
 WARRIORS is a new large-scale dataset encompassing web search user behaviors, including both Chinese and English versions.
 arXiv  Detail & Related papers  (2024-02-27T13:44:09Z)
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
 We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution regarding the usefulness and potential for misunderstanding in saliency-based explanations.
 arXiv  Detail & Related papers  (2023-12-10T23:13:23Z)
- User Behavior Simulation with Large Language Model based Agents [116.74368915420065]
 We propose an LLM-based agent framework and design a sandbox environment to simulate real user behaviors.
Based on extensive experiments, we find that the simulated behaviors of our method are very close to the ones of real humans.
 arXiv  Detail & Related papers  (2023-06-05T02:58:35Z)
- On the Importance of Application-Grounded Experimental Design for
  Evaluating Explainable ML Methods [20.2027063607352]
 We present an experimental study extending a prior explainable ML evaluation experiment and bringing the setup closer to the deployment setting.
Our empirical study draws dramatically different conclusions than the prior work, highlighting how seemingly trivial experimental design choices can yield misleading results.
We believe this work holds lessons about the necessity of situating the evaluation of any ML method and choosing appropriate tasks, data, users, and metrics to match the intended deployment contexts.
 arXiv  Detail & Related papers  (2022-06-24T14:46:19Z)
- Synthetic Data-Based Simulators for Recommender Systems: A Survey [55.60116686945561]
 This survey aims at providing a comprehensive overview of the recent trends in the field of modeling and simulation.
We start with the motivation behind the development of frameworks implementing the simulations -- simulators.
We provide a new consistent classification of existing simulators based on their functionality, approbation, and industrial effectiveness.
 arXiv  Detail & Related papers  (2022-06-22T19:33:21Z)
- Use-Case-Grounded Simulations for Explanation Evaluation [23.584251632331046]
 We introduce Use-Case-Grounded Simulated Evaluations (SimEvals).
SimEvals involve training algorithmic agents that take as input the information content that would be presented to each participant in a human subject study.
We run a comprehensive evaluation on three real-world use cases to demonstrate that SimEvals can effectively identify which explanation methods will help humans for each use case.
 arXiv  Detail & Related papers  (2022-06-05T20:12:19Z)
- Active Surrogate Estimators: An Active Learning Approach to
  Label-Efficient Model Evaluation [59.7305309038676]
 We propose Active Surrogate Estimators (ASEs) for model evaluation.
We find that ASEs offer greater label-efficiency than the current state-of-the-art.
 arXiv  Detail & Related papers  (2022-02-14T17:15:18Z)
- A User's Guide to Calibrating Robotics Simulators [54.85241102329546]
 This paper proposes a set of benchmarks and a framework for the study of various algorithms aimed at transferring models and policies learnt in simulation to the real world.
We conduct experiments on a wide range of well-known simulated environments to characterize and offer insights into the performance of different algorithms.
Our analysis can be useful for practitioners working in this area and can help make informed choices about the behavior and main properties of sim-to-real algorithms.
 arXiv  Detail & Related papers  (2020-11-17T22:24:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     