On the Importance of Application-Grounded Experimental Design for
Evaluating Explainable ML Methods
- URL: http://arxiv.org/abs/2206.13503v2
- Date: Tue, 28 Jun 2022 13:40:29 GMT
- Title: On the Importance of Application-Grounded Experimental Design for
Evaluating Explainable ML Methods
- Authors: Kasun Amarasinghe, Kit T. Rodolfa, Sérgio Jesus, Valerie Chen,
Vladimir Balayan, Pedro Saleiro, Pedro Bizarro, Ameet Talwalkar, Rayid Ghani
- Abstract summary: We present an experimental study extending a prior explainable ML evaluation experiment and bringing the setup closer to the deployment setting.
Our empirical study draws dramatically different conclusions than the prior work, highlighting how seemingly trivial experimental design choices can yield misleading results.
We believe this work holds lessons about the necessity of situating the evaluation of any ML method and choosing appropriate tasks, data, users, and metrics to match the intended deployment contexts.
- Score: 20.2027063607352
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine Learning (ML) models now inform a wide range of human decisions, but
using "black box" models carries risks such as relying on spurious
correlations or errant data. To address this, researchers have proposed methods
for supplementing models with explanations of their predictions. However,
robust evaluations of these methods' usefulness in real-world contexts have
remained elusive, with experiments tending to rely on simplified settings or
proxy tasks. We present an experimental study extending a prior explainable ML
evaluation experiment and bringing the setup closer to the deployment setting
by relaxing its simplifying assumptions. Our empirical study draws dramatically
different conclusions than the prior work, highlighting how seemingly trivial
experimental design choices can yield misleading results. Beyond the present
experiment, we believe this work holds lessons about the necessity of situating
the evaluation of any ML method and choosing appropriate tasks, data, users,
and metrics to match the intended deployment contexts.
Related papers
- Estimating Causal Effects with Double Machine Learning -- A Method Evaluation [5.904095466127043]
We review one of the most prominent methods: "double/debiased machine learning" (DML).
Our findings indicate that the application of a suitably flexible machine learning algorithm within DML improves the adjustment for various nonlinear confounding relationships.
When estimating the effects of air pollution on housing prices, we find that DML estimates are consistently larger than estimates of less flexible methods.
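For readers unfamiliar with the method, below is a minimal sketch of DML's cross-fitted partialling-out form for a partially linear model; the random-forest nuisance models and two-fold split are illustrative assumptions, not this paper's exact setup.

```python
# Minimal sketch of double/debiased ML (partialling-out with cross-fitting)
# for a partially linear model Y = theta*D + g(X) + noise. Illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_effect(X, D, Y, n_folds=2, seed=0):
    """Cross-fitted estimate of the effect of treatment D on outcome Y given X."""
    res_D = np.zeros(len(D))
    res_Y = np.zeros(len(Y))
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        # Flexible nuisance models for E[Y|X] and E[D|X]
        m_hat = RandomForestRegressor(random_state=seed).fit(X[train], Y[train])
        e_hat = RandomForestRegressor(random_state=seed).fit(X[train], D[train])
        # Residualize on held-out folds (cross-fitting avoids overfitting bias)
        res_Y[test] = Y[test] - m_hat.predict(X[test])
        res_D[test] = D[test] - e_hat.predict(X[test])
    # Final stage: regress Y-residuals on D-residuals (Neyman-orthogonal score)
    return float(res_D @ res_Y / (res_D @ res_D))
```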
arXiv Detail & Related papers (2024-03-21T13:21:33Z)
- Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement of the predicted label.
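The LDM itself depends on the paper's specific construction, but the uncertainty-based selection it refines can be illustrated with plain margin sampling; the sketch below is that simpler stand-in, with all names invented for this example.

```python
# Illustrative margin-based uncertainty sampling, a simpler stand-in for
# the paper's least disagree metric (LDM); names are invented for this sketch.
import numpy as np

def query_most_uncertain(probs, k=10):
    """Pick the k pool samples whose top-2 class probabilities are closest.

    probs: (n_samples, n_classes) predicted probabilities on the unlabeled pool.
    A small margin means the predicted label flips easily, i.e. the sample
    is highly informative to label next.
    """
    top2 = np.sort(probs, axis=1)[:, -2:]   # two largest probabilities per row
    margin = top2[:, 1] - top2[:, 0]        # small margin = high uncertainty
    return np.argsort(margin)[:k]           # indices to send for labeling
```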
arXiv Detail & Related papers (2024-01-18T08:12:23Z)
- Adaptive Instrument Design for Indirect Experiments [48.815194906471405]
Unlike randomized controlled trials (RCTs), indirect experiments estimate treatment effects by leveraging conditional instrumental variables.
In this paper we take the initial steps towards enhancing sample efficiency for indirect experiments by adaptively designing a data collection policy.
Our main contribution is a practical computational procedure that utilizes influence functions to search for an optimal data collection policy.
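As background on the instrumental-variable estimation this setting builds on (not the paper's adaptive design procedure), here is a minimal sketch of the classic scalar two-stage least squares / Wald estimator:

```python
# Background sketch: the Wald / two-stage least squares (2SLS) estimator,
# the classic way an instrument Z identifies the effect of treatment D on Y.
import numpy as np

def iv_effect(Z, D, Y):
    """Scalar-instrument IV estimate: theta = Cov(Z, Y) / Cov(Z, D)."""
    Zc = Z - Z.mean()
    return float(Zc @ (Y - Y.mean()) / (Zc @ (D - D.mean())))
```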
arXiv Detail & Related papers (2023-12-05T02:38:04Z)
- A Double Machine Learning Approach to Combining Experimental and Observational Data [59.29868677652324]
We propose a double machine learning approach to combine experimental and observational studies.
Our framework tests for violations of external validity and ignorability under milder assumptions.
arXiv Detail & Related papers (2023-07-04T02:53:11Z)
- Intervention Generalization: A View from Factor Graph Models [7.117681268784223]
We take a close look at how to warrant a leap from past experiments to novel conditions based on minimal assumptions about the factorization of the distribution of the manipulated system.
A postulated interventional factor model (IFM) may not always be informative, but it conveniently abstracts away a need for explicitly modeling unmeasured confounding and feedback mechanisms.
arXiv Detail & Related papers (2023-06-06T21:44:23Z)
- Leaving the Nest: Going Beyond Local Loss Functions for Predict-Then-Optimize [57.22851616806617]
We show that our method achieves state-of-the-art results in four domains from the literature.
Our approach outperforms the best existing method by nearly 200% when the localness assumption is broken.
arXiv Detail & Related papers (2023-05-26T11:17:45Z)
- Testing Occupational Gender Bias in Language Models: Towards Robust Measurement and Zero-Shot Debiasing [98.07536837448293]
Large language models (LLMs) have been shown to exhibit a variety of harmful, human-like biases against various demographics.
We introduce a list of desiderata for robustly measuring biases in generative language models.
We then use this benchmark to test several state-of-the-art open-source LLMs, including Llama, Mistral, and their instruction-tuned versions.
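As a hedged illustration of how template-based occupational bias probing can work in general (not this paper's benchmark or desiderata), the sketch below assumes a hypothetical score_continuation(prompt, continuation) that returns a model's log-probability for the continuation.

```python
# Minimal sketch of template-based occupational gender bias probing.
# score_continuation is a hypothetical hook, NOT a real library API.
OCCUPATIONS = ["nurse", "engineer", "teacher", "plumber"]
TEMPLATE = "The {occupation} said that"

def pronoun_gap(score_continuation):
    """Mean log-prob gap between 'he' and 'she' continuations per occupation."""
    gaps = {}
    for occ in OCCUPATIONS:
        prompt = TEMPLATE.format(occupation=occ)
        gaps[occ] = (score_continuation(prompt, " he")
                     - score_continuation(prompt, " she"))
    return gaps  # positive value: the model prefers 'he' after this occupation
```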
arXiv Detail & Related papers (2022-12-20T22:41:24Z)
- Efficient Real-world Testing of Causal Decision Making via Bayesian Experimental Design for Contextual Optimisation [12.37745209793872]
We introduce a model-agnostic framework for gathering data to evaluate and improve contextual decision making.
Our method is used for the data-efficient evaluation of the regret of past treatment assignments.
arXiv Detail & Related papers (2022-07-12T01:20:11Z)
- Bayesian Optimal Experimental Design for Simulator Models of Cognition [14.059933880568908]
We combine recent advances in BOED and approximate inference for intractable models to find optimal experimental designs.
Our simulation experiments on multi-armed bandit tasks show that our method results in improved model discrimination and parameter estimation.
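A minimal sketch of the usual BOED objective, expected information gain (EIG), estimated by nested Monte Carlo; sample_prior, simulate, and log_lik are assumed stand-ins for a task-specific simulator model, not this paper's implementation.

```python
# Nested Monte Carlo estimate of expected information gain (EIG), the
# standard BOED objective. sample_prior/simulate/log_lik are assumed hooks.
import numpy as np

def eig_nested_mc(design, sample_prior, simulate, log_lik,
                  n_outer=200, n_inner=200):
    """EIG(d) ~ E[log p(y|theta,d)] - E[log (1/M) sum_m p(y|theta_m,d)]."""
    total = 0.0
    for _ in range(n_outer):
        theta = sample_prior()
        y = simulate(theta, design)                     # outer draw
        inner = [log_lik(y, sample_prior(), design) for _ in range(n_inner)]
        log_marginal = np.logaddexp.reduce(inner) - np.log(n_inner)
        total += log_lik(y, theta, design) - log_marginal
    return total / n_outer
```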
arXiv Detail & Related papers (2021-10-29T09:04:01Z)
- Predicting Performance for Natural Language Processing Tasks [128.34208911925424]
We build regression models to predict the evaluation score of an NLP experiment given the experimental settings as input.
Experimenting on 9 different NLP tasks, we find that our predictors can produce meaningful predictions over unseen languages and different modeling architectures.
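A minimal sketch of such a performance predictor; the features and the tiny toy log of past runs below are invented assumptions, not the paper's featurization.

```python
# Sketch: predict an NLP experiment's evaluation score from its settings.
# Feature names and the toy log of past runs are invented for illustration.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

runs = pd.DataFrame({
    "train_size_log10": [3.0, 4.0, 5.0, 4.5],   # log10 of training examples
    "is_transformer":   [0, 1, 1, 1],           # model architecture flag
    "lang_sim_to_en":   [0.9, 0.4, 0.7, 0.2],   # language similarity feature
    "score":            [61.2, 70.5, 78.1, 66.3],
})
model = GradientBoostingRegressor().fit(runs.drop(columns="score"), runs["score"])

# Estimate the score of an unseen configuration before running it
new_run = pd.DataFrame({"train_size_log10": [4.2],
                        "is_transformer": [1],
                        "lang_sim_to_en": [0.5]})
print(model.predict(new_run))
```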
arXiv Detail & Related papers (2020-05-02T16:02:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.