XMD: An End-to-End Framework for Interactive Explanation-Based Debugging of NLP Models
- URL: http://arxiv.org/abs/2210.16978v1
- Date: Sun, 30 Oct 2022 23:09:09 GMT
- Title: XMD: An End-to-End Framework for Interactive Explanation-Based Debugging of NLP Models
- Authors: Dong-Ho Lee, Akshen Kadakia, Brihi Joshi, Aaron Chan, Ziyi Liu, Kiran
Narahari, Takashi Shibuya, Ryosuke Mitani, Toshiyuki Sekiya, Jay Pujara,
Xiang Ren
- Abstract summary: Explanation-based model debugging aims to resolve spurious biases by showing human users explanations of model behavior.
We propose XMD: the first open-source, end-to-end framework for explanation-based model debugging.
XMD automatically updates the model in real time, by regularizing the model so that its explanations align with the user feedback.
- Score: 33.81019305179569
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: NLP models are susceptible to learning spurious biases (i.e., bugs) that work
on some datasets but do not properly reflect the underlying task.
Explanation-based model debugging aims to resolve spurious biases by showing
human users explanations of model behavior, asking users to give feedback on
the behavior, then using the feedback to update the model. While existing model
debugging methods have shown promise, their prototype-level implementations
provide limited practical utility. Thus, we propose XMD: the first open-source,
end-to-end framework for explanation-based model debugging. Given task- or
instance-level explanations, users can flexibly provide various forms of
feedback via an intuitive, web-based UI. After receiving user feedback, XMD
automatically updates the model in real time, by regularizing the model so that
its explanations align with the user feedback. The new model can then be easily
deployed into real-world applications via Hugging Face. Using XMD, we can
improve the model's OOD performance on text classification tasks by up to 18%.
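To make the update step concrete, here is a minimal sketch of explanation regularization in the spirit of the abstract, assuming a Hugging Face classification model in PyTorch and input-gradient saliency as the explanation; the function name `debug_step`, the `flagged_mask` input, and the `reg_strength` weight are illustrative placeholders, not XMD's actual API.

```python
import torch

def debug_step(model, batch, flagged_mask, reg_strength=1.0):
    """One hypothetical update: task loss plus an explanation regularizer
    that pushes attribution mass away from user-flagged (spurious) tokens.

    flagged_mask: (batch, seq_len) float tensor, 1.0 where the user marked
    a token as spurious, 0.0 elsewhere.
    """
    # Run the model from embeddings so gradients w.r.t. the input are available.
    embeds = model.get_input_embeddings()(batch["input_ids"])
    outputs = model(inputs_embeds=embeds,
                    attention_mask=batch["attention_mask"],
                    labels=batch["labels"])
    task_loss = outputs.loss

    # Input-gradient saliency as a simple stand-in for the model's explanation.
    grads = torch.autograd.grad(task_loss, embeds, create_graph=True)[0]
    saliency = grads.norm(dim=-1)  # (batch, seq_len)

    # Penalize attribution on flagged tokens so explanations align with feedback.
    reg_loss = (saliency * flagged_mask).sum() / flagged_mask.sum().clamp(min=1)

    loss = task_loss + reg_strength * reg_loss
    loss.backward()
    return loss.item()
```

In a real loop an optimizer step would follow each call, and the debugged model could then be published with the standard `model.push_to_hub(...)` call from `transformers`, matching the Hugging Face deployment path the abstract mentions.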
Related papers
- RewardBench: Evaluating Reward Models for Language Modeling [100.28366840977966]
We present RewardBench, a benchmark dataset and code-base for evaluation of reward models.
The dataset is a collection of prompt-chosen-rejected trios spanning chat, reasoning, and safety.
On the RewardBench leaderboard, we evaluate reward models trained with a variety of methods.
arXiv Detail & Related papers (2024-03-20T17:49:54Z)
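As a rough illustration of how a prompt-chosen-rejected trio can be scored, the sketch below runs both responses through a scalar reward model and counts the trio as correct when the chosen response scores higher; `my-reward-model` and the trio dictionary are placeholders, and this is not the RewardBench code-base.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint; any scalar-output reward model would work here.
name = "my-reward-model"
tok = AutoTokenizer.from_pretrained(name)
rm = AutoModelForSequenceClassification.from_pretrained(name, num_labels=1)

def reward(prompt: str, response: str) -> float:
    """Scalar reward for a (prompt, response) pair."""
    inputs = tok(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return rm(**inputs).logits.squeeze().item()

def trio_correct(trio: dict) -> bool:
    """A trio counts as correct if the chosen response outscores the rejected one."""
    return reward(trio["prompt"], trio["chosen"]) > reward(trio["prompt"], trio["rejected"])

trios = [{"prompt": "...", "chosen": "...", "rejected": "..."}]  # placeholder data
accuracy = sum(trio_correct(t) for t in trios) / len(trios)
```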
- Increasing Performance And Sample Efficiency With Model-agnostic Interactive Feature Attributions [3.0655581300025996]
We provide model-agnostic implementations of two popular explanation methods (Occlusion and Shapley values) to enforce entirely different attributions in the complex model.
We show how our proposed approach can significantly improve the model's performance simply by augmenting its training dataset based on corrected explanations.
arXiv Detail & Related papers (2023-06-28T15:23:28Z)
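A minimal, model-agnostic occlusion sketch in the spirit of the entry above: mask one token at a time and record how much the predicted-class probability drops. The whitespace tokenization and `[MASK]` placeholder are simplifications, not the paper's implementation.

```python
import torch

def occlusion_attribution(model, tok, text: str, mask_token: str = "[MASK]"):
    """Score each token by how much the predicted-class probability drops
    when that token is occluded (model-agnostic: only predictions are needed)."""
    tokens = text.split()
    with torch.no_grad():
        base = model(**tok(text, return_tensors="pt")).logits.softmax(-1)
    pred = base.argmax(-1).item()
    scores = []
    for i in range(len(tokens)):
        occluded = " ".join(tokens[:i] + [mask_token] + tokens[i + 1:])
        with torch.no_grad():
            prob = model(**tok(occluded, return_tensors="pt")).logits.softmax(-1)[0, pred]
        scores.append(base[0, pred].item() - prob.item())
    return list(zip(tokens, scores))
```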
- Earning Extra Performance from Restrictive Feedbacks [41.05874087063763]
We set up a challenge named "Earning eXtra PerformancE from restriCTive feEDbacks" (EXPECTED) to describe this class of model tuning problem.
The goal of the model provider is to eventually deliver a satisfactory model to the local user(s) by utilizing the feedback.
We propose to characterize the geometry of model performance with respect to the model parameters by exploring the parameters' distribution.
arXiv Detail & Related papers (2023-04-28T13:16:54Z)
- IFAN: An Explainability-Focused Interaction Framework for Humans and NLP Models [13.158002463564895]
Interpretability and human oversight are fundamental pillars of deploying complex NLP models into real-world applications.
We propose IFAN, a framework for real-time explanation-based interaction with NLP models.
arXiv Detail & Related papers (2023-03-06T13:37:59Z)
- Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA).
Instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space.
We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
arXiv Detail & Related papers (2023-02-08T07:37:51Z)
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDM) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve" (DEIM) for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
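A toy sketch of the slice-detection idea described above: group evaluation examples by a metadata feature and flag groups whose accuracy falls well below the overall average. The field names and thresholds are invented; Edisa itself is more sophisticated.

```python
from collections import defaultdict

def find_underperforming_slices(examples, min_gap=0.10, min_size=20):
    """examples: dicts with 'slice' (a feature value), 'pred', and 'label' keys."""
    overall = sum(e["pred"] == e["label"] for e in examples) / len(examples)
    groups = defaultdict(list)
    for e in examples:
        groups[e["slice"]].append(e["pred"] == e["label"])
    slices = []
    for name, hits in groups.items():
        acc = sum(hits) / len(hits)
        # Flag slices that are both large enough and well below overall accuracy.
        if len(hits) >= min_size and overall - acc >= min_gap:
            slices.append((name, acc, len(hits)))
    return sorted(slices, key=lambda s: s[1])
```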
- A Graph-Enhanced Click Model for Web Search [67.27218481132185]
We propose a novel graph-enhanced click model (GraphCM) for web search.
We exploit both intra-session and inter-session information to address the sparsity and cold-start problems.
arXiv Detail & Related papers (2022-06-17T08:32:43Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
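The linear bag-of-words setting from the study above is easy to outline: fit a logistic-regression classifier over token counts and surface the per-token coefficients that participants were shown. The two toy reviews are placeholders, not the study's hotel-review data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

reviews = ["the room was clean and quiet", "amazing perfect wonderful best hotel ever"]
labels = [1, 0]  # 1 = genuine, 0 = fake (placeholder data)

vec = CountVectorizer()
X = vec.fit_transform(reviews)
clf = LogisticRegression().fit(X, labels)

# Feature coefficients: the per-token weights a participant could inspect.
weights = sorted(zip(vec.get_feature_names_out(), clf.coef_[0]),
                 key=lambda w: abs(w[1]), reverse=True)
print(weights[:10])
```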
- Improving scripts with a memory of natural feedback [38.81097942561449]
We create a dynamic memory architecture with a growing memory of feedback about errors in the output.
On a script generation task, we show empirically that the model learns to apply feedback effectively.
This is a first step towards strengthening deployed models, potentially broadening their utility.
arXiv Detail & Related papers (2021-12-16T07:01:28Z)
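A bare-bones sketch of a growing feedback memory in the spirit of the entry above: store user feedback keyed by the error it addresses and prepend retrieved feedback to the next request. The class, the keyword retrieval, and the prompt format are all simplifications of the paper's architecture.

```python
class FeedbackMemory:
    """Grow-only store of (error description -> user feedback) pairs."""

    def __init__(self):
        self.entries = []  # list of (error_text, feedback_text)

    def add(self, error: str, feedback: str) -> None:
        self.entries.append((error, feedback))

    def retrieve(self, query: str) -> list[str]:
        # Naive retrieval: keyword overlap with stored error descriptions.
        q = set(query.lower().split())
        return [fb for err, fb in self.entries if q & set(err.lower().split())]

memory = FeedbackMemory()
memory.add("script skips the payment step", "always include a 'pay at the counter' step")
hints = memory.retrieve("generate a script for buying coffee, payment included")
prompt = "\n".join(hints) + "\nWrite a script for buying coffee."
```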
- What do we expect from Multiple-choice QA Systems? [70.86513724662302]
We consider a top-performing model on several Multiple Choice Question Answering (MCQA) datasets.
We evaluate it against a set of expectations one might have from such a model, using a series of zero-information perturbations of the model's inputs.
arXiv Detail & Related papers (2020-11-20T21:27:10Z)
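One concrete zero-information perturbation along the lines described above: blank out the question, keep the answer options, and check whether the model still picks the same option. Here `predict` is a stand-in for whatever MCQA model is under test, not the paper's evaluated system.

```python
def zero_information_check(predict, question: str, options: list[str]) -> bool:
    """Compare the model's choice on the real question with its choice when the
    question is replaced by an empty string (a zero-information perturbation).
    `predict(question, options)` should return the index of the chosen option."""
    original = predict(question, options)
    blanked = predict("", options)
    # Agreement suggests the model may be exploiting answer-only artifacts.
    return original == blanked
```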
- ViCE: Visual Counterfactual Explanations for Machine Learning Models [13.94542147252982]
We present an interactive visual analytics tool, ViCE, that generates counterfactual explanations to contextualize and evaluate model decisions.
Results are effectively displayed in a visual interface where counterfactual explanations are highlighted and interactive methods are provided for users to explore the data and model.
arXiv Detail & Related papers (2020-03-05T04:43:02Z)
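A minimal counterfactual-search sketch in the spirit of ViCE: nudge one numeric feature until the classifier's prediction flips and report the change. The greedy one-feature search is illustrative only; ViCE's algorithm and visual interface are richer.

```python
def simple_counterfactual(clf, x, feature, step=0.1, max_steps=200):
    """Increase or decrease one feature of x until clf's prediction flips.

    clf: fitted scikit-learn-style classifier; x: 1-D numpy array of features.
    Returns (counterfactual, delta) or None if no flip is found."""
    original = clf.predict(x.reshape(1, -1))[0]
    for direction in (+1, -1):
        cf = x.astype(float)
        for i in range(1, max_steps + 1):
            cf[feature] = x[feature] + direction * i * step
            if clf.predict(cf.reshape(1, -1))[0] != original:
                return cf, cf[feature] - x[feature]
    return None
```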
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences of its use.