XMD: An End-to-End Framework for Interactive Explanation-Based Debugging of NLP Models
- URL: http://arxiv.org/abs/2210.16978v1
- Date: Sun, 30 Oct 2022 23:09:09 GMT
- Title: XMD: An End-to-End Framework for Interactive Explanation-Based Debugging of NLP Models
- Authors: Dong-Ho Lee, Akshen Kadakia, Brihi Joshi, Aaron Chan, Ziyi Liu, Kiran
Narahari, Takashi Shibuya, Ryosuke Mitani, Toshiyuki Sekiya, Jay Pujara,
Xiang Ren
- Abstract summary: Explanation-based model debugging aims to resolve spurious biases by showing human users explanations of model behavior.
We propose XMD: the first open-source, end-to-end framework for explanation-based model debugging.
XMD automatically updates the model in real time, by regularizing the model so that its explanations align with the user feedback.
- Score: 33.81019305179569
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: NLP models are susceptible to learning spurious biases (i.e., bugs) that work
on some datasets but do not properly reflect the underlying task.
Explanation-based model debugging aims to resolve spurious biases by showing
human users explanations of model behavior, asking users to give feedback on
the behavior, then using the feedback to update the model. While existing model
debugging methods have shown promise, their prototype-level implementations
provide limited practical utility. Thus, we propose XMD: the first open-source,
end-to-end framework for explanation-based model debugging. Given task- or
instance-level explanations, users can flexibly provide various forms of
feedback via an intuitive, web-based UI. After receiving user feedback, XMD
automatically updates the model in real time, by regularizing the model so that
its explanations align with the user feedback. The new model can then be easily
deployed into real-world applications via Hugging Face. Using XMD, we can
improve the model's OOD performance on text classification tasks by up to 18%.
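To make the update step concrete, here is a minimal sketch of explanation regularization in the spirit of the abstract, assuming a Hugging Face classification model in PyTorch and input-gradient saliency as the explanation; the function name `debug_step`, the `flagged_mask` input, and the `reg_strength` weight are illustrative placeholders, not XMD's actual API.

```python
import torch

def debug_step(model, batch, flagged_mask, reg_strength=1.0):
    """One hypothetical update: task loss plus an explanation regularizer
    that pushes attribution mass away from user-flagged (spurious) tokens.

    flagged_mask: (batch, seq_len) float tensor, 1.0 where the user marked
    a token as spurious, 0.0 elsewhere.
    """
    # Run the model from embeddings so gradients w.r.t. the input are available.
    embeds = model.get_input_embeddings()(batch["input_ids"])
    outputs = model(inputs_embeds=embeds,
                    attention_mask=batch["attention_mask"],
                    labels=batch["labels"])
    task_loss = outputs.loss

    # Input-gradient saliency as a simple stand-in for the model's explanation.
    grads = torch.autograd.grad(task_loss, embeds, create_graph=True)[0]
    saliency = grads.norm(dim=-1)  # (batch, seq_len)

    # Penalize attribution on flagged tokens so explanations align with feedback.
    reg_loss = (saliency * flagged_mask).sum() / flagged_mask.sum().clamp(min=1)

    loss = task_loss + reg_strength * reg_loss
    loss.backward()
    return loss.item()
```

In a real loop an optimizer step would follow each call, and the debugged model could then be published with the standard `model.push_to_hub(...)` call from `transformers`, matching the Hugging Face deployment path the abstract mentions.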
Related papers
- RewardBench: Evaluating Reward Models for Language Modeling [100.28366840977966]
We present RewardBench, a benchmark dataset and code-base for evaluation of reward models.
The dataset is a collection of prompt-chosen-rejected trios spanning chat, reasoning, and safety.
On the RewardBench leaderboard, we evaluate reward models trained with a variety of methods.
arXiv Detail & Related papers (2024-03-20T17:49:54Z)
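As a rough illustration of how a prompt-chosen-rejected trio can be scored, the sketch below runs both responses through a scalar reward model and counts the trio as correct when the chosen response scores higher; `my-reward-model` and the trio dictionary are placeholders, and this is not the RewardBench code-base.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint; any scalar-output reward model would work here.
name = "my-reward-model"
tok = AutoTokenizer.from_pretrained(name)
rm = AutoModelForSequenceClassification.from_pretrained(name, num_labels=1)

def reward(prompt: str, response: str) -> float:
    """Scalar reward for a (prompt, response) pair."""
    inputs = tok(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return rm(**inputs).logits.squeeze().item()

def trio_correct(trio: dict) -> bool:
    """A trio counts as correct if the chosen response outscores the rejected one."""
    return reward(trio["prompt"], trio["chosen"]) > reward(trio["prompt"], trio["rejected"])

trios = [{"prompt": "...", "chosen": "...", "rejected": "..."}]  # placeholder data
accuracy = sum(trio_correct(t) for t in trios) / len(trios)
```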
- Increasing Performance And Sample Efficiency With Model-agnostic Interactive Feature Attributions [3.0655581300025996]
We provide model-agnostic implementations of two popular explanation methods (Occlusion and Shapley values) to enforce entirely different attributions in the complex model.
We show how our proposed approach can significantly improve the model's performance simply by augmenting its training dataset based on corrected explanations.
arXiv Detail & Related papers (2023-06-28T15:23:28Z)
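A minimal, model-agnostic occlusion sketch in the spirit of the entry above: mask one token at a time and record how much the predicted-class probability drops. The whitespace tokenization and `[MASK]` placeholder are simplifications, not the paper's implementation.

```python
import torch

def occlusion_attribution(model, tok, text: str, mask_token: str = "[MASK]"):
    """Score each token by how much the predicted-class probability drops
    when that token is occluded (model-agnostic: only predictions are needed)."""
    tokens = text.split()
    with torch.no_grad():
        base = model(**tok(text, return_tensors="pt")).logits.softmax(-1)
    pred = base.argmax(-1).item()
    scores = []
    for i in range(len(tokens)):
        occluded = " ".join(tokens[:i] + [mask_token] + tokens[i + 1:])
        with torch.no_grad():
            prob = model(**tok(occluded, return_tensors="pt")).logits.softmax(-1)[0, pred]
        scores.append(base[0, pred].item() - prob.item())
    return list(zip(tokens, scores))
```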
- Earning Extra Performance from Restrictive Feedbacks [41.05874087063763]
We set up a challenge named "Earning eXtra PerformancE from restriCTive feEDbacks" (EXPECTED) to describe this class of model tuning problem.
The goal of the model provider is to eventually deliver a satisfactory model to the local user(s) by utilizing the feedback.
We propose to characterize the geometry of model performance with respect to the model parameters by exploring the parameters' distribution.
arXiv Detail & Related papers (2023-04-28T13:16:54Z)
- IFAN: An Explainability-Focused Interaction Framework for Humans and NLP Models [13.158002463564895]
Interpretability and human oversight are fundamental pillars of deploying complex NLP models into real-world applications.
We propose IFAN, a framework for real-time explanation-based interaction with NLP models.
arXiv Detail & Related papers (2023-03-06T13:37:59Z)
- Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA).
Instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space.
We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
arXiv Detail & Related papers (2023-02-08T07:37:51Z)
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDM) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve" (DEIM) for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
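A toy sketch of the slice-detection idea described above: group evaluation examples by a metadata feature and flag groups whose accuracy falls well below the overall average. The field names and thresholds are invented; Edisa itself is more sophisticated.

```python
from collections import defaultdict

def find_underperforming_slices(examples, min_gap=0.10, min_size=20):
    """examples: dicts with 'slice' (a feature value), 'pred', and 'label' keys."""
    overall = sum(e["pred"] == e["label"] for e in examples) / len(examples)
    groups = defaultdict(list)
    for e in examples:
        groups[e["slice"]].append(e["pred"] == e["label"])
    slices = []
    for name, hits in groups.items():
        acc = sum(hits) / len(hits)
        # Flag slices that are both large enough and well below overall accuracy.
        if len(hits) >= min_size and overall - acc >= min_gap:
            slices.append((name, acc, len(hits)))
    return sorted(slices, key=lambda s: s[1])
```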
- A Graph-Enhanced Click Model for Web Search [67.27218481132185]
We propose a novel graph-enhanced click model (GraphCM) for web search.
We exploit both intra-session and inter-session information to address the sparsity and cold-start problems.
arXiv Detail & Related papers (2022-06-17T08:32:43Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
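The linear bag-of-words setting from the study above is easy to outline: fit a logistic-regression classifier over token counts and surface the per-token coefficients that participants were shown. The two toy reviews are placeholders, not the study's hotel-review data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

reviews = ["the room was clean and quiet", "amazing perfect wonderful best hotel ever"]
labels = [1, 0]  # 1 = genuine, 0 = fake (placeholder data)

vec = CountVectorizer()
X = vec.fit_transform(reviews)
clf = LogisticRegression().fit(X, labels)

# Feature coefficients: the per-token weights a participant could inspect.
weights = sorted(zip(vec.get_feature_names_out(), clf.coef_[0]),
                 key=lambda w: abs(w[1]), reverse=True)
print(weights[:10])
```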
- Improving scripts with a memory of natural feedback [38.81097942561449]
We create a dynamic memory architecture with a growing memory of feedback about errors in the output.
On a script generation task, we show empirically that the model learns to apply feedback effectively.
This is a first step towards strengthening deployed models, potentially broadening their utility.
arXiv Detail & Related papers (2021-12-16T07:01:28Z)
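A bare-bones sketch of a growing feedback memory in the spirit of the entry above: store user feedback keyed by the error it addresses and prepend retrieved feedback to the next request. The class, the keyword retrieval, and the prompt format are all simplifications of the paper's architecture.

```python
class FeedbackMemory:
    """Grow-only store of (error description -> user feedback) pairs."""

    def __init__(self):
        self.entries = []  # list of (error_text, feedback_text)

    def add(self, error: str, feedback: str) -> None:
        self.entries.append((error, feedback))

    def retrieve(self, query: str) -> list[str]:
        # Naive retrieval: keyword overlap with stored error descriptions.
        q = set(query.lower().split())
        return [fb for err, fb in self.entries if q & set(err.lower().split())]

memory = FeedbackMemory()
memory.add("script skips the payment step", "always include a 'pay at the counter' step")
hints = memory.retrieve("generate a script for buying coffee, payment included")
prompt = "\n".join(hints) + "\nWrite a script for buying coffee."
```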
- What do we expect from Multiple-choice QA Systems? [70.86513724662302]
We consider a top-performing model on several Multiple Choice Question Answering (MCQA) datasets.
We evaluate it against a set of expectations one might have from such a model, using a series of zero-information perturbations of the model's inputs.
arXiv Detail & Related papers (2020-11-20T21:27:10Z)
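One concrete zero-information perturbation along the lines described above: blank out the question, keep the answer options, and check whether the model still picks the same option. Here `predict` is a stand-in for whatever MCQA model is under test, not the paper's evaluated system.

```python
def zero_information_check(predict, question: str, options: list[str]) -> bool:
    """Compare the model's choice on the real question with its choice when the
    question is replaced by an empty string (a zero-information perturbation).
    `predict(question, options)` should return the index of the chosen option."""
    original = predict(question, options)
    blanked = predict("", options)
    # Agreement suggests the model may be exploiting answer-only artifacts.
    return original == blanked
```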
- ViCE: Visual Counterfactual Explanations for Machine Learning Models [13.94542147252982]
We present an interactive visual analytics tool, ViCE, that generates counterfactual explanations to contextualize and evaluate model decisions.
Results are effectively displayed in a visual interface where counterfactual explanations are highlighted and interactive methods are provided for users to explore the data and model.
arXiv Detail & Related papers (2020-03-05T04:43:02Z)
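A minimal counterfactual-search sketch in the spirit of ViCE: nudge one numeric feature until the classifier's prediction flips and report the change. The greedy one-feature search is illustrative only; ViCE's algorithm and visual interface are richer.

```python
def simple_counterfactual(clf, x, feature, step=0.1, max_steps=200):
    """Increase or decrease one feature of x until clf's prediction flips.

    clf: fitted scikit-learn-style classifier; x: 1-D numpy array of features.
    Returns (counterfactual, delta) or None if no flip is found."""
    original = clf.predict(x.reshape(1, -1))[0]
    for direction in (+1, -1):
        cf = x.astype(float)
        for i in range(1, max_steps + 1):
            cf[feature] = x[feature] + direction * i * step
            if clf.predict(cf.reshape(1, -1))[0] != original:
                return cf, cf[feature] - x[feature]
    return None
```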
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences of its use.