IFAN: An Explainability-Focused Interaction Framework for Humans and NLP
Models
- URL: http://arxiv.org/abs/2303.03124v2
- Date: Mon, 2 Oct 2023 15:15:33 GMT
- Authors: Edoardo Mosca, Daryna Dementieva, Tohid Ebrahim Ajdari, Maximilian
Kummeth, Kirill Gringauz, Yutong Zhou and Georg Groh
- Abstract summary: Interpretability and human oversight are fundamental pillars of deploying complex NLP models into real-world applications.
We propose IFAN, a framework for real-time explanation-based interaction with NLP models.
- Score: 13.158002463564895
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Interpretability and human oversight are fundamental pillars of deploying
complex NLP models into real-world applications. However, applying
explainability and human-in-the-loop methods requires technical proficiency.
Despite existing toolkits for model understanding and analysis, options to
integrate human feedback are still limited. We propose IFAN, a framework for
real-time explanation-based interaction with NLP models. Through IFAN's
interface, users can provide feedback to selected model explanations, which is
then integrated through adapter layers to align the model with human rationale.
We show the system to be effective in debiasing a hate speech classifier with
minimal impact on performance. IFAN also offers a visual admin system and API
to manage models (and datasets) as well as control access rights. A demo is
live at https://ifan.ml.
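The abstract says feedback is integrated through adapter layers but gives no detail on them. As a rough illustration only, the sketch below shows a generic residual bottleneck adapter in plain Python. It is not IFAN's implementation: the class name `BottleneckAdapter`, the layer sizes, and the zero-initialized up-projection are all assumptions chosen for the example. The point it illustrates is that an adapter adds a small set of new parameters on top of a frozen hidden state and, when initialized this way, leaves the original model's behaviour unchanged until it is trained.

```python
import random


def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class BottleneckAdapter:
    """Hypothetical residual adapter: h + W_up @ relu(W_down @ h)."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        # Down-projection starts with small random weights.
        self.w_down = [
            [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
            for _ in range(bottleneck_size)
        ]
        # Assumption: the up-projection starts at zero, so the adapter
        # is the identity function before any training.
        self.w_up = [[0.0] * bottleneck_size for _ in range(hidden_size)]

    def forward(self, h):
        z = relu(matvec(self.w_down, h))           # compress to the bottleneck
        delta = matvec(self.w_up, z)               # expand back to hidden size
        return [x + d for x, d in zip(h, delta)]   # residual connection

    def num_parameters(self):
        rows_down, cols_down = len(self.w_down), len(self.w_down[0])
        rows_up, cols_up = len(self.w_up), len(self.w_up[0])
        return rows_down * cols_down + rows_up * cols_up


# A made-up hidden state standing in for a frozen model's activation.
hidden = [0.5, -1.0, 2.0, 0.25]
adapter = BottleneckAdapter(hidden_size=4, bottleneck_size=2)

print(adapter.forward(hidden) == hidden)  # True: identity before training
print(adapter.num_parameters())           # 16 trainable adapter weights
```

In a setup like this, only the adapter weights would be updated from feedback while the base model stays frozen, which is one common reason adapters are used for lightweight model updates.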
Related papers
- Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment [65.15914284008973]
State-of-the-art techniques such as Reinforcement Learning from Human Feedback (RLHF) often consist of two stages:
1) supervised fine-tuning (SFT), where the model is fine-tuned by learning from human demonstration data;
2) preference learning, where preference data is used to learn a reward model, which is in turn used by a reinforcement learning step to fine-tune the model.
arXiv Detail & Related papers (2024-05-28T07:11:05Z)
- An Interpretable Ensemble of Graph and Language Models for Improving
Search Relevance in E-Commerce [22.449320058423886]
We propose Plug and Play Graph LAnguage Model (PP-GLAM), an explainable ensemble of plug and play models.
Our approach uses a modular framework with uniform data processing pipelines.
We show that PP-GLAM outperforms several state-of-the-art baselines and a proprietary model on real-world multilingual, multi-regional e-commerce datasets.
arXiv Detail & Related papers (2024-03-01T19:08:25Z)
- Model-agnostic Body Part Relevance Assessment for Pedestrian Detection [4.405053430046726]
We present a framework that applies sampling-based explanation models in a computer vision context, assessing the relevance of body parts for pedestrian detection.
We introduce a novel sampling-based method similar to KernelSHAP that shows more robustness for lower sampling sizes and, thus, is more efficient for explainability analyses on large-scale datasets.
arXiv Detail & Related papers (2023-11-27T10:10:25Z)
- SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision.
We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
arXiv Detail & Related papers (2023-10-09T17:56:53Z)
- InterroLang: Exploring NLP Models and Datasets through Dialogue-based
Explanations [8.833264791078825]
We adapt the conversational explanation framework TalkToModel to the NLP domain and add new NLP-specific operations such as free-text rationalization.
To recognize user queries for explanations, we evaluate fine-tuned and few-shot prompting models.
We conduct two user studies on (1) the perceived correctness and helpfulness of the dialogues, and (2) the simulatability.
arXiv Detail & Related papers (2023-10-09T10:27:26Z)
- FIND: A Function Description Benchmark for Evaluating Interpretability
Methods [86.80718559904854]
This paper introduces FIND (Function INterpretation and Description), a benchmark suite for evaluating automated interpretability methods.
FIND contains functions that resemble components of trained neural networks, and accompanying descriptions of the kind we seek to generate.
We evaluate methods that use pretrained language models to produce descriptions of function behavior in natural language and code.
arXiv Detail & Related papers (2023-09-07T17:47:26Z)
- Studying How to Efficiently and Effectively Guide Models with Explanations [52.498055901649025]
'Model guidance' is the idea of regularizing the models' explanations to ensure that they are "right for the right reasons".
We conduct an in-depth evaluation across various loss functions, attribution methods, models, and 'guidance depths' on the PASCAL VOC 2007 and MS COCO 2014 datasets.
Specifically, we guide the models via bounding box annotations, which are much cheaper to obtain than the commonly used segmentation masks.
arXiv Detail & Related papers (2023-03-21T15:34:50Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- XMD: An End-to-End Framework for Interactive Explanation-Based Debugging
of NLP Models [33.81019305179569]
Explanation-based model debugging aims to resolve spurious biases by showing human users explanations of model behavior.
We propose XMD: the first open-source, end-to-end framework for explanation-based model debugging.
XMD automatically updates the model in real time, by regularizing the model so that its explanations align with the user feedback.
arXiv Detail & Related papers (2022-10-30T23:09:09Z)
- Switchable Representation Learning Framework with Self-compatibility [50.48336074436792]
We propose a Switchable representation learning Framework with Self-Compatibility (SFSC).
SFSC generates a series of compatible sub-models with different capacities through one training process.
SFSC achieves state-of-the-art performance on the evaluated datasets.
arXiv Detail & Related papers (2022-06-16T16:46:32Z)
- Interactively Generating Explanations for Transformer Language Models [14.306470205426526]
Transformer language models are state-of-the-art in a multitude of NLP tasks.
Recent methods aim to provide interpretability and explainability to black-box models.
We emphasize using prototype networks directly incorporated into the model architecture.
arXiv Detail & Related papers (2021-09-02T11:34:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.