ReAGent: A Model-agnostic Feature Attribution Method for Generative
Language Models
- URL: http://arxiv.org/abs/2402.00794v2
- Date: Wed, 7 Feb 2024 21:01:16 GMT
- Title: ReAGent: A Model-agnostic Feature Attribution Method for Generative
Language Models
- Authors: Zhixue Zhao, Boxuan Shan
- Abstract summary: Feature attribution methods (FAs) are employed to derive the importance of all input features to the model predictions.
It is unknown whether these FAs remain faithful when applied to decoder-only models on text generation.
We present a model-agnostic FA for generative LMs called Recursive Attribution Generator (ReAGent).
- Score: 4.015810081063028
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature attribution methods (FAs), such as gradients and attention, are
widely employed approaches to derive the importance of all input features to
the model predictions. Existing work in natural language processing has mostly
focused on developing and testing FAs for encoder-only language models (LMs) in
classification tasks. However, it is unknown whether these FAs remain faithful
for decoder-only models on text generation, due to the inherent differences in
model architecture and task setting. Moreover, previous
work has demonstrated that there is no `one-wins-all' FA across models and
tasks. This makes the selection of an FA computationally expensive for large LMs
since input importance derivation often requires multiple forward and backward
passes including gradient computations that might be prohibitive even with
access to large compute. To address these issues, we present a model-agnostic
FA for generative LMs called Recursive Attribution Generator (ReAGent). Our
method updates the token importance distribution in a recursive manner. For
each update, we compute the difference in the probability distribution over the
vocabulary for predicting the next token between using the original input and
using a modified version where a part of the input is replaced with RoBERTa
predictions. Our intuition is that replacing an important token in the context
should have resulted in a larger change in the model's confidence in predicting
the token than replacing an unimportant token. Our method can be universally
applied to any generative LM without accessing internal model weights or
additional training and fine-tuning, as most other FAs require. We extensively
compare the faithfulness of ReAGent with seven popular FAs across six
decoder-only LMs of various sizes. The results show that our method
consistently provides more faithful token importance distributions.
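The abstract describes the core update step concretely enough that a small sketch may help make it tangible. The following is a simplified, assumption-laden illustration rather than the authors' implementation: the choice of GPT-2 and roberta-base, the random selection of positions to replace, the L1 distance between next-token distributions, and the additive update rule are all assumptions introduced here.

```python
# Simplified sketch of a ReAGent-style recursive importance update (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

lm_name = "gpt2"  # any decoder-only LM would do; gpt2 is an assumption here
tok = AutoTokenizer.from_pretrained(lm_name)
lm = AutoModelForCausalLM.from_pretrained(lm_name).eval()
filler = pipeline("fill-mask", model="roberta-base", top_k=1)  # proposes replacement tokens

def next_token_dist(text):
    """Probability distribution over the vocabulary for the next token."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits[0, -1]
    return torch.softmax(logits, dim=-1)

def reagent_like_scores(words, num_updates=30, replace_k=2, seed=0):
    """Recursively update word-importance scores for predicting the next token."""
    rng = torch.Generator().manual_seed(seed)
    importance = torch.zeros(len(words))
    p_orig = next_token_dist(" ".join(words))
    for _ in range(num_updates):
        # Pick a small set of positions and replace them with RoBERTa proposals.
        pos = torch.randperm(len(words), generator=rng)[:replace_k].tolist()
        modified = list(words)
        for i in pos:
            masked = " ".join(w if j != i else "<mask>" for j, w in enumerate(words))
            modified[i] = filler(masked)[0]["token_str"].strip()
        p_mod = next_token_dist(" ".join(modified))
        # A larger shift in the next-token distribution means the replaced words matter more.
        delta = torch.norm(p_orig - p_mod, p=1).item()
        for i in pos:
            importance[i] += delta / replace_k
    return importance / importance.sum()

scores = reagent_like_scores("The capital of France is".split())
```

Because the sketch only queries the LM through its output distributions, it needs no access to internal weights or gradients, which mirrors the model-agnostic property claimed in the abstract.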
Related papers
- Dior-CVAE: Pre-trained Language Models and Diffusion Priors for
Variational Dialog Generation [70.2283756542824]
Dior-CVAE is a hierarchical conditional variational autoencoder (CVAE) with diffusion priors for variational dialog generation.
We employ a diffusion model to increase the complexity of the prior distribution and its compatibility with the distributions produced by a PLM.
Experiments across two commonly used open-domain dialog datasets show that our method can generate more diverse responses without large-scale dialog pre-training.
arXiv Detail & Related papers (2023-05-24T11:06:52Z)
- Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z)
- Masked Autoencoding for Scalable and Generalizable Decision Making [93.84855114717062]
MaskDP is a simple and scalable self-supervised pretraining method for reinforcement learning and behavioral cloning.
We find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching.
arXiv Detail & Related papers (2022-11-23T07:04:41Z)
- Deep Sequence Models for Text Classification Tasks [0.007329200485567826]
Natural Language Processing (NLP) equips machines to understand diverse and complicated human languages.
Common text classification applications include information retrieval, news topic modeling, theme extraction, sentiment analysis, and spam detection.
Sequence models such as RNNs, GRUs, and LSTMs are a breakthrough for tasks with long-range dependencies.
Results were excellent, with most models performing in the range of 80% to 94%.
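As a purely illustrative example of the kind of sequence model this entry refers to, a minimal LSTM text classifier might look as follows; the layer sizes and the use of the final hidden state are assumptions, not details from the paper.

```python
# Minimal LSTM text classifier (illustrative; hyperparameters are assumptions).
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        embedded = self.embed(token_ids)          # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)      # hidden: (1, batch, hidden_dim)
        return self.fc(hidden[-1])                # logits: (batch, num_classes)

model = LSTMClassifier(vocab_size=20_000, num_classes=4)
logits = model(torch.randint(1, 20_000, (8, 50)))  # a batch of 8 token sequences
```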
arXiv Detail & Related papers (2022-07-18T18:47:18Z)
- Entropy optimized semi-supervised decomposed vector-quantized
variational autoencoder model based on transfer learning for multiclass text
classification and generation [3.9318191265352196]
We propose a semi-supervised discrete latent variable model for multi-class text classification and text generation.
The proposed model employs the concept of transfer learning for training a quantized transformer model.
Experimental results indicate that the proposed model markedly surpasses state-of-the-art models.
arXiv Detail & Related papers (2021-11-10T07:07:54Z)
- Generative Text Modeling through Short Run Inference [47.73892773331617]
The present work proposes a short-run dynamics for inference: it is initialized from the prior distribution of the latent variable and then runs a small number of Langevin dynamics steps guided by its posterior distribution.
We show that the models trained with short run dynamics more accurately model the data, compared to strong language model and VAE baselines, and exhibit no sign of posterior collapse.
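The short-run update itself is compact enough to sketch. The toy Gaussian decoder, the step count, and the step size below are assumptions introduced here for illustration, not the paper's model.

```python
# Short-run Langevin inference, sketched with a toy decoder (illustrative assumptions).
import torch

class ToyGaussianDecoder(torch.nn.Module):
    """Toy decoder p(x|z) = N(x; Wz + b, I), just to make the sketch runnable."""
    def __init__(self, latent_dim, data_dim):
        super().__init__()
        self.lin = torch.nn.Linear(latent_dim, data_dim)

    def log_prob(self, x, z):
        return -0.5 * (x - self.lin(z)).pow(2).sum(dim=1)

def short_run_inference(x, decoder, latent_dim, steps=20, step_size=0.1):
    """Sample z from the prior, then take a few Langevin steps guided by the posterior."""
    z = torch.randn(x.shape[0], latent_dim, requires_grad=True)         # z0 ~ p(z) = N(0, I)
    for _ in range(steps):
        log_joint = decoder.log_prob(x, z) - 0.5 * z.pow(2).sum(dim=1)  # log p(x|z) + log p(z)
        grad, = torch.autograd.grad(log_joint.sum(), z)
        # Langevin update: z <- z + (s/2) * grad_z log p(x, z) + sqrt(s) * noise
        z = z + 0.5 * step_size * grad + step_size ** 0.5 * torch.randn_like(z)
        z = z.detach().requires_grad_(True)
    return z.detach()

decoder = ToyGaussianDecoder(latent_dim=8, data_dim=32)
posterior_samples = short_run_inference(torch.randn(4, 32), decoder, latent_dim=8)
```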
arXiv Detail & Related papers (2021-05-27T09:14:35Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent
Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
With a strong auto-regressive decoder, VAEs tend to ignore latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
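As a rough, assumption-heavy sketch of the idea behind such an output-probability diversity measure (not necessarily the paper's exact formulation): average the decoder's output distributions over a batch of responses, then score a response by how much its per-token distributions deviate from that average, since dull responses stay close to it.

```python
# Rough sketch of an AvgOut-style diversity score (illustrative; not the paper's exact definition).
import torch

def avg_output_distribution(batch_token_dists):
    """Average next-token distributions over all positions in a batch: (N, T, V) -> (V,)."""
    return batch_token_dists.reshape(-1, batch_token_dists.shape[-1]).mean(dim=0)

def diversity_score(response_token_dists, avg_dist):
    """Higher when the response's output distributions deviate from the batch average."""
    overlap = (response_token_dists * avg_dist).sum(dim=-1)  # dot product per token position
    return (1.0 - overlap).mean().item()

vocab, seq_len = 1000, 12
batch_dists = torch.softmax(torch.randn(16, seq_len, vocab), dim=-1)  # stand-in model outputs
avg_dist = avg_output_distribution(batch_dists)
score = diversity_score(batch_dists[0], avg_dist)
```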