Word Importance Explains How Prompts Affect Language Model Outputs
- URL: http://arxiv.org/abs/2403.03028v1
- Date: Tue, 5 Mar 2024 15:04:18 GMT
- Title: Word Importance Explains How Prompts Affect Language Model Outputs
- Authors: Stefan Hackmann, Haniyeh Mahmoudian, Mark Steadman and Michael Schmidt
- Abstract summary: This study presents a method to improve the explainability of large language models by varying individual words in prompts.
Unlike classical attention, word importance measures the impact of prompt words on arbitrarily-defined text scores.
Results show that word importance scores are closely related to the expected suffix importances for multiple scoring functions.
- Score: 0.7223681457195862
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The emergence of large language models (LLMs) has revolutionized numerous
applications across industries. However, their "black box" nature often hinders
the understanding of how they make specific decisions, raising concerns about
their transparency, reliability, and ethical use. This study presents a method
to improve the explainability of LLMs by varying individual words in prompts to
uncover their statistical impact on the model outputs. This approach, inspired
by permutation importance for tabular data, masks each word in the system
prompt and evaluates its effect on the outputs based on the available text
scores aggregated over multiple user inputs. Unlike classical attention, word
importance measures the impact of prompt words on arbitrarily-defined text
scores, which enables decomposing the importance of words into the specific
measures of interest, such as bias, reading level, and verbosity. This
procedure also enables measuring impact when attention weights are not
available. To test the fidelity of this approach, we add different suffixes to
multiple system prompts and compare the subsequent generations across
different large language models. Results show that
word importance scores are closely related to the expected suffix importances
for multiple scoring functions.
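A minimal sketch of the masking procedure described above, assuming hypothetical generate and score callables, a "[MASK]" placeholder token, and simple whitespace tokenization (none of these details are specified by the paper; this illustrates the idea, not the authors' implementation):

```python
import statistics
from typing import Callable, List, Tuple

MASK = "[MASK]"  # placeholder token; the paper's actual masking scheme is an assumption here

def word_importance(
    system_prompt: str,
    user_inputs: List[str],
    generate: Callable[[str, str], str],  # (system prompt, user input) -> model output (hypothetical)
    score: Callable[[str], float],        # arbitrary text score, e.g. verbosity or a bias measure
) -> List[Tuple[str, float]]:
    """Estimate each prompt word's importance as the mean shift in an
    arbitrary text score when that word is masked, aggregated over user
    inputs (analogous to permutation importance for tabular data)."""
    words = system_prompt.split()
    # Baseline scores with the unmodified system prompt.
    baseline = [score(generate(system_prompt, u)) for u in user_inputs]
    importances = []
    for i, word in enumerate(words):
        # Mask one word at a time and regenerate for every user input.
        masked_prompt = " ".join(words[:i] + [MASK] + words[i + 1:])
        masked = [score(generate(masked_prompt, u)) for u in user_inputs]
        # Importance = average absolute score shift caused by masking this word.
        deltas = [abs(b - m) for b, m in zip(baseline, masked)]
        importances.append((word, statistics.mean(deltas)))
    return importances

# Example with a trivial verbosity score (words in the output); any text score works:
# ranking = word_importance(prompt, inputs, generate=my_llm_call, score=lambda t: len(t.split()))
```

Because the score function is arbitrary, the same loop decomposes importance into whichever measure is of interest (bias, reading level, verbosity) and requires no access to attention weights. The fidelity test described in the abstract would then check that the words of an appended suffix receive importance scores consistent with that suffix's known effect on the score.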
Related papers
- Enhancing Argument Structure Extraction with Efficient Leverage of Contextual Information [79.06082391992545]
We propose an Efficient Context-aware model (ECASE) that fully exploits contextual information.
We introduce a sequence-attention module and distance-weighted similarity loss to aggregate contextual information and argumentative information.
Our experiments on five datasets from various domains demonstrate that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-10-08T08:47:10Z)
- Leverage Points in Modality Shifts: Comparing Language-only and Multimodal Word Representations [0.8594140167290097]
Multimodal embeddings aim to enrich the semantic information in neural representations of language compared to text-only models.
Our paper compares word embeddings from three vision-and-language models and three text-only models, with static and contextual representations.
This is the first large-scale study of the effect of visual grounding on language representations, covering 46 semantic parameters.
arXiv Detail & Related papers (2023-06-04T12:53:12Z)
- Assessing Word Importance Using Models Trained for Semantic Tasks [0.0]
We derive word significance from models trained to solve two semantic tasks: Natural Language Inference and Paraphrase Identification.
We evaluate their relevance using a so-called cross-task evaluation.
Our method can be used to identify important words in sentences without any explicit word importance labeling in training.
arXiv Detail & Related papers (2023-05-31T09:34:26Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- Neighboring Words Affect Human Interpretation of Saliency Explanations [65.29015910991261]
Word-level saliency explanations are often used to communicate feature-attribution in text-based models.
Recent studies found that superficial factors such as word length can distort human interpretation of the communicated saliency scores.
We investigate how marking a word's neighboring words affects the explainee's perception of that word's importance in the context of a saliency explanation.
arXiv Detail & Related papers (2023-05-04T09:50:25Z)
- A Linguistic Investigation of Machine Learning based Contradiction Detection Models: An Empirical Analysis and Future Perspectives [0.34998703934432673]
We analyze two Natural Language Inference data sets with respect to their linguistic features.
The goal is to identify those syntactic and semantic properties that are particularly hard to comprehend for a machine learning model.
arXiv Detail & Related papers (2022-10-19T10:06:03Z)
- Exploiting Word Semantics to Enrich Character Representations of Chinese Pre-trained Models [12.0190584907439]
We propose a new method to exploit word structure and integrate lexical semantics into character representations of pre-trained models.
We show that our approach achieves superior performance over the basic pre-trained models BERT, BERT-wwm and ERNIE on different Chinese NLP tasks.
arXiv Detail & Related papers (2022-07-13T02:28:08Z)
- Unsupervised Multimodal Word Discovery based on Double Articulation Analysis with Co-occurrence cues [7.332652485849632]
Human infants acquire their verbal lexicon with minimal prior knowledge of language.
This study proposes a novel fully unsupervised learning method for discovering speech units.
The proposed method can acquire words and phonemes from speech signals using unsupervised learning.
arXiv Detail & Related papers (2022-01-18T07:31:59Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- My Teacher Thinks The World Is Flat! Interpreting Automatic Essay Scoring Mechanism [71.34160809068996]
Recent work shows that automated scoring systems are prone to even common-sense adversarial samples.
We utilize recent advances in interpretability to find the extent to which features such as coherence, content and relevance are important for automated scoring mechanisms.
We also find that since the models are not semantically grounded with world knowledge and common sense, adding false facts such as "the world is flat" actually increases the score instead of decreasing it.
arXiv Detail & Related papers (2020-12-27T06:19:20Z)
- Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions [55.660255727031725]
Influence functions explain the decisions of a model by identifying influential training examples.
We conduct a comparison between influence functions and common word-saliency methods on representative tasks.
We develop a new measure based on influence functions that can reveal artifacts in training data.
arXiv Detail & Related papers (2020-05-14T00:45:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.