Fact Recall, Heuristics or Pure Guesswork? Precise Interpretations of Language Models for Fact Completion
- URL: http://arxiv.org/abs/2410.14405v2
- Date: Thu, 31 Oct 2024 08:44:13 GMT
- Title: Fact Recall, Heuristics or Pure Guesswork? Precise Interpretations of Language Models for Fact Completion
- Authors: Denitsa Saynova, Lovisa Hagström, Moa Johansson, Richard Johansson, Marco Kuhlmann
- Abstract summary: We study four different prediction scenarios for which the LM can be expected to show distinct behaviors.
We propose a model-specific recipe called PrISM for constructing datasets with examples of each scenario.
We find that while CT produces different results for each scenario, aggregations over a set of mixed examples may only represent the results from the scenario with the strongest measured signal.
- Score: 9.383571944693188
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Previous interpretations of language models (LMs) miss important distinctions in how these models process factual information. For example, given the query "Astrid Lindgren was born in" with the corresponding completion "Sweden", no distinction is made between a prediction based on exact knowledge of the Swedish author's birthplace and one based on the assumption that a person with a Swedish-sounding name was born in Sweden. In this paper, we investigate four different prediction scenarios for which the LM can be expected to show distinct behaviors. These scenarios correspond to different levels of model reliability and types of information being processed, some being less desirable for factual predictions. To facilitate precise interpretations of LMs for fact completion, we propose a model-specific recipe called PrISM for constructing datasets with examples of each scenario based on a set of diagnostic criteria. We apply a popular interpretability method, causal tracing (CT), to the four prediction scenarios and find that while CT produces different results for each scenario, aggregations over a set of mixed examples may only represent the results from the scenario with the strongest measured signal. In summary, we contribute tools for a more granular study of fact completion in language models and analyses that provide a more nuanced understanding of how LMs process fact-related queries.
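The interpretability method referenced in the abstract, causal tracing (CT), localizes which hidden states carry the information behind a prediction: the subject tokens are corrupted with noise, and clean activations are then restored one (layer, position) at a time to see how much of the original prediction probability is recovered. The following is a minimal sketch of that procedure for the running example, using GPT-2 from Hugging Face transformers; the model choice, noise scale, and restoration loop are illustrative assumptions, not the paper's exact PrISM/CT configuration.

```python
# Minimal causal-tracing sketch (illustrative settings, not the paper's exact setup).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt, target = "Astrid Lindgren was born in", " Sweden"
inputs = tok(prompt, return_tensors="pt")
target_id = tok(target)["input_ids"][0]                  # id of the completion token
n_subject = len(tok("Astrid Lindgren")["input_ids"])     # positions covering the subject

def prob_of_target(embeds):
    """Probability the model assigns to the target token after this input."""
    logits = model(inputs_embeds=embeds).logits
    return torch.softmax(logits[0, -1], dim=-1)[target_id].item()

with torch.no_grad():
    clean_embeds = model.transformer.wte(inputs["input_ids"])
    clean_hidden = model(inputs_embeds=clean_embeds,
                         output_hidden_states=True).hidden_states

    # Corrupt the subject tokens with Gaussian noise (0.5 is an arbitrary scale here).
    corrupt_embeds = clean_embeds.clone()
    corrupt_embeds[0, :n_subject] += 0.5 * torch.randn_like(corrupt_embeds[0, :n_subject])
    p_clean, p_corrupt = prob_of_target(clean_embeds), prob_of_target(corrupt_embeds)

    # Restore one clean hidden state at a time; the recovery of p(" Sweden")
    # measures how much that (layer, position) contributes to the prediction.
    effects = {}
    for layer in range(model.config.n_layer):
        for pos in range(inputs["input_ids"].shape[1]):
            def restore(module, inp, out, layer=layer, pos=pos):
                out[0][:, pos] = clean_hidden[layer + 1][:, pos]
                return out
            handle = model.transformer.h[layer].register_forward_hook(restore)
            effects[(layer, pos)] = prob_of_target(corrupt_embeds) - p_corrupt
            handle.remove()

print(f"p(target): clean={p_clean:.3f}, corrupted={p_corrupt:.3f}")
print("strongest restoration effect at (layer, position):", max(effects, key=effects.get))
```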
Related papers
- ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models [75.05436691700572]
We introduce ExpliCa, a new dataset for evaluating Large Language Models (LLMs) in explicit causal reasoning.
We tested seven commercial and open-source LLMs on ExpliCa through prompting and perplexity-based metrics.
Surprisingly, models tend to confound temporal relations with causal ones, and their performance is also strongly influenced by the linguistic order of the events.
arXiv Detail & Related papers (2025-02-21T14:23:14Z)
- Explanation sensitivity to the randomness of large language models: the case of journalistic text classification [6.240875403446504]
We study the effect of random elements in the training of large language models on the explainability of their predictions.
Using a fine-tuned CamemBERT model and an explanation method based on relevance propagation, we find that training with different random seeds produces models with similar accuracy but variable explanations.
arXiv Detail & Related papers (2024-10-07T14:39:45Z)
- Explaining word embeddings with perfect fidelity: Case study in research impact prediction [0.0]
We propose Self-model Rated Entities (SMER), an explanation method for logistic regression-based classification models trained on word embeddings.
We show that SMER has theoretically perfect fidelity with the explained model, as its prediction corresponds exactly to the average of predictions for individual words in the text.
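The perfect-fidelity claim can be seen as a consequence of linearity: if a logistic regression is applied to the average of a text's word embeddings, its pre-sigmoid score on the document equals the average of its scores on the individual words. The snippet below checks that identity with random stand-in weights; the mean-pooled setup is an assumption for illustration, not necessarily the exact formulation in the paper.

```python
# Sketch of the fidelity property described above, assuming a logistic regression
# over mean-pooled word embeddings (an illustrative setup, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
dim, n_words = 50, 8
word_vecs = rng.normal(size=(n_words, dim))   # embeddings of the words in one text
w, b = rng.normal(size=dim), 0.1              # stand-in for trained classifier weights

doc_vec = word_vecs.mean(axis=0)              # document representation = mean of word embeddings
doc_score = doc_vec @ w + b                   # model's pre-sigmoid score for the whole text
word_scores = word_vecs @ w + b               # per-word scores, i.e. the explanation values

# Perfect fidelity in this setup: the document score equals the mean of the word scores.
print(np.isclose(doc_score, word_scores.mean()))   # True
```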
arXiv Detail & Related papers (2024-09-24T09:28:24Z)
- Using LLMs for Explaining Sets of Counterfactual Examples to Final Users [0.0]
In automated decision-making scenarios, causal inference methods can analyze the underlying data-generation process.
Counterfactual examples explore hypothetical scenarios where a minimal number of factors are altered.
We propose a novel multi-step pipeline that uses counterfactuals to generate natural language explanations of actions that will lead to a change in outcome.
arXiv Detail & Related papers (2024-08-27T15:13:06Z)
- PRobELM: Plausibility Ranking Evaluation for Language Models [12.057770969325453]
PRobELM is a benchmark designed to assess language models' ability to discern more plausible scenarios through their parametric knowledge.
Our benchmark is constructed from a dataset curated from Wikidata edit histories, tailored to align with the temporal bounds of the training data of the evaluated models.
arXiv Detail & Related papers (2024-04-04T21:57:11Z)
- A Hypothesis-Driven Framework for the Analysis of Self-Rationalising Models [0.8702432681310401]
We use a Bayesian network to implement a hypothesis about how a task is solved.
The resulting models do not exhibit a strong similarity to GPT-3.5.
We discuss the implications of this as well as the framework's potential to approximate LLM decisions better in future work.
arXiv Detail & Related papers (2024-02-07T12:26:12Z)
- Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space.
However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts.
We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z)
- Conformal Language Modeling [61.94417935386489]
We propose a novel approach to conformal prediction for generative language models (LMs).
Standard conformal prediction produces prediction sets with rigorous, statistical guarantees.
We demonstrate the promise of our approach on multiple tasks in open-domain question answering, text summarization, and radiology report generation.
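As background for the second sentence, standard split conformal prediction calibrates a score threshold on held-out data so that the resulting prediction sets contain the true answer with probability at least 1 − α. The toy sketch below shows that generic mechanism on synthetic classification scores; adapting such guarantees to sampled LM generations is the paper's contribution and is not reproduced here.

```python
# Generic split conformal prediction on toy scores, illustrating the kind of
# set-valued guarantee the entry above adapts to generative LMs.
import numpy as np

rng = np.random.default_rng(0)
n_cal, n_classes, alpha = 500, 10, 0.1

# Toy calibration data: softmax-like scores plus the index of the true class.
cal_scores = rng.dirichlet(np.ones(n_classes), size=n_cal)
cal_labels = rng.integers(0, n_classes, size=n_cal)

# Nonconformity = 1 - score assigned to the true class.
nonconf = 1.0 - cal_scores[np.arange(n_cal), cal_labels]

# Conformal quantile with the finite-sample correction.
q = np.quantile(nonconf, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal, method="higher")

# Prediction set for a new example: keep every class whose nonconformity <= q.
test_scores = rng.dirichlet(np.ones(n_classes))
prediction_set = np.where(1.0 - test_scores <= q)[0]
print("threshold:", round(float(q), 3), "prediction set:", prediction_set)
```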
arXiv Detail & Related papers (2023-06-16T21:55:08Z)
- Can Large Language Models Infer Causation from Correlation? [104.96351414570239]
We test the pure causal inference skills of large language models (LLMs).
We formulate a novel task Corr2Cause, which takes a set of correlational statements and determines the causal relationship between the variables.
We show that these models achieve close to random performance on the task.
arXiv Detail & Related papers (2023-06-09T12:09:15Z)
- Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
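The kernel-regression analogy can be illustrated directly: a softmax over similarities between a query and the in-context demonstrations acts as a kernel, and the prediction is the resulting weighted average of the demonstrations' values, which is exactly the form of a single attention head. The toy example below only illustrates that correspondence; it is not the paper's derivation or experiments.

```python
# Toy illustration of the ICL-as-kernel-regression view: attention weights act
# as a softmax kernel over in-context demonstrations.
import numpy as np

rng = np.random.default_rng(0)
dim, n_demos = 16, 5

keys = rng.normal(size=(n_demos, dim))      # representations of demonstration inputs
values = rng.normal(size=(n_demos, 4))      # representations of demonstration labels
query = rng.normal(size=dim)                # representation of the test input

# Softmax kernel over query-key similarity = attention weights over demonstrations.
scores = keys @ query / np.sqrt(dim)
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Kernel-regression prediction: weighted average of the demonstration values.
prediction = weights @ values
print("attention/kernel weights:", np.round(weights, 3))
print("predicted label representation:", np.round(prediction, 3))
```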
arXiv Detail & Related papers (2023-05-22T06:45:02Z)
- Are Representations Built from the Ground Up? An Empirical Examination of Local Composition in Language Models [91.3755431537592]
Representing compositional and non-compositional phrases is critical for language understanding.
We first formulate a problem of predicting the LM-internal representations of longer phrases given those of their constituents.
While we would expect the predictive accuracy to correlate with human judgments of semantic compositionality, we find this is largely not the case.
arXiv Detail & Related papers (2022-10-07T14:21:30Z)
- An Interpretability Evaluation Benchmark for Pre-trained Language Models [37.16893581395874]
We propose a novel evaluation benchmark providing both English and Chinese annotated data.
It tests LMs' abilities in multiple dimensions, i.e., grammar, semantics, knowledge, reasoning and computation.
It contains perturbed instances for each original instance, so as to use the rationale consistency under perturbations as the metric for faithfulness.
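One simple way such a rationale-consistency score could be instantiated is as the overlap between the token sets selected as rationales for an original instance and its perturbed counterpart; the Jaccard measure below is an illustrative assumption, not necessarily the benchmark's exact metric.

```python
# One possible rationale-consistency score (illustrative; the benchmark's
# actual faithfulness metric may be defined differently).
def rationale_consistency(rationale_orig, rationale_perturbed):
    """Jaccard overlap between the token sets selected as rationales."""
    a, b = set(rationale_orig), set(rationale_perturbed)
    return len(a & b) / len(a | b) if a | b else 1.0

print(rationale_consistency(["born", "Sweden"], ["born", "Stockholm"]))  # 0.33...
```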
arXiv Detail & Related papers (2022-07-28T08:28:09Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: pre-trained MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Explaining Question Answering Models through Text Generation [42.36596190720944]
Large pre-trained language models (LMs) have been shown to perform surprisingly well when fine-tuned on tasks that require commonsense and world knowledge.
In end-to-end architectures, it is difficult to explain what knowledge in the LM allows it to make a correct prediction.
We show on several tasks that our model reaches performance comparable to that of end-to-end architectures.
arXiv Detail & Related papers (2020-04-12T09:06:46Z)
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)