Causal interventions expose implicit situation models for commonsense
  language understanding
 - URL: http://arxiv.org/abs/2306.03882v2
 - Date: Wed, 7 Jun 2023 13:17:04 GMT
 - Title: Causal interventions expose implicit situation models for commonsense
  language understanding
 - Authors: Takateru Yamakoshi, James L. McClelland, Adele E. Goldberg, Robert D.
  Hawkins
 - Abstract summary: We analyze performance on the Winograd Schema Challenge (WSC), where a single context cue shifts interpretation of an ambiguous pronoun.
We identify a circuit of attention heads that are responsible for propagating information from the context word.
These analyses suggest distinct pathways through which implicit situation models are constructed to guide pronoun resolution.
 - Score: 3.290878132806227
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract:   Accounts of human language processing have long appealed to implicit
``situation models'' that enrich comprehension with relevant but unstated world
knowledge. Here, we apply causal intervention techniques to recent transformer
models to analyze performance on the Winograd Schema Challenge (WSC), where a
single context cue shifts interpretation of an ambiguous pronoun. We identify a
relatively small circuit of attention heads that are responsible for
propagating information from the context word that guides which of the
candidate noun phrases the pronoun ultimately attends to. We then compare how
this circuit behaves in a closely matched ``syntactic'' control where the
situation model is not strictly necessary. These analyses suggest distinct
pathways through which implicit situation models are constructed to guide
pronoun resolution.
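
For readers who want a concrete picture of what such a causal intervention can look like, here is a minimal sketch of activation patching on GPT-2 with Hugging Face transformers. It is my illustration of the general technique, not the authors' released code: the model ("gpt2"), the prompts, the choice of layer 8, and the first-token scoring of the candidate referents are all simplifying assumptions.

```python
# Minimal activation-patching sketch (illustrative only; not the authors' code).
# We cache one layer's attention output on a "clean" Winograd prompt, patch it
# into the run on the "corrupted" prompt (context word flipped), and check how
# the logits of the two candidate referents move.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# The two prompts differ only in the context word ("large" vs. "small") and
# tokenize to the same length, so the cached activation is shape-compatible.
clean   = "The trophy did not fit in the suitcase because it was too large. 'It' refers to the"
corrupt = "The trophy did not fit in the suitcase because it was too small. 'It' refers to the"

LAYER = 8                                      # arbitrary layer to intervene on
proj = model.transformer.h[LAYER].attn.c_proj  # output projection of that layer's attention
cache = {}

def save_hook(module, inputs, output):
    cache["clean"] = output.detach()           # remember the clean attention output

def patch_hook(module, inputs, output):
    return cache["clean"]                      # overwrite with the cached clean activation

def referent_logits(prompt):
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    # Compare the first BPE token of each candidate as a rough proxy.
    return {w: logits[tok.encode(" " + w)[0]].item() for w in ("trophy", "suitcase")}

# 1) Clean run: record the attention output we will later patch in.
handle = proj.register_forward_hook(save_hook)
print("clean run:  ", referent_logits(clean))
handle.remove()

# 2) Corrupted run, unpatched vs. patched with the clean activation.
print("corrupt run:", referent_logits(corrupt))
handle = proj.register_forward_hook(patch_hook)
print("patched run:", referent_logits(corrupt))
handle.remove()
```

If the patched component carries the relevant context information, the patched corrupt run should shift its preference between "trophy" and "suitcase" back toward the clean run; sweeping such interventions over layers, positions, and individual heads is the general recipe by which circuit analyses of this kind localize the responsible components.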
 
       
      
Related papers
- Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning [9.795934690403374]
It is still unclear which mechanisms language models use to solve multi-step reasoning tasks.
We employ circuit analysis and self-influence functions to evaluate the changing importance of each token throughout the reasoning process.
We demonstrate that the underlying circuits reveal a human-interpretable reasoning process used by the model.
arXiv Detail & Related papers (2025-02-13T07:19:05Z)
- On the Loss of Context-awareness in General Instruction Fine-tuning [101.03941308894191]
We investigate the loss of context awareness after supervised fine-tuning.
We find that the performance decline is associated with a bias toward different roles learned during conversational instruction fine-tuning.
We propose a metric to identify context-dependent examples from general instruction fine-tuning datasets.
arXiv Detail & Related papers (2024-11-05T00:16:01Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI make it possible to mitigate the limited interpretability of similarity models by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model [19.826983068662106]
We propose to study autoregressive Transformer models on a synthetic task that embodies the multi-step nature of problems where stepwise inference is generally most useful.
Despite its simplicity, we find that we can empirically reproduce and analyze several phenomena observed at scale.
arXiv Detail & Related papers (2024-02-12T16:25:47Z)
- Contrastive Learning for Inference in Dialogue [56.20733835058695]
Inference, especially inference derived from inductive processes, is a crucial component of our conversations.
Recent large language models show remarkable advances in inference tasks.
However, their performance on inductive reasoning, where not all information is present in the context, lags far behind their performance on deductive reasoning.
arXiv Detail & Related papers (2023-10-19T04:49:36Z)
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back-translation-inspired evaluation methodology.
We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
arXiv Detail & Related papers (2023-05-26T16:04:28Z)
- Explaining How Transformers Use Context to Build Predictions [0.1749935196721634]
Language Generation Models produce words based on the previous context.
It is still unclear how prior words affect the model's decision throughout the layers.
We leverage recent advances in explainability of the Transformer and present a procedure to analyze models for language generation.
arXiv Detail & Related papers (2023-05-21T18:29:10Z)
- Knowledge-Based Counterfactual Queries for Visual Question Answering [0.0]
We propose a systematic method for explaining the behavior and investigating the robustness of VQA models through counterfactual perturbations.
To this end, we exploit structured knowledge bases to perform deterministic, optimal, and controllable word-level replacements targeting the linguistic modality.
We then evaluate the model's response against such counterfactual inputs.
arXiv Detail & Related papers (2023-03-05T08:00:30Z)
- Probing via Prompting [71.7904179689271]
This paper introduces a novel model-free approach to probing, by formulating probing as a prompting task.
We conduct experiments on five probing tasks and show that our approach is comparable to or better than diagnostic probes at extracting information.
We then examine the usefulness of a specific linguistic property for pre-training by removing the attention heads that are essential to that property and evaluating the resulting model's performance on language modeling; a minimal sketch of this ablate-then-evaluate loop appears after this list.
arXiv Detail & Related papers (2022-07-04T22:14:40Z)
- Can Unsupervised Knowledge Transfer from Social Discussions Help Argument Mining? [25.43442712037725]
We propose a novel transfer learning strategy for leveraging unsupervised, argumentative discourse-aware knowledge in argument mining.
We utilize argumentation-rich social discussions from the ChangeMyView subreddit as the source of this knowledge.
We introduce a novel prompt-based strategy for inter-component relation prediction that complements our proposed fine-tuning method.
arXiv Detail & Related papers (2022-03-24T06:48:56Z)
- Discrete Reasoning Templates for Natural Language Understanding [79.07883990966077]
We present an approach that reasons about complex questions by decomposing them into simpler subquestions.
We derive the final answer according to instructions in a predefined reasoning template.
We show that our approach is competitive with the state of the art while being interpretable and requiring little supervision.
arXiv Detail & Related papers (2021-04-05T18:56:56Z)
- A Brief Survey and Comparative Study of Recent Development of Pronoun Coreference Resolution [55.39835612617972]
Pronoun Coreference Resolution (PCR) is the task of resolving pronominal expressions to all mentions they refer to.
As an important natural language understanding (NLU) component, pronoun resolution is crucial for many downstream tasks and remains challenging for existing models.
We conduct extensive experiments to show that even though current models are achieving good performance on the standard evaluation set, they are still not ready to be used in real applications.
arXiv Detail & Related papers (2020-09-27T01:40:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
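
As referenced in the "Probing via Prompting" entry above, a minimal sketch of the ablate-then-evaluate loop might look as follows. This is my illustration under stated assumptions, not that paper's code: the model ("gpt2"), the pruned heads, and the single evaluation sentence are arbitrary stand-ins.

```python
# Head-ablation sketch (illustrative only): prune a few attention heads and
# compare language-modeling loss before and after the intervention.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

text = "The trophy did not fit in the suitcase because it was too large."
ids = tok(text, return_tensors="pt").input_ids

def lm_loss(m):
    with torch.no_grad():
        return m(ids, labels=ids).loss.item()   # mean per-token cross-entropy

print("loss before pruning:", lm_loss(model))

# Remove selected heads (layer index -> head indices). The choice here is
# arbitrary; in the probing setting one would prune the heads found essential
# to the linguistic property under study and measure the resulting degradation.
model.prune_heads({4: [0, 3], 7: [1, 5]})
print("loss after pruning: ", lm_loss(model))
```

In practice the loss would be averaged over a held-out corpus rather than a single sentence; the snippet only shows the mechanics of pruning heads and re-evaluating the model.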
       
     