Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals
- URL: http://arxiv.org/abs/2402.11655v2
- Date: Thu, 6 Jun 2024 21:45:21 GMT
- Title: Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals
- Authors: Francesco Ortu, Zhijing Jin, Diego Doimo, Mrinmaya Sachan, Alberto Cazzaniga, Bernhard Schölkopf
- Abstract summary: Interpretability research aims to bridge the gap between empirical success and our scientific understanding of large language models (LLMs).
We propose a formulation of competition of mechanisms, which focuses on the interplay of multiple mechanisms instead of individual mechanisms.
Our findings show traces of the mechanisms and their competition across various model components and reveal attention positions that effectively control the strength of certain mechanisms.
- Score: 82.68757839524677
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Interpretability research aims to bridge the gap between empirical success and our scientific understanding of the inner workings of large language models (LLMs). However, most existing research focuses on analyzing a single mechanism, such as how models copy or recall factual knowledge. In this work, we propose a formulation of competition of mechanisms, which focuses on the interplay of multiple mechanisms instead of individual mechanisms and traces how one of them becomes dominant in the final prediction. We uncover how and where mechanisms compete within LLMs using two interpretability methods: logit inspection and attention modification. Our findings show traces of the mechanisms and their competition across various model components and reveal attention positions that effectively control the strength of certain mechanisms. Code: https://github.com/francescortu/comp-mech. Data: https://huggingface.co/datasets/francescortu/comp-mech.
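As a concrete illustration of the two methods named in the abstract, below is a minimal sketch of logit inspection (decoding intermediate residual-stream states through the model's unembedding, in the spirit of the logit lens) and of a crude attention modification (blocking attention to one position). The model choice (GPT-2 via Hugging Face transformers), the prompt, the competing tokens, and the blanket position mask are illustrative assumptions, not the paper's exact setup; the authors' actual implementation is in the repository linked above.

```python
# Sketch only: logit inspection + a crude attention modification.
# Assumptions (not from the paper): GPT-2 as the model, this prompt,
# and masking one position across all layers and heads at once.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# A counterfactual prompt: the context redefines a fact, so the factual
# token (" Paris") competes with the in-context token (" Rome").
prompt = "Redefine: the capital of France is Rome. The capital of France is"
inputs = tok(prompt, return_tensors="pt")
paris = tok.encode(" Paris")[0]
rome = tok.encode(" Rome")[0]

# --- Logit inspection: project each layer's residual stream onto the
# vocabulary via the final layer norm and the unembedding, then read off
# the logits of the two competing tokens at the last position.
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)
for layer, h in enumerate(out.hidden_states):
    vec = h[0, -1]
    if layer < len(out.hidden_states) - 1:
        vec = model.transformer.ln_f(vec)  # final state already has ln_f applied
    logits = model.lm_head(vec)
    print(f"layer {layer:2d}: "
          f"Paris={logits[paris].item():7.2f}  Rome={logits[rome].item():7.2f}")

# --- Attention modification (crude version): forbid every position from
# attending to the in-context " Rome" token, then compare predictions.
mask = inputs["attention_mask"].clone()
mask[0, inputs["input_ids"][0].tolist().index(rome)] = 0
with torch.no_grad():
    blocked = model(input_ids=inputs["input_ids"], attention_mask=mask)
p_before = out.logits[0, -1].softmax(-1)[rome].item()
p_after = blocked.logits[0, -1].softmax(-1)[rome].item()
print(f"P(' Rome') before blocking: {p_before:.3f}, after: {p_after:.3f}")
```

The paper's attention modification targets specific attention positions; the all-layer, all-head mask above only demonstrates the kind of intervention involved.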
Related papers
- Compete and Compose: Learning Independent Mechanisms for Modular World Models [57.94106862271727]
We present COMET, a modular world model which leverages reusable, independent mechanisms across different environments.
COMET is trained on multiple environments with varying dynamics via a two-step process: competition and composition.
We show that COMET adapts to new environments with varying numbers of objects, with improved sample efficiency over conventional finetuning approaches.
arXiv Detail & Related papers (2024-04-23T15:03:37Z)
- A Framework of Defining, Modeling, and Analyzing Cognition Mechanisms [0.0]
I propose a framework for defining, modeling, and analyzing cognition mechanisms.
I argue that the cognition base has the features of the cognition self of humans.
arXiv Detail & Related papers (2023-11-13T12:31:46Z)
- On the Discussion of Large Language Models: Symmetry of Agents and Interplay with Prompts [51.3324922038486]
This paper reports empirical results on the interplay of prompts and discussion mechanisms.
It also proposes a scalable discussion mechanism based on conquer and merge.
arXiv Detail & Related papers (2023-11-13T04:56:48Z)
- Properties from Mechanisms: An Equivariance Perspective on Identifiable Representation Learning [79.4957965474334]
A key goal of unsupervised representation learning is "inverting" a data-generating process to recover its latent properties.
This paper asks, "Can we instead identify latent properties by leveraging knowledge of the mechanisms that govern their evolution?"
We provide a complete characterization of the sources of non-identifiability as we vary knowledge about a set of possible mechanisms.
arXiv Detail & Related papers (2021-10-29T14:04:08Z)
- Multilingual Multi-Aspect Explainability Analyses on Machine Reading Comprehension Models [76.48370548802464]
This paper conducts a series of analytical experiments to examine the relation between multi-head self-attention and final MRC system performance.
We discover that passage-to-question and passage understanding attentions are the most important ones in the question answering process.
Through comprehensive visualizations and case studies, we also observe several general findings on the attention maps, which can be helpful to understand how these models solve the questions.
arXiv Detail & Related papers (2021-08-26T04:23:57Z)
- Transformers with Competitive Ensembles of Independent Mechanisms [97.93090139318294]
We propose a new Transformer layer that divides the hidden representation and parameters into multiple mechanisms, which exchange information only through attention.
We study TIM (Transformers with Independent Mechanisms) on a large-scale BERT model, on the Image Transformer, and on speech enhancement, and find evidence for semantically meaningful specialization as well as improved performance; a rough sketch of this layer design appears after this list.
arXiv Detail & Related papers (2021-02-27T21:48:46Z)
- Empirically Classifying Network Mechanisms [0.0]
Network models are used to study interconnected systems across many physical, biological, and social disciplines.
We introduce a simple empirical approach which can mechanistically classify arbitrary network data.
arXiv Detail & Related papers (2020-12-22T01:41:34Z)
- Reinforcement Learning of Sequential Price Mechanisms [24.302600030585275]
We introduce the use of reinforcement learning for indirect mechanisms, working with the existing class of sequential price mechanisms.
We show that our approach can learn optimal or near-optimal mechanisms in several experimental settings.
arXiv Detail & Related papers (2020-10-02T19:57:25Z)
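As referenced in the TIM entry above, here is a rough sketch, under stated assumptions, of a Transformer layer in that spirit: the hidden state is split across independent per-mechanism feed-forward blocks that exchange information only through a shared attention step, with a softmax over mechanisms gating how strongly each one writes its update. All sizes, names (`MechanismLayer`, `score`), and the gating rule are illustrative choices, not the authors' implementation.

```python
# Sketch only: a Transformer layer with competing, independent mechanisms.
import torch
import torch.nn as nn

class MechanismLayer(nn.Module):
    """One Transformer layer whose FFN is split into competing mechanisms."""

    def __init__(self, d_model=256, n_heads=4, n_mech=4):
        super().__init__()
        assert d_model % n_mech == 0
        self.d_mech = d_model // n_mech
        # Shared self-attention: the only channel through which the
        # mechanisms exchange information.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # One independent FFN per mechanism, each owning a slice of the state.
        self.ffns = nn.ModuleList(
            nn.Sequential(
                nn.Linear(self.d_mech, 4 * self.d_mech),
                nn.GELU(),
                nn.Linear(4 * self.d_mech, self.d_mech),
            )
            for _ in range(n_mech)
        )
        # Competition: per token, a softmax over mechanisms gates how
        # strongly each mechanism's update is written back.
        self.score = nn.Linear(d_model, n_mech)

    def forward(self, x):                      # x: (batch, seq, d_model)
        x = x + self.attn(x, x, x)[0]          # information exchange
        gate = self.score(x).softmax(dim=-1)   # (batch, seq, n_mech)
        chunks = x.split(self.d_mech, dim=-1)  # one slice per mechanism
        update = torch.cat(
            [gate[..., i : i + 1] * ffn(c)
             for i, (ffn, c) in enumerate(zip(self.ffns, chunks))],
            dim=-1,
        )
        return x + update

# Smoke test: shapes in, shapes out.
layer = MechanismLayer()
print(layer(torch.randn(2, 10, 256)).shape)  # torch.Size([2, 10, 256])
```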
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.