CausalGym: Benchmarking causal interpretability methods on linguistic tasks
- URL: http://arxiv.org/abs/2402.12560v1
- Date: Mon, 19 Feb 2024 21:35:56 GMT
- Title: CausalGym: Benchmarking causal interpretability methods on linguistic tasks
- Authors: Aryaman Arora, Dan Jurafsky, Christopher Potts
- Abstract summary: We use CausalGym to benchmark the ability of interpretability methods to causally affect model behaviour.
We study the pythia models (14M--6.9B) and assess the causal efficacy of a wide range of interpretability methods.
We find that DAS outperforms the other methods, and so we use it to study the learning trajectory of two difficult linguistic phenomena.
- Score: 52.61917615039112
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models (LMs) have proven to be powerful tools for psycholinguistic
research, but most prior work has focused on purely behavioural measures (e.g.,
surprisal comparisons). At the same time, research in model interpretability
has begun to illuminate the abstract causal mechanisms shaping LM behavior. To
help bring these strands of research closer together, we introduce CausalGym.
We adapt and expand the SyntaxGym suite of tasks to benchmark the ability of
interpretability methods to causally affect model behaviour. To illustrate how
CausalGym can be used, we study the pythia models (14M--6.9B) and assess the
causal efficacy of a wide range of interpretability methods, including linear
probing and distributed alignment search (DAS). We find that DAS outperforms
the other methods, and so we use it to study the learning trajectory of two
difficult linguistic phenomena in pythia-1b: negative polarity item licensing
and filler--gap dependencies. Our analysis shows that the mechanism
implementing both of these tasks is learned in discrete stages, not gradually.
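To make the benchmarking idea concrete, here is a minimal, self-contained sketch (plain PyTorch, illustrative names only, not the paper's actual tooling) of the kind of interchange intervention that methods such as linear probing and DAS are scored on: a candidate feature direction is swapped between a "base" and a "source" representation, and a causal benchmark then checks whether the model's behaviour changes in the intended way.

```python
# Minimal sketch of a distributed interchange intervention, the kind of
# operation a CausalGym-style benchmark scores. All names and values here
# are illustrative; the real evaluation uses hidden states from pythia
# models and directions found by linear probes or DAS, not random vectors.
import torch

torch.manual_seed(0)

hidden_dim = 16

# Stand-ins for hidden states at one layer/position for a "base" input
# and a "source" input that differs only in the feature of interest
# (e.g. subject number in an agreement task).
h_base = torch.randn(hidden_dim)
h_source = torch.randn(hidden_dim)

# A candidate 1-D feature direction, e.g. from a trained linear probe or
# learned by DAS. Here it is just a random unit vector.
direction = torch.randn(hidden_dim)
direction = direction / direction.norm()

def interchange_along_direction(base, source, d):
    """Replace the component of `base` along `d` with that of `source`."""
    base_coord = base @ d
    source_coord = source @ d
    return base + (source_coord - base_coord) * d

h_intervened = interchange_along_direction(h_base, h_source, direction)

# Sanity check: the intervened state now has the source's coordinate
# along the candidate direction, but is otherwise unchanged.
print(h_base @ direction, h_source @ direction, h_intervened @ direction)
```

In the paper's setup, the intervened representation would be patched back into the model's forward pass, and the benchmark asks whether the intervention shifts the model's preference on minimal pairs in the intended direction, aggregated over many examples; a method is causally effective to the extent that its direction reliably produces that shift.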
Related papers
- Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors [74.04775677110179]
In-context Learning (ICL) has become the primary method for performing natural language tasks with Large Language Models (LLMs)
In this work, we examine whether this is the result of the aggregation used in corresponding datasets, where trying to combine low-agreement, disparate annotations might lead to annotation artifacts that create detrimental noise in the prompt.
Our results indicate that aggregation is a confounding factor in the modeling of subjective tasks, and advocate focusing on modeling individuals instead.
arXiv Detail & Related papers (2024-10-17T17:16:00Z)
- Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data [3.376269351435396]
We develop a formal perspective on probing using structural causal models (SCM)
We extend a recent study of LMs in the context of a synthetic grid-world navigation task.
Our techniques provide robust empirical evidence for the ability of LMs to induce the latent concepts underlying text.
arXiv Detail & Related papers (2024-07-18T17:59:27Z)
- Towards Understanding Sensitive and Decisive Patterns in Explainable AI: A Case Study of Model Interpretation in Geometric Deep Learning [18.408342615833185]
This research focuses on distinguishing between two critical data patterns -- sensitive patterns (model-related) and decisive patterns (task-related)
We compare the effectiveness of two main streams of interpretation methods: post-hoc methods and self-interpretable methods.
Our findings indicate that post-hoc methods tend to provide interpretations better aligned with sensitive patterns, whereas certain self-interpretable methods exhibit strong and stable performance in detecting decisive patterns.
arXiv Detail & Related papers (2024-06-30T22:59:15Z)
- How Likely Do LLMs with CoT Mimic Human Reasoning? [31.86489714330338]
Chain-of-thought (CoT) emerges as a promising technique to elicit reasoning capabilities from Large Language Models (LLMs)
In this paper, we diagnose the underlying mechanism by comparing the reasoning process of LLMs with humans.
Our empirical study reveals that LLMs often deviate from a causal chain, resulting in spurious correlations and potential consistency errors.
arXiv Detail & Related papers (2024-02-25T10:13:04Z)
- SLEM: Machine Learning for Path Modeling and Causal Inference with Super Learner Equation Modeling [3.988614978933934]
Causal inference is a crucial goal of science, enabling researchers to arrive at meaningful conclusions using observational data.
Path models, Structural Equation Models (SEMs) and Directed Acyclic Graphs (DAGs) provide a means to unambiguously specify assumptions regarding the causal structure underlying a phenomenon.
We propose Super Learner Equation Modeling, a path modeling technique integrating machine learning Super Learner ensembles.
arXiv Detail & Related papers (2023-08-08T16:04:42Z)
- Generative Models as a Complex Systems Science: How can we make sense of large language model behavior? [75.79305790453654]
Coaxing desired behaviors out of pretrained models, while avoiding undesirable ones, has redefined NLP.
We argue for a systematic effort to decompose language model behavior into categories that explain cross-task performance.
arXiv Detail & Related papers (2023-07-31T22:58:41Z)
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small [68.879023473838]
We present an explanation for how GPT-2 small performs a natural language task called indirect object identification (IOI)
To our knowledge, this investigation is the largest end-to-end attempt at reverse-engineering a natural behavior "in the wild" in a language model.
arXiv Detail & Related papers (2022-11-01T17:08:44Z)
- Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning [76.00395335702572]
A central goal for AI and causality is the joint discovery of abstract representations and causal structure.
Existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs.
In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them.
arXiv Detail & Related papers (2021-07-02T05:44:56Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.