Causal Interpretation of Self-Attention in Pre-Trained Transformers
- URL: http://arxiv.org/abs/2310.20307v1
- Date: Tue, 31 Oct 2023 09:27:12 GMT
- Title: Causal Interpretation of Self-Attention in Pre-Trained Transformers
- Authors: Raanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov
- Abstract summary: We propose a causal interpretation of self-attention in the Transformer neural network architecture.
We interpret self-attention as a mechanism that estimates a structural equation model for a given input sequence of symbols.
We demonstrate this method by providing causal explanations for the outcomes of Transformers in two tasks: sentiment classification (NLP) and recommendation.
- Score: 4.419843514606336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a causal interpretation of self-attention in the Transformer
neural network architecture. We interpret self-attention as a mechanism that
estimates a structural equation model for a given input sequence of symbols
(tokens). The structural equation model can be interpreted, in turn, as a
causal structure over the input symbols under the specific context of the input
sequence. Importantly, this interpretation remains valid in the presence of
latent confounders. Following this interpretation, we estimate conditional
independence relations between input symbols by calculating partial
correlations between their corresponding representations in the deepest
attention layer. This enables learning the causal structure over an input
sequence using existing constraint-based algorithms. In this sense, existing
pre-trained Transformers can be utilized for zero-shot causal-discovery. We
demonstrate this method by providing causal explanations for the outcomes of
Transformers in two tasks: sentiment classification (NLP) and recommendation.
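The abstract's recipe (deepest-layer representations -> partial correlations -> conditional-independence tests -> constraint-based causal discovery) can be illustrated with a short sketch. This is not the authors' code: it assumes a Hugging Face BERT encoder as the pre-trained Transformer, approximates the deepest attention layer's output with the final hidden states, and treats the embedding dimensions as samples when estimating partial correlations between tokens. The resulting test is the conditional-independence oracle that a constraint-based algorithm (e.g., PC or FCI) would query.

```python
# Minimal sketch, not the authors' implementation. Assumptions: bert-base-uncased
# stands in for the pre-trained Transformer, last_hidden_state approximates the
# deepest attention layer's output, and the hidden dimensions act as samples.
import numpy as np
import torch
from scipy import stats
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def deepest_layer_representations(sentence: str) -> np.ndarray:
    """Return the (num_tokens, hidden_dim) representation matrix."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    return hidden.numpy()

def partial_correlation(reps: np.ndarray, i: int, j: int, cond: list) -> float:
    """Partial correlation between tokens i and j given the tokens in `cond`,
    read off the precision matrix over the selected tokens."""
    idx = [i, j] + list(cond)
    cov = np.cov(reps[idx])              # embedding dimensions act as samples
    prec = np.linalg.pinv(cov)
    return -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])

def is_conditionally_independent(reps, i, j, cond, alpha=0.01) -> bool:
    """Fisher z-test for zero partial correlation: the CI oracle that a
    constraint-based causal-discovery algorithm would call."""
    n = reps.shape[1]                    # number of "samples" = hidden_dim
    r = np.clip(partial_correlation(reps, i, j, cond), -0.9999, 0.9999)
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - len(cond) - 3)
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))
    return p_value > alpha

reps = deepest_layer_representations("the movie was surprisingly good")
# token indices include [CLS]/[SEP] added by the tokenizer
print(is_conditionally_independent(reps, i=1, j=4, cond=[2]))
```

Running every such test needed by, say, the PC algorithm over the input tokens yields a causal structure for that specific sequence, which is the "zero-shot causal discovery" use of the pre-trained model described above.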
Related papers
- Comateformer: Combined Attention Transformer for Semantic Sentence Matching [11.746010399185437]
We propose a novel semantic sentence matching model, the Combined Attention Network based on the Transformer model (Comateformer).
In the Comateformer model, we design a novel transformer-based quasi-attention mechanism with compositional properties.
Our proposed approach builds on the intuition of similarity and dissimilarity (negative affinity) when calculating dual affinity scores.
arXiv Detail & Related papers (2024-12-10T06:18:07Z)
- Unsupervised Representation Learning from Sparse Transformation Analysis [79.94858534887801]
We propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components.
Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model.
arXiv Detail & Related papers (2024-10-07T23:53:25Z)
- On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning [87.73401758641089]
Chain-of-thought (CoT) reasoning has improved the performance of modern language models (LMs).
We show that LMs can represent the same family of distributions over strings as probabilistic Turing machines.
arXiv Detail & Related papers (2024-06-20T10:59:02Z)
- Semantic Loss Functions for Neuro-Symbolic Structured Prediction [74.18322585177832]
We discuss the semantic loss, which injects knowledge about such structure, defined symbolically, into training.
It is agnostic to the arrangement of the symbols, and depends only on the semantics expressed thereby.
It can be combined with both discriminative and generative neural models.
arXiv Detail & Related papers (2024-05-12T22:18:25Z)
- Breaking Symmetry When Training Transformers [3.434553688053531]
We show that the prediction for output token $n+1$ of Transformer architectures lacking either positional encodings or causal attention is invariant to permutations of input tokens $1, 2, \ldots, n-1$.
We elaborate on the argument that the causal attention mechanism must be responsible for the fact that Transformers can model input sequences in which order matters.
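A tiny numerical sketch of the invariance claim (my own illustration, not the paper's code): with a single self-attention layer that has neither positional encodings nor a causal mask, the output at the last position is unchanged when the preceding tokens are permuted.

```python
# Sketch only: one bidirectional self-attention layer, no positional encodings,
# no causal mask. The output at the last position depends only on the *set* of
# preceding tokens, so permuting tokens 1..n-1 leaves it unchanged.
import numpy as np

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def self_attention(X):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

X = rng.standard_normal((5, d))              # tokens 1..n (n = 5)
perm = [2, 0, 3, 1]                          # permute tokens 1..n-1 only
X_perm = np.vstack([X[perm], X[-1:]])        # token n stays in place

out_last = self_attention(X)[-1]
out_last_perm = self_attention(X_perm)[-1]
print(np.allclose(out_last, out_last_perm))  # True: invariant to the permutation
```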
arXiv Detail & Related papers (2024-02-06T00:32:28Z)
- Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models [9.56229382432426]
This research aims to reverse engineer transformer models into human-readable representations that implement algorithmic functions.
By applying circuit interpretability analysis, we identify a key sub-circuit in both GPT-2 Small and Llama-2-7B.
We show that this sub-circuit has effects on various math-related prompts, such as intervaled sequences, Spanish number-word and month continuation, and natural language word problems.
arXiv Detail & Related papers (2023-11-07T16:58:51Z)
- Characterizing Intrinsic Compositionality in Transformers with Tree Projections [72.45375959893218]
Neural models like transformers can route information arbitrarily between different parts of their input.
We show that transformers for three different tasks become more treelike over the course of training.
These trees are predictive of model behavior, with more tree-like models generalizing better on tests of compositional generalization.
arXiv Detail & Related papers (2022-11-02T17:10:07Z)
- Do Transformers use variable binding? [14.222494511474103]
Increasing the explainability of deep neural networks (DNNs) requires evaluating whether they implement symbolic computation.
One central symbolic capacity is variable binding: linking an input value to an abstract variable held in system-internal memory.
We provide the first systematic evaluation of the variable binding capacities of the state-of-the-art Transformer networks BERT and RoBERTa.
arXiv Detail & Related papers (2022-02-19T09:56:38Z)
- Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z)
- Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation [58.80806716024701]
We study the global structure of attention scores computed using dot-product based self-attention.
We find that most of the variation among attention scores lies in a low-dimensional eigenspace.
We propose to compute scores only for a partial subset of token pairs, and use them to estimate scores for the remaining pairs.
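One Nystrom-style way to realize this idea (a sketch under my own assumptions, not necessarily the paper's exact estimator): compute scores only against a small set of landmark tokens and reconstruct the remaining entries from the low-dimensional structure they span. Reconstruction becomes near-exact once the number of landmarks reaches the effective rank of the score matrix.

```python
# Sketch: fill in the full score matrix S = QK^T/sqrt(d) from the rows and
# columns associated with a few landmark tokens (Nystrom-style), exploiting the
# fact that S lies in a low-dimensional eigenspace.
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 64, 8, 16                       # tokens, head dim (rank of S), landmarks
Q, K = rng.standard_normal((n, d)), rng.standard_normal((n, d))
S_full = Q @ K.T / np.sqrt(d)             # computed here only to check the estimate

landmarks = rng.choice(n, size=m, replace=False)
C = Q @ K[landmarks].T / np.sqrt(d)       # all queries vs. landmark keys   (n x m)
R = Q[landmarks] @ K.T / np.sqrt(d)       # landmark queries vs. all keys   (m x n)
W = C[landmarks]                          # landmark queries vs. landmark keys (m x m)

S_hat = C @ np.linalg.pinv(W) @ R         # estimated scores for all token pairs
err = np.linalg.norm(S_hat - S_full) / np.linalg.norm(S_full)
print(f"relative reconstruction error: {err:.2e}")   # ~0 once m >= rank(S)
```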
arXiv Detail & Related papers (2021-06-16T14:38:42Z)
- Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers [42.93754828584075]
We present a new Transformer architecture, Performer, based on Fast Attention Via Orthogonal Random features (FAVOR).
Our mechanism scales linearly rather than quadratically in the number of tokens in the sequence, is characterized by sub-quadratic space complexity and does not incorporate any sparsity pattern priors.
It provides strong theoretical guarantees: unbiased estimation of the attention matrix and uniform convergence.
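A minimal sketch of the random-feature idea behind this family of methods (written from scratch, using the positive-feature FAVOR+ variant rather than the paper's exact construction): attention is computed in time linear in the sequence length, without ever materializing the n x n attention matrix.

```python
# Sketch: softmax-kernel attention via positive random features (FAVOR+ style).
# Cost is O(n * m * d) for m random features; the n x n matrix is never formed.
import numpy as np

rng = np.random.default_rng(2)
n, d, m = 512, 64, 256                    # tokens, head dim, random features
Q, K, V = (rng.standard_normal((n, d)) * 0.1 for _ in range(3))

def positive_features(X, W):
    """phi(x) = exp(Wx - ||x||^2 / 2) / sqrt(m), an unbiased feature map for exp(q.k)."""
    Xs = X / d ** 0.25                    # fold in the 1/sqrt(d) temperature
    return np.exp(Xs @ W.T - 0.5 * (Xs ** 2).sum(-1, keepdims=True)) / np.sqrt(W.shape[0])

W = rng.standard_normal((m, d))           # i.i.d. features; orthogonal ones reduce variance
Qf, Kf = positive_features(Q, W), positive_features(K, W)

# Linear attention: phi(Q) @ (phi(K)^T V), normalized row-wise.
num = Qf @ (Kf.T @ V)                     # (n, d) with no n x n product
den = Qf @ Kf.sum(axis=0)                 # row normalizers
approx = num / den[:, None]

# Exact softmax attention, for comparison only.
S = Q @ K.T / np.sqrt(d)
P = np.exp(S - S.max(axis=-1, keepdims=True))
exact = (P / P.sum(axis=-1, keepdims=True)) @ V
print(np.abs(approx - exact).max())       # small approximation error
```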
arXiv Detail & Related papers (2020-06-05T17:09:16Z)