Bilinear MLPs enable weight-based mechanistic interpretability
- URL: http://arxiv.org/abs/2410.08417v1
- Date: Thu, 10 Oct 2024 23:22:11 GMT
- Title: Bilinear MLPs enable weight-based mechanistic interpretability
- Authors: Michael T. Pearce, Thomas Dooms, Alice Rigg, Jose M. Oramas, Lee Sharkey
- Abstract summary: Bilinear layers serve as an interpretable drop-in replacement for current activation functions.
Weight-based interpretability is viable for understanding deep-learning models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A mechanistic understanding of how MLPs do computation in deep neural networks remains elusive. Current interpretability work can extract features from hidden activations over an input dataset but generally cannot explain how MLP weights construct features. One challenge is that element-wise nonlinearities introduce higher-order interactions and make it difficult to trace computations through the MLP layer. In this paper, we analyze bilinear MLPs, a type of Gated Linear Unit (GLU) without any element-wise nonlinearity that nevertheless achieves competitive performance. Bilinear MLPs can be fully expressed in terms of linear operations using a third-order tensor, allowing flexible analysis of the weights. Analyzing the spectra of bilinear MLP weights using eigendecomposition reveals interpretable low-rank structure across toy tasks, image classification, and language modeling. We use this understanding to craft adversarial examples, uncover overfitting, and identify small language model circuits directly from the weights alone. Our results demonstrate that bilinear layers serve as an interpretable drop-in replacement for current activation functions and that weight-based interpretability is viable for understanding deep-learning models.
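In coordinates, a bilinear layer computes g(x) = (Wx) ⊙ (Vx), so each output unit is a quadratic form x^T B_k x, where B_k = w_k v_k^T is one slice of the third-order tensor. The following is a minimal NumPy sketch of that equivalence and of the eigendecomposition analysis; the shapes, random weights, and variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 16, 8

# Bilinear layer: g(x) = (W x) * (V x), elementwise product, no nonlinearity.
W = rng.normal(size=(d_out, d_in))
V = rng.normal(size=(d_out, d_in))

def bilinear(x):
    return (W @ x) * (V @ x)

# Equivalent third-order tensor: B[k, i, j] = W[k, i] * V[k, j], so that
# g(x)[k] = sum_ij B[k, i, j] * x[i] * x[j] -- a pure quadratic form.
B = np.einsum("ki,kj->kij", W, V)
x = rng.normal(size=d_in)
assert np.allclose(bilinear(x), np.einsum("kij,i,j->k", B, x, x))

# Only the symmetric part of each slice contributes to the quadratic form,
# so it admits an eigendecomposition: g(x)[k] = sum_r lam_r * (q_r . x)^2.
k = 0
B_sym = 0.5 * (B[k] + B[k].T)
lam, Q = np.linalg.eigh(B_sym)  # columns of Q are eigenvectors q_r
assert np.allclose(bilinear(x)[k], np.sum(lam * (Q.T @ x) ** 2))

# Truncating to the top eigenvectors by |eigenvalue| gives a low-rank
# approximation; random weights are generically full rank, but the paper
# reports that trained models concentrate each form in a few components.
top = np.argsort(-np.abs(lam))[:4]
approx = np.sum(lam[top] * (Q[:, top].T @ x) ** 2)
print(f"rank-4 term recovers {approx / bilinear(x)[k]:.2f} of output {k}")
```

Because the layer is fully described by this tensor, the eigenvectors can be inspected directly from the weights, without running the model on a dataset.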
Related papers
- Extrapolative ML Models for Copolymers [1.901715290314837]
Machine learning models are increasingly used to predict materials properties.
These models are inherently interpolative, and their efficacy for searching candidates outside a material's known property range is unresolved.
Here, we determine the relationship between the extrapolation ability of an ML model, the size and range of its training dataset, and its learning approach.
arXiv Detail & Related papers (2024-09-15T11:02:01Z)
- Interpreting and Improving Large Language Models in Arithmetic Calculation [72.19753146621429]
Large language models (LLMs) have demonstrated remarkable potential across numerous applications.
In this work, we delve into uncovering a specific mechanism by which LLMs execute calculations.
We investigate the potential benefits of selectively fine-tuning these essential heads/MLPs to boost the LLMs' computational performance.
arXiv Detail & Related papers (2024-09-03T07:01:46Z)
- Weight-based Decomposition: A Case for Bilinear MLPs [0.0]
Gated Linear Units (GLUs) have become a common building block in modern foundation models.
Bilinear layers drop the non-linearity in the "gate" but still have comparable performance to other GLUs.
We develop a method to decompose the bilinear tensor into a set of interacting eigenvectors.
arXiv Detail & Related papers (2024-06-06T10:46:51Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI make it possible to mitigate the limited interpretability of Transformer similarity scores by leveraging improved explanation methods.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
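For intuition about second-order explanations, consider the special case of a linear embedding model, where the similarity score decomposes exactly over pairs of input features. The sketch below illustrates only that linear case with invented toy data; BiLRP itself propagates such pairwise relevances through deep Transformer layers.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 10, 4

# Toy linear embedding phi(x) = W x; similarity s(a, b) = phi(a) . phi(b).
W = rng.normal(size=(k, d))
a, b = rng.normal(size=d), rng.normal(size=d)
similarity = (W @ a) @ (W @ b)

# Second-order attribution: s(a, b) = sum_ij a[i] * (W^T W)[i, j] * b[j],
# so R[i, j] measures how much the feature pair (i, j) drives similarity.
R = np.outer(a, b) * (W.T @ W)
assert np.isclose(R.sum(), similarity)
```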
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- MLPs Compass: What is learned when MLPs are combined with PLMs? [20.003022732050994]
Multilayer-Perceptron (MLP) modules achieve robust structural capture capabilities, even outperforming Graph Neural Networks (GNNs).
This paper aims to quantify whether simple MLPs can further enhance the already potent ability of PLMs to capture linguistic information.
arXiv Detail & Related papers (2024-01-03T11:06:01Z)
- A technical note on bilinear layers for interpretability [0.0]
Bilinear layers are a type of layer that is mathematically much easier to analyze than layers with element-wise nonlinearities.
We can integrate this expression for bilinear layers into a mathematical framework for transformer circuits.
arXiv Detail & Related papers (2023-05-05T11:56:26Z)
- Understanding the Role of Nonlinearity in Training Dynamics of Contrastive Learning [37.27098255569438]
We study the role of nonlinearity in the training dynamics of contrastive learning (CL) on one- and two-layer nonlinear networks.
We show that the presence of nonlinearity leads to many local optima even in the one-layer setting.
In the two-layer setting, we also discover global modulation: local patterns that are discriminative from the perspective of global-level patterns are prioritized for learning.
arXiv Detail & Related papers (2022-06-02T23:52:35Z)
- Efficient Language Modeling with Sparse all-MLP [53.81435968051093]
All-MLPs can match Transformers in language modeling, but still lag behind in downstream tasks.
We propose sparse all-MLPs with mixture-of-experts (MoEs) in both the feature and input (token) dimensions.
We evaluate its zero-shot in-context learning performance on six downstream tasks, and find that it surpasses Transformer-based MoEs and dense Transformers.
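As a rough illustration of the mixture-of-experts idea (not the paper's sMLP architecture, which is sparse along both the token and feature dimensions), here is a hypothetical top-1-routed expert MLP in PyTorch; the module, names, and sizes are all invented for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEMLP(nn.Module):
    """Toy mixture-of-experts MLP with top-1 routing per token."""
    def __init__(self, d_model=64, d_hidden=256, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (n_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)  # routing probabilities
        choice = gate.argmax(dim=-1)              # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = choice == i
            if mask.any():
                # Scale by the gate value so the router receives gradients.
                out[mask] = gate[mask, i, None] * expert(x[mask])
        return out

x = torch.randn(8, 64)
print(MoEMLP()(x).shape)  # torch.Size([8, 64])
```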
arXiv Detail & Related papers (2022-03-14T04:32:19Z)
- Geometric and Physical Quantities improve E(3) Equivariant Message Passing [59.98327062664975]
We introduce Steerable E(3) Equivariant Graph Neural Networks (SEGNNs) that generalise equivariant graph networks.
This model, composed of steerable MLPs, is able to incorporate geometric and physical information in both the message and update functions.
We demonstrate the effectiveness of our method on several tasks in computational physics and chemistry.
arXiv Detail & Related papers (2021-10-06T16:34:26Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
- Nonlinear ISA with Auxiliary Variables for Learning Speech Representations [51.9516685516144]
We introduce a theoretical framework for nonlinear Independent Subspace Analysis (ISA) in the presence of auxiliary variables.
We propose an algorithm that learns unsupervised speech representations whose subspaces are independent.
arXiv Detail & Related papers (2020-07-25T14:53:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.