A technical note on bilinear layers for interpretability
- URL: http://arxiv.org/abs/2305.03452v1
- Date: Fri, 5 May 2023 11:56:26 GMT
- Title: A technical note on bilinear layers for interpretability
- Authors: Lee Sharkey
- Abstract summary: Bilinear layers are a type of MLP layer that is mathematically much easier to analyze than standard MLPs with elementwise activation functions.
Expressed using only linear operations and third-order tensors, bilinear layers can be integrated into a mathematical framework for transformer circuits.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability of neural networks to represent more features than neurons makes
interpreting them challenging. This phenomenon, known as superposition, has
spurred efforts to find architectures that are more interpretable than standard
multilayer perceptrons (MLPs) with elementwise activation functions. In this
note, I examine bilinear layers, which are a type of MLP layer that is
mathematically much easier to analyze while simultaneously performing better
than standard MLPs. Although they are nonlinear functions of their input, I
demonstrate that bilinear layers can be expressed using only linear operations
and third order tensors. We can integrate this expression for bilinear layers
into a mathematical framework for transformer circuits, which was previously
limited to attention-only transformers. These results suggest that bilinear
layers are easier to analyze mathematically than current architectures and thus
may lend themselves to deeper safety insights by allowing us to talk more
formally about circuits in neural networks. Additionally, bilinear layers may
offer an alternative path for mechanistic interpretability through
understanding the mechanisms of feature construction instead of enumerating a
(potentially exponentially) large number of features in large models.
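The abstract's central claim, that a bilinear layer can be written with only linear operations and a third-order tensor, can be checked numerically. The sketch below is illustrative and not taken from the paper: it assumes a bias-free bilinear layer of the form (W1 x) ⊙ (W2 x), and the names `bilinear_layer`, `bilinear_tensor`, and `apply_tensor` are hypothetical helpers. It uses plain Python lists so it runs without dependencies.

```python
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def bilinear_layer(x, W1, W2):
    """Assumed bias-free bilinear layer: elementwise product of two linear maps."""
    return [a * b for a, b in zip(matvec(W1, x), matvec(W2, x))]

def bilinear_tensor(W1, W2):
    """Build the third-order tensor B with B[a][i][j] = W1[a][i] * W2[a][j]."""
    return [[[w1 * w2 for w2 in row2] for w1 in row1]
            for row1, row2 in zip(W1, W2)]

def apply_tensor(B, x):
    """Contract B with x twice: out[a] = sum over i, j of B[a][i][j] * x[i] * x[j]."""
    n = len(x)
    return [sum(B_a[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
            for B_a in B]

random.seed(0)
d_in, d_hidden = 4, 6  # illustrative sizes only
W1 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
W2 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
x = [random.gauss(0, 1) for _ in range(d_in)]

direct = bilinear_layer(x, W1, W2)
via_tensor = apply_tensor(bilinear_tensor(W1, W2), x)

# The two computations agree up to floating-point error.
print(all(abs(p - q) < 1e-9 for p, q in zip(direct, via_tensor)))
```

The tensor form makes the structure explicit: the output is linear in each copy of x separately, so the only nonlinearity is that the same input is contracted with B twice.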
Related papers
- Bilinear MLPs enable weight-based mechanistic interpretability [0.0]
Bilinear layers serve as an interpretable drop-in replacement for current activation functions.
Weight-based interpretability is viable for understanding deep-learning models.
arXiv Detail & Related papers (2024-10-10T23:22:11Z)
- Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers [54.20763128054692]
We study how a two-attention-layer transformer is trained to perform ICL on $n$-gram Markov chain data.
We prove that the gradient flow with respect to a cross-entropy ICL loss converges to a limiting model.
arXiv Detail & Related papers (2024-09-09T18:10:26Z)
- Weight-based Decomposition: A Case for Bilinear MLPs [0.0]
Gated Linear Units (GLUs) have become a common building block in modern foundation models.
Bilinear layers drop the non-linearity in the "gate" but still have comparable performance to other GLUs.
We develop a method to decompose the bilinear tensor into a set of interacting eigenvectors.
arXiv Detail & Related papers (2024-06-06T10:46:51Z)
- How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations [98.7450564309923]
This paper takes initial steps on understanding in-context learning (ICL) in more complex scenarios, by studying learning with representations.
We construct synthetic in-context learning problems with a compositional structure, where the label depends on the input through a possibly complex but fixed representation function.
We show theoretically the existence of transformers that approximately implement such algorithms with mild depth and size.
arXiv Detail & Related papers (2023-10-16T17:40:49Z)
- Centered Self-Attention Layers [89.21791761168032]
The self-attention mechanism in transformers and the message-passing mechanism in graph neural networks are repeatedly applied.
We show that this application inevitably leads to oversmoothing, i.e., to similar representations at the deeper layers.
We present a correction term to the aggregating operator of these mechanisms.
arXiv Detail & Related papers (2023-06-02T15:19:08Z)
- ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z)
- Approximation analysis of CNNs from a feature extraction view [8.94250977764275]
We establish some analysis for linear feature extraction by deep multi-channel convolutional neural networks (CNNs).
We give an exact construction showing how linear feature extraction can be conducted efficiently with multi-channel CNNs.
Rates of function approximation by such deep networks implemented with channels and followed by fully-connected layers are investigated as well.
arXiv Detail & Related papers (2022-10-14T04:09:01Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferrable to a new task in a sample efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Connecting Weighted Automata, Tensor Networks and Recurrent Neural Networks through Spectral Learning [58.14930566993063]
We present connections between three models used in different research fields: weighted finite automata (WFA) from formal languages and linguistics, recurrent neural networks used in machine learning, and tensor networks.
We introduce the first provable learning algorithm for linear 2-RNNs defined over sequences of continuous input vectors.
arXiv Detail & Related papers (2020-10-19T15:28:00Z)
- Dual-constrained Deep Semi-Supervised Coupled Factorization Network with Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z)
- Gradient-based Competitive Learning: Theory [1.6752712949948443]
This paper introduces a novel perspective in this area by combining gradient-based and competitive learning.
The theory is based on the intuition that neural networks are able to learn topological structures by working directly on the transpose of the input matrix.
The proposed approach has a great potential as it can be generalized to a vast selection of topological learning tasks.
arXiv Detail & Related papers (2020-09-06T19:00:51Z)