Bilinear MLPs enable weight-based mechanistic interpretability
- URL: http://arxiv.org/abs/2410.08417v1
- Date: Thu, 10 Oct 2024 23:22:11 GMT
- Title: Bilinear MLPs enable weight-based mechanistic interpretability
- Authors: Michael T. Pearce, Thomas Dooms, Alice Rigg, Jose M. Oramas, Lee Sharkey
- Abstract summary: Bilinear layers serve as an interpretable drop-in replacement for current activation functions.
Weight-based interpretability is viable for understanding deep-learning models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A mechanistic understanding of how MLPs do computation in deep neural networks remains elusive. Current interpretability work can extract features from hidden activations over an input dataset but generally cannot explain how MLP weights construct features. One challenge is that element-wise nonlinearities introduce higher-order interactions and make it difficult to trace computations through the MLP layer. In this paper, we analyze bilinear MLPs, a type of Gated Linear Unit (GLU) without any element-wise nonlinearity that nevertheless achieves competitive performance. Bilinear MLPs can be fully expressed in terms of linear operations using a third-order tensor, allowing flexible analysis of the weights. Analyzing the spectra of bilinear MLP weights using eigendecomposition reveals interpretable low-rank structure across toy tasks, image classification, and language modeling. We use this understanding to craft adversarial examples, uncover overfitting, and identify small language model circuits directly from the weights alone. Our results demonstrate that bilinear layers serve as an interpretable drop-in replacement for current activation functions and that weight-based interpretability is viable for understanding deep-learning models.
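In coordinates, a bilinear layer computes g(x) = (Wx) ⊙ (Vx), so each output unit is a quadratic form x^T B_k x, where B_k = w_k v_k^T is one slice of the third-order tensor. The following is a minimal NumPy sketch of that equivalence and of the eigendecomposition analysis; the shapes, random weights, and variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 16, 8

# Bilinear layer: g(x) = (W x) * (V x), elementwise product, no nonlinearity.
W = rng.normal(size=(d_out, d_in))
V = rng.normal(size=(d_out, d_in))

def bilinear(x):
    return (W @ x) * (V @ x)

# Equivalent third-order tensor: B[k, i, j] = W[k, i] * V[k, j], so that
# g(x)[k] = sum_ij B[k, i, j] * x[i] * x[j] -- a pure quadratic form.
B = np.einsum("ki,kj->kij", W, V)
x = rng.normal(size=d_in)
assert np.allclose(bilinear(x), np.einsum("kij,i,j->k", B, x, x))

# Only the symmetric part of each slice contributes to the quadratic form,
# so it admits an eigendecomposition: g(x)[k] = sum_r lam_r * (q_r . x)^2.
k = 0
B_sym = 0.5 * (B[k] + B[k].T)
lam, Q = np.linalg.eigh(B_sym)  # columns of Q are eigenvectors q_r
assert np.allclose(bilinear(x)[k], np.sum(lam * (Q.T @ x) ** 2))

# Truncating to the top eigenvectors by |eigenvalue| gives a low-rank
# approximation; random weights are generically full rank, but the paper
# reports that trained models concentrate each form in a few components.
top = np.argsort(-np.abs(lam))[:4]
approx = np.sum(lam[top] * (Q[:, top].T @ x) ** 2)
print(f"rank-4 term recovers {approx / bilinear(x)[k]:.2f} of output {k}")
```

Because the layer is fully described by this tensor, the eigenvectors can be inspected directly from the weights, without running the model on a dataset.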
Related papers
- Extrapolative ML Models for Copolymers [1.901715290314837]
Machine learning models are increasingly used to predict materials properties.
These models are inherently interpolative, and their efficacy for searching candidates outside a material's known property range is unresolved.
Here, we determine the relationship between the extrapolation ability of an ML model, the size and range of its training dataset, and its learning approach.
arXiv Detail & Related papers (2024-09-15T11:02:01Z)
- Interpreting and Improving Large Language Models in Arithmetic Calculation [72.19753146621429]
Large language models (LLMs) have demonstrated remarkable potential across numerous applications.
In this work, we delve into uncovering a specific mechanism by which LLMs execute calculations.
We investigate the potential benefits of selectively fine-tuning these essential heads/MLPs to boost the LLMs' computational performance.
arXiv Detail & Related papers (2024-09-03T07:01:46Z)
- Weight-based Decomposition: A Case for Bilinear MLPs [0.0]
Gated Linear Units (GLUs) have become a common building block in modern foundation models.
Bilinear layers drop the non-linearity in the "gate" but still have comparable performance to other GLUs.
We develop a method to decompose the bilinear tensor into a set of interacting eigenvectors.
arXiv Detail & Related papers (2024-06-06T10:46:51Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI make it possible to mitigate the limited interpretability of Transformer similarity scores by leveraging improved explanation methods.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
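For intuition about second-order explanations, consider the special case of a linear embedding model, where the similarity score decomposes exactly over pairs of input features. The sketch below illustrates only that linear case with invented toy data; BiLRP itself propagates such pairwise relevances through deep Transformer layers.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 10, 4

# Toy linear embedding phi(x) = W x; similarity s(a, b) = phi(a) . phi(b).
W = rng.normal(size=(k, d))
a, b = rng.normal(size=d), rng.normal(size=d)
similarity = (W @ a) @ (W @ b)

# Second-order attribution: s(a, b) = sum_ij a[i] * (W^T W)[i, j] * b[j],
# so R[i, j] measures how much the feature pair (i, j) drives similarity.
R = np.outer(a, b) * (W.T @ W)
assert np.isclose(R.sum(), similarity)
```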
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- MLPs Compass: What is learned when MLPs are combined with PLMs? [20.003022732050994]
Multilayer-Perceptron (MLP) modules achieve robust structural capture capabilities, even outperforming Graph Neural Networks (GNNs).
This paper aims to quantify whether simple MLPs can further enhance the already potent ability of PLMs to capture linguistic information.
arXiv Detail & Related papers (2024-01-03T11:06:01Z)
- A technical note on bilinear layers for interpretability [0.0]
Bilinear layers are a type of layer that is mathematically much easier to analyze than layers with element-wise nonlinearities.
We can integrate this expression for bilinear layers into a mathematical framework for transformer circuits.
arXiv Detail & Related papers (2023-05-05T11:56:26Z)
- Understanding the Role of Nonlinearity in Training Dynamics of Contrastive Learning [37.27098255569438]
We study the role of nonlinearity in the training dynamics of contrastive learning (CL) on one- and two-layer nonlinear networks.
We show that the presence of nonlinearity leads to many local optima even in the one-layer setting.
In the two-layer setting, we also discover global modulation: local patterns that are discriminative from the perspective of global-level patterns are prioritized for learning.
arXiv Detail & Related papers (2022-06-02T23:52:35Z)
- Efficient Language Modeling with Sparse all-MLP [53.81435968051093]
All-MLPs can match Transformers in language modeling, but still lag behind in downstream tasks.
We propose sparse all-MLPs with mixture-of-experts (MoEs) in both the feature and input (token) dimensions.
We evaluate its zero-shot in-context learning performance on six downstream tasks, and find that it surpasses Transformer-based MoEs and dense Transformers.
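As a rough illustration of the mixture-of-experts idea (not the paper's sMLP architecture, which is sparse along both the token and feature dimensions), here is a hypothetical top-1-routed expert MLP in PyTorch; the module, names, and sizes are all invented for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEMLP(nn.Module):
    """Toy mixture-of-experts MLP with top-1 routing per token."""
    def __init__(self, d_model=64, d_hidden=256, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (n_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)  # routing probabilities
        choice = gate.argmax(dim=-1)              # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = choice == i
            if mask.any():
                # Scale by the gate value so the router receives gradients.
                out[mask] = gate[mask, i, None] * expert(x[mask])
        return out

x = torch.randn(8, 64)
print(MoEMLP()(x).shape)  # torch.Size([8, 64])
```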
arXiv Detail & Related papers (2022-03-14T04:32:19Z)
- Geometric and Physical Quantities improve E(3) Equivariant Message Passing [59.98327062664975]
We introduce Steerable E(3) Equivariant Graph Neural Networks (SEGNNs) that generalise equivariant graph networks.
This model, composed of steerable MLPs, is able to incorporate geometric and physical information in both the message and update functions.
We demonstrate the effectiveness of our method on several tasks in computational physics and chemistry.
arXiv Detail & Related papers (2021-10-06T16:34:26Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
- Nonlinear ISA with Auxiliary Variables for Learning Speech Representations [51.9516685516144]
We introduce a theoretical framework for nonlinear Independent Subspace Analysis (ISA) in the presence of auxiliary variables.
We propose an algorithm that learns unsupervised speech representations whose subspaces are independent.
arXiv Detail & Related papers (2020-07-25T14:53:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.