Weight-based Decomposition: A Case for Bilinear MLPs
- URL: http://arxiv.org/abs/2406.03947v1
- Date: Thu, 6 Jun 2024 10:46:51 GMT
- Title: Weight-based Decomposition: A Case for Bilinear MLPs
- Authors: Michael T. Pearce, Thomas Dooms, Alice Rigg
- Abstract summary: Gated Linear Units (GLUs) have become a common building block in modern foundation models.
Bilinear layers drop the non-linearity in the "gate" but still have comparable performance to other GLUs.
We develop a method to decompose the bilinear tensor into a set of interacting eigenvectors.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gated Linear Units (GLUs) have become a common building block in modern foundation models. Bilinear layers drop the non-linearity in the "gate" but still have comparable performance to other GLUs. An attractive quality of bilinear layers is that they can be fully expressed in terms of a third-order tensor and linear operations. Leveraging this, we develop a method to decompose the bilinear tensor into a set of sparsely interacting eigenvectors that show promising interpretability properties in preliminary experiments for shallow image classifiers (MNIST) and small language models (Tiny Stories). Since the decomposition is fully equivalent to the model's original computations, bilinear layers may be an interpretability-friendly architecture that helps connect features to the model weights. Application of our method may not be limited to pretrained bilinear models since we find that language models such as TinyLlama-1.1B can be finetuned into bilinear variants.
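The computation behind this is simple enough to sketch directly. The snippet below is a minimal illustration in PyTorch (shapes, scaling, and variable names are assumptions, not the authors' code): a bilinear layer is a per-unit quadratic form in the input, so any readout direction over the hidden units defines a symmetric matrix whose eigenvectors re-express the layer exactly.

```python
import torch

# Minimal sketch (not the authors' released code): a bilinear layer computes
# h = (W x) * (V x) elementwise, so h_k = sum_{i,j} W_ki V_kj x_i x_j,
# i.e. each hidden unit is a quadratic form in the input.
d_in, d_hidden = 16, 32
W = torch.randn(d_hidden, d_in) / d_in**0.5
V = torch.randn(d_hidden, d_in) / d_in**0.5

def bilinear(x):
    return (W @ x) * (V @ x)  # the "gate" branch has no non-linearity

# For a readout direction u over hidden units, u . h(x) = x^T Q x with
# Q[i, j] = sum_k u_k W_ki V_kj; only the symmetric part of Q matters.
u = torch.randn(d_hidden)
Q = torch.einsum('k,ki,kj->ij', u, W, V)
Q_sym = 0.5 * (Q + Q.T)

# Eigendecomposition of Q_sym yields orthogonal input directions whose squared
# projections reconstruct the readout exactly -- the "interacting eigenvectors".
eigvals, eigvecs = torch.linalg.eigh(Q_sym)

x = torch.randn(d_in)
direct = u @ bilinear(x)
via_eig = (eigvals * (eigvecs.T @ x) ** 2).sum()
assert torch.allclose(direct, via_eig, atol=1e-4)
```

Because nothing in this rewriting is approximate, the eigenvector view is an exact re-expression of the layer, which is the sense in which the decomposition is "fully equivalent to the model's original computations".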
Related papers
- Bilinear MLPs enable weight-based mechanistic interpretability [0.0]
Bilinear layers serve as an interpretable drop-in replacement for current activation functions.
Weight-based interpretability is viable for understanding deep-learning models.
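As a rough illustration of what "drop-in replacement" means here (a sketch under assumed shapes, not code from the paper), a bilinear MLP keeps the two up-projections and the down-projection of a gated MLP such as SwiGLU and only removes the gate activation:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes for illustration only.
d_model, d_hidden = 64, 128
W = torch.randn(d_hidden, d_model) / d_model**0.5
V = torch.randn(d_hidden, d_model) / d_model**0.5
P = torch.randn(d_model, d_hidden) / d_hidden**0.5  # down-projection

def swiglu_mlp(x):
    return P @ (F.silu(W @ x) * (V @ x))  # conventional gated MLP

def bilinear_mlp(x):
    return P @ ((W @ x) * (V @ x))        # same shapes, gate non-linearity removed
```

The swap preserves parameter count and interface, which is why the layer can replace existing activation functions without architectural changes.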
arXiv Detail & Related papers (2024-10-10T23:22:11Z) - Scaling Laws for Linear Complexity Language Models [18.787664489713332]
We present the scaling laws for linear complexity language models to establish a foundation for their scalability.
The study reveals that existing linear complexity language models exhibit similar scaling capabilities as conventional transformer-based models.
arXiv Detail & Related papers (2024-06-24T14:51:31Z) - A technical note on bilinear layers for interpretability [0.0]
Bilinear layers are a type of layer that is mathematically much easier to analyze.
We can integrate this expression for bilinear layers into a mathematical framework for transformer circuits.
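The expression referred to can be written out in one line (a plausible reconstruction of the standard form, not a quote from the note):

```latex
h_k(x) \;=\; (Wx)_k\,(Vx)_k \;=\; \sum_{i,j} B_{kij}\, x_i x_j,
\qquad B_{kij} \;=\; W_{ki} V_{kj}.
```

Since the layer is quadratic in $x$ and linear in the third-order tensor $B$, composing it with the purely linear parts of a transformer keeps everything expressible as tensor contractions, which is what allows it to slot into a transformer-circuits style analysis.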
arXiv Detail & Related papers (2023-05-05T11:56:26Z) - BELIEF in Dependence: Leveraging Atomic Linearity in Data Bits for Rethinking Generalized Linear Models [6.435660232678891]
We develop a framework called binary expansion linear effect (BELIEF) for understanding arbitrary relationships with a binary outcome.
Models from the BELIEF framework are easily interpretable because they describe the association of binary variables in the language of linear models.
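As a very rough, hypothetical illustration of the binary-expansion idea (not the BELIEF authors' procedure, and with all design choices assumed): take the leading binary digits of each predictor on the unit interval and fit an ordinary linear model on those bits and their cross products.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n = 500
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)    # already uniform, so no rank transform
y = ((x1 - 0.5) * (x2 - 0.5) > 0).astype(float)      # non-linear dependence, binary outcome

def binary_digits(u, depth=2):
    """Leading `depth` binary digits of values in [0, 1), recoded as +/-1."""
    digits = []
    for _ in range(depth):
        u = 2 * u
        bit = (u >= 1).astype(float)
        u = u - bit
        digits.append(2 * bit - 1)
    return np.stack(digits, axis=1)

# Design matrix: individual bits plus pairwise bit products across predictors.
bits = np.hstack([binary_digits(x1), binary_digits(x2)])
pairs = [bits[:, i] * bits[:, j] for i, j in combinations(range(bits.shape[1]), 2)]
X = np.hstack([np.ones((n, 1)), bits, np.stack(pairs, axis=1)])

# Least squares on bit features: each coefficient reads as an association
# between binary variables, which is the "language of linear models" above.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

In this toy example the product of the two leading bits carries essentially all of the signal, so the fitted coefficients make the non-linear dependence directly readable.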
arXiv Detail & Related papers (2022-10-19T19:28:09Z) - Graph Polynomial Convolution Models for Node Classification of Non-Homophilous Graphs [52.52570805621925]
We investigate efficient learning from higher-order graph convolution and learning directly from the adjacency matrix for node classification.
We show that the resulting model leads to new graphs and a residual scaling parameter.
We demonstrate that the proposed methods obtain improved accuracy for node classification of non-homophilous graphs.
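A generic sketch of the polynomial-convolution idea follows (the normalization, coefficients, and the residual scaling parameter here are illustrative assumptions, not the paper's exact model):

```python
import numpy as np

def polynomial_graph_conv(A, X, thetas, alpha=0.5):
    """Aggregate node features with a polynomial in the normalized adjacency matrix.

    A: (n, n) adjacency, X: (n, d) node features, thetas: coefficients for A^k,
    alpha: an illustrative residual scaling parameter mixing in the raw features.
    """
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

    out = np.zeros_like(X, dtype=float)
    power = np.eye(A.shape[0])
    for theta in thetas:                      # sum_k theta_k * A_norm^k @ X
        out = out + theta * (power @ X)
        power = power @ A_norm
    return alpha * X + (1 - alpha) * out      # residual mix with the input features
```

Higher powers of the adjacency matrix let a node aggregate from multi-hop neighborhoods, which is one way such models cope with non-homophilous graphs.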
arXiv Detail & Related papers (2022-09-12T04:46:55Z) - Linear Connectivity Reveals Generalization Strategies [54.947772002394736]
Some pairs of finetuned models have large barriers of increasing loss on the linear paths between them.
We find distinct clusters of models which are linearly connected on the test loss surface, but are disconnected from models outside the cluster.
Our work demonstrates how the geometry of the loss surface can guide models towards different functions.
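A minimal sketch of how such barriers are typically measured (an illustration under assumed interfaces, not the authors' evaluation code): interpolate linearly between two finetuned checkpoints and record the loss along the path.

```python
import copy
import torch

def loss_along_linear_path(model_a, model_b, loss_fn, data, steps=11):
    """Evaluate a loss at evenly spaced points on the segment between two models.

    A bump well above the endpoint losses indicates a barrier; a flat profile
    suggests the two models lie in a linearly connected region.
    """
    params_a = [p.detach().clone() for p in model_a.parameters()]
    params_b = [p.detach().clone() for p in model_b.parameters()]
    probe = copy.deepcopy(model_a)
    losses = []
    for t in torch.linspace(0.0, 1.0, steps):
        with torch.no_grad():
            for p, a, b in zip(probe.parameters(), params_a, params_b):
                p.copy_((1 - t) * a + t * b)
        losses.append(loss_fn(probe, data).item())
    return losses
```

Grouping models by whether the path between them stays flat is the kind of test that separates the linearly connected clusters described above.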
arXiv Detail & Related papers (2022-05-24T23:43:02Z) - Using Low-rank Representation of Abundance Maps and Nonnegative Tensor Factorization for Hyperspectral Nonlinear Unmixing [28.064111391414773]
We propose a nonlinear low-rank tensor unmixing algorithm to solve the generalized bilinear model (GBM).
Specifically, the linear and nonlinear parts of the GBM can both be expressed as tensors.
Low-rank structures of abundance maps and nonlinear interaction maps are exploited by minimizing their nuclear norm.
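For context, the generalized bilinear model for one pixel's spectrum is usually written roughly as follows (a standard formulation from the unmixing literature, reconstructed from memory rather than quoted from this paper):

```latex
\mathbf{y} \;=\; \sum_{r=1}^{R} a_r\,\mathbf{m}_r
\;+\; \sum_{i=1}^{R-1}\sum_{j=i+1}^{R} \gamma_{ij}\, a_i a_j\,(\mathbf{m}_i \odot \mathbf{m}_j)
\;+\; \mathbf{n},
```

where the $\mathbf{m}_r$ are endmember spectra, the $a_r$ abundances, the $\gamma_{ij}$ nonlinear interaction coefficients, and $\mathbf{n}$ noise. Stacking the abundance and interaction coefficients over all pixels gives the maps whose low-rank structure the nuclear-norm penalties exploit.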
arXiv Detail & Related papers (2021-03-30T09:37:25Z) - Bilinear Classes: A Structural Framework for Provable Generalization in RL [119.42509700822484]
Bilinear Classes is a new structural framework which permits generalization in reinforcement learning.
The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable.
Our main result provides an RL algorithm with polynomial sample complexity for Bilinear Classes.
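Roughly, and with norms, offsets, and indices simplified (this is a paraphrase of the structural condition, not the paper's exact statement), a hypothesis class is bilinear when average Bellman errors factor through a low-dimensional inner product:

```latex
\Bigl|\,\mathbb{E}_{\pi_g}\bigl[\mathcal{E}_h(f)\bigr]\Bigr|
\;\le\; \bigl|\bigl\langle W_h(f) - W_h(f^\star),\; X_h(g)\bigr\rangle\bigr|,
\qquad W_h(\cdot),\,X_h(\cdot) \in \mathbb{R}^d,
```

where $\mathcal{E}_h(f)$ denotes the Bellman error of hypothesis $f$ at step $h$ and the expectation is under data collected with the policy of $g$. Controlling a $d$-dimensional bilinear structure rather than the raw model class is what makes the sample complexity tractable.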
arXiv Detail & Related papers (2021-03-19T16:34:20Z) - Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature [61.22680308681648]
We show that global convergence is statistically intractable even for a one-layer neural net bandit with a deterministic reward.
For both nonlinear bandit and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOL).
arXiv Detail & Related papers (2021-02-08T12:41:56Z) - Non-parametric Models for Non-negative Functions [48.7576911714538]
We provide the first model for non-negative functions that retains the good properties of linear models.
We prove that it admits a representer theorem and provide an efficient dual formulation for convex problems.
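The model in question is, roughly, a quadratic form in a feature map with a positive semidefinite operator (a reconstruction of the standard formulation, so treat the details as assumptions):

```latex
f_A(x) \;=\; \bigl\langle \phi(x),\, A\,\phi(x) \bigr\rangle, \qquad A \succeq 0,
```

which is non-negative pointwise by construction and linear in the parameter $A$, so convex losses stay convex; the representer theorem then expands $A$ on the training features, giving a finite-dimensional convex problem.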
arXiv Detail & Related papers (2020-07-08T07:17:28Z) - Learning Bijective Feature Maps for Linear ICA [73.85904548374575]
We show that existing probabilistic deep generative models (DGMs), which are tailor-made for image data, underperform on non-linear ICA tasks.
To address this, we propose a DGM which combines bijective feature maps with a linear ICA model to learn interpretable latent structures for high-dimensional data.
We create models that converge quickly, are easy to train, and achieve better unsupervised latent factor discovery than flow-based models, linear ICA, and Variational Autoencoders on images.
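A schematic of the combination (a toy sketch with assumed shapes, priors, and a hypothetical invertible `flow` module, not the paper's architecture): a bijective feature map carries images to a space where a plain linear ICA model with a non-Gaussian prior applies.

```python
import torch
import torch.nn as nn

class FlowLinearICA(nn.Module):
    """Toy generative model: non-Gaussian sources -> linear mixing -> bijective map."""

    def __init__(self, dim, flow):
        super().__init__()
        self.mixing = nn.Linear(dim, dim, bias=False)  # linear ICA mixing matrix A
        self.flow = flow                               # invertible map g with .inverse()

    def sample(self, n):
        s = torch.distributions.Laplace(0.0, 1.0).sample((n, self.mixing.in_features))
        return self.flow(self.mixing(s))               # x = g(A s)

    def infer_sources(self, x):
        z = self.flow.inverse(x)                       # undo the bijective feature map
        return torch.linalg.solve(self.mixing.weight, z.T).T  # undo the linear mixing
```

Because the feature map is bijective, no information is discarded before the linear stage, which keeps the linear ICA model responsible for disentangling the latent factors.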
arXiv Detail & Related papers (2020-02-18T17:58:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.