Converting MLPs into Polynomials in Closed Form
- URL: http://arxiv.org/abs/2502.01032v1
- Date: Mon, 03 Feb 2025 03:54:41 GMT
- Title: Converting MLPs into Polynomials in Closed Form
- Authors: Nora Belrose, Alice Rigg
- Abstract summary: We theoretically derive closed-form, least-squares-optimal approximations of feedforward networks.
We show that quadratic approximants can be used to create SVD-based adversarial examples.
- Score: 0.7234862895932991
- License:
- Abstract: Recent work has shown that purely quadratic functions can replace MLPs in transformers with no significant loss in performance, while enabling new methods of interpretability based on linear algebra. In this work, we theoretically derive closed-form least-squares optimal approximations of feedforward networks (multilayer perceptrons and gated linear units) using polynomial functions of arbitrary degree. When the $R^2$ is high, this allows us to interpret MLPs and GLUs by visualizing the eigendecomposition of the coefficients of their linear and quadratic approximants. We also show that these approximants can be used to create SVD-based adversarial examples. By tracing the $R^2$ of linear and quadratic approximants across training time, we find new evidence that networks start out simple, and get progressively more complex. Even at the end of training, however, our quadratic approximants explain over 95% of the variance in network outputs.
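As a concrete illustration (a minimal sketch, not the paper's closed-form derivation, which computes the optimal coefficients analytically): the snippet below fits a quadratic approximant to a toy ReLU MLP by ordinary least squares on Gaussian inputs, reports its $R^2$, and eigendecomposes the quadratic coefficient matrix. All dimensions and the regression-based fitting are illustrative assumptions.

```python
# A minimal sketch, assuming a toy ReLU MLP and Gaussian inputs. The paper
# derives the least-squares-optimal coefficients in closed form; here we
# estimate them by ordinary regression purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
d, h, n = 8, 32, 20000          # input dim, hidden width, sample count

# Random ReLU MLP with scalar output: f(x) = w2 . relu(W1 x)
W1 = rng.normal(size=(h, d)) / np.sqrt(d)
w2 = rng.normal(size=h) / np.sqrt(h)
mlp = lambda X: np.maximum(X @ W1.T, 0.0) @ w2

X = rng.normal(size=(n, d))
y = mlp(X)

# Quadratic feature map: [1, x, upper triangle of x x^T]
iu = np.triu_indices(d)
Phi = np.concatenate(
    [np.ones((n, 1)), X, (X[:, :, None] * X[:, None, :])[:, iu[0], iu[1]]],
    axis=1,
)
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# R^2 of the quadratic approximant on the sampled inputs
r2 = 1.0 - np.sum((Phi @ coef - y) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"quadratic approximant R^2 = {r2:.4f}")

# Rebuild the symmetric quadratic coefficient matrix Q so that x^T Q x
# reproduces the fitted terms, then eigendecompose it, mirroring the
# eigendecomposition analysis described in the abstract.
Q = np.zeros((d, d))
Q[iu] = coef[1 + d:]
Q = (Q + Q.T) / 2.0             # diagonal kept, cross terms split evenly
eigvals, eigvecs = np.linalg.eigh(Q)
print("top quadratic eigenvalues:", np.round(eigvals[-3:], 3))
```

On toy networks like this one, the eigenvectors attached to the extreme eigenvalues indicate the input directions to which the quadratic term is most sensitive.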
Related papers
- Modular addition without black-boxes: Compressing explanations of MLPs that compute numerical integration [1.7679702431368263]
We present the first case study in rigorously compressing nonlinear feature maps.
We target a non-vacuous bound on the behaviour of the ReLU MLP that can be computed in time linear in the circuit's parameter count.
arXiv Detail & Related papers (2024-12-04T23:29:07Z) - Power-Softmax: Towards Secure LLM Inference over Encrypted Data [2.4576879793338913]
Privacy-preserving inference with Homomorphic Encryption (HE) requires the model to be expressed in a polynomial form.
Previous approaches have directly approximated pre-trained models with large-degree polynomials, which are costly to evaluate under HE.
We present a new variant of self-attention that offers a stable form for training and is easy to approximate with polynomials.
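As a hedged illustration of the general idea only (the paper's exact variant may differ): replacing exp with an even power function makes the whole map a ratio of polynomials, which HE schemes can evaluate. The exponent below is an arbitrary illustrative choice.

```python
# Hedged illustration, not the paper's exact definition: a polynomial
# (HE-friendly) softmax surrogate that swaps exp for a power function.
import numpy as np

def power_softmax(x, p=4):
    # An even power keeps every term non-negative, mimicking exp's positivity.
    z = x ** p if p % 2 == 0 else np.abs(x) ** p
    return z / z.sum(axis=-1, keepdims=True)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.array([1.0, 2.0, 3.0])
print(softmax(scores))        # approx. [0.090 0.245 0.665]
print(power_softmax(scores))  # polynomial surrogate, no exp required
```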
arXiv Detail & Related papers (2024-10-12T09:32:42Z) - Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time [17.086679273053853]
We show that a novel fast approximation method can calculate the gradients in almost linear time.
By improving the efficiency of gradient computation, we hope that this work will facilitate more effective training and deployment of long-context language models.
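A hedged sketch of the underlying source of such speedups (illustrative only, not the paper's algorithm): once the $n \times n$ attention matrix is replaced by a rank-$r$ factorization, reassociating the matrix products drops the cost from $O(n^2 d)$ to $O(nrd)$.

```python
# Hedged sketch, not the paper's method: a random rank-r stand-in for the
# n x n attention matrix shows why the factored form gives near-linear cost.
import numpy as np

n, d, r = 2048, 64, 16
rng = np.random.default_rng(0)
U = rng.normal(size=(n, r))        # low-rank factors (illustrative stand-ins)
V = rng.normal(size=(n, r))
X = rng.normal(size=(n, d))        # values / hidden states

out_naive = (U @ V.T) @ X          # materializes n x n: O(n^2 d) time
out_fast = U @ (V.T @ X)           # reassociated: O(n r d) time, no n x n matrix

print(np.allclose(out_naive, out_fast))  # True: identical result, cheaper path
```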
arXiv Detail & Related papers (2024-08-23T17:16:43Z) - ReLU Fields: The Little Non-linearity That Could [62.228229880658404]
We investigate the smallest change to grid-based representations that retains the high-fidelity results of MLPs.
We show that such an approach becomes competitive with the state-of-the-art.
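A minimal 1-D sketch of the grid-plus-ReLU idea (grid size and values are illustrative; the actual work targets grid-based scene representations): store values at grid vertices, interpolate linearly at query points, and apply a single ReLU to the result.

```python
# Minimal 1-D sketch of a "ReLU field": per-vertex values on a regular grid,
# linear interpolation at query time, then one ReLU on the interpolated value.
import numpy as np

grid = np.array([-1.0, 0.5, 2.0, -0.3, 1.2])    # learnable per-vertex values

def relu_field(x):
    """Evaluate at continuous coordinates x in [0, len(grid) - 1]."""
    x = np.asarray(x, dtype=float)
    i0 = np.clip(np.floor(x).astype(int), 0, len(grid) - 2)
    t = x - i0                                  # fractional position in cell
    interp = (1 - t) * grid[i0] + t * grid[i0 + 1]
    return np.maximum(interp, 0.0)              # the "little non-linearity"

print(relu_field([0.0, 0.25, 1.5, 3.9]))        # [0. 0. 1.25 1.05]
```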
arXiv Detail & Related papers (2022-05-22T13:42:31Z) - Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting [60.98700344526674]
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning.
In this paper, we investigate a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner.
We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space.
arXiv Detail & Related papers (2021-05-17T17:22:07Z) - Bilinear Classes: A Structural Framework for Provable Generalization in RL [119.42509700822484]
Bilinear Classes is a new structural framework which permits generalization in reinforcement learning.
The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable.
Our main result provides an RL algorithm which has polynomial sample complexity for Bilinear Classes.
arXiv Detail & Related papers (2021-03-19T16:34:20Z) - On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the difficulty of the learning problem.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z) - FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs [53.710405006523274]
This work focuses on the representation learning question: how can we learn such features?
Under the assumption that the underlying (unknown) dynamics correspond to a low rank transition matrix, we show how the representation learning question is related to a particular non-linear matrix decomposition problem.
We develop FLAMBE, which engages in exploration and representation learning for provably efficient RL in low rank transition models.
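Concretely, the low-rank assumption means the (unknown) transition kernel factors through $d$-dimensional embeddings, so representation learning amounts to recovering the feature map $\phi$:

```latex
% Low-rank MDP factorization underlying FLAMBE: the unknown dynamics factor
% through d-dimensional feature maps phi and mu.
\[
  T(s' \mid s, a) \;=\; \big\langle \phi(s, a),\, \mu(s') \big\rangle,
  \qquad \phi(s, a),\ \mu(s') \in \mathbb{R}^{d}.
\]
```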
arXiv Detail & Related papers (2020-06-18T19:11:18Z) - A Solution for Large Scale Nonlinear Regression with High Rank and Degree at Constant Memory Complexity via Latent Tensor Reconstruction [0.0]
This paper proposes a novel method for learning highly nonlinear, multivariate functions from examples.
Our method takes advantage of the property that continuous functions can be approximated by polynomials, which in turn are representable by tensors.
For learning the models, we present an efficient gradient-based algorithm that can be implemented in linear time.
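A hedged sketch of the polynomial-as-tensor representation (evaluation only; the paper's learning algorithm is not reproduced here): a rank-$R$, degree-$D$ polynomial stored via its CP factors needs $O(RD \cdot \mathrm{dim})$ memory rather than $O(\mathrm{dim}^D)$.

```python
# Hedged sketch of the polynomial-as-tensor idea from the abstract, not the
# paper's training algorithm: a rank-R, degree-D polynomial can be written as
# f(x) = sum_r prod_t <w[r, t], x>, i.e. a CP-decomposed coefficient tensor.
import numpy as np

rng = np.random.default_rng(0)
dim, R, D = 10, 5, 3                  # input dim, rank, polynomial degree
W = rng.normal(size=(R, D, dim))      # factor vectors of the coefficient tensor

def poly_eval(x):
    inner = W @ x                     # all <w[r, t], x> at once, shape (R, D)
    return np.prod(inner, axis=1).sum()  # product over degree, sum over rank

x = rng.normal(size=dim)
print(poly_eval(x))
```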
arXiv Detail & Related papers (2020-05-04T14:49:14Z) - Neural Networks are Convex Regularizers: Exact Polynomial-time Convex Optimization Formulations for Two-layer Networks [70.15611146583068]
We develop exact convex optimization representations of training two-layer neural networks with rectified linear units (ReLUs).
Our theory utilizes semi-infinite duality and minimum norm regularization.
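A compact sketch of this style of convex reformulation (following the known construction for two-layer ReLU networks; the random pattern subsampling and hyperparameters below are illustrative simplifications): enumerate candidate ReLU activation patterns on the data, then solve a single group-sparse convex program.

```python
# Hedged sketch: train a two-layer ReLU network as one convex program by
# fixing candidate activation patterns D_i = diag(1[X u >= 0]) drawn from
# random directions u, then optimizing group-lasso-style weights (v_i, w_i).
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d, beta = 20, 3, 1e-3
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Candidate activation patterns from random hyperplanes (subsampled; the
# theory enumerates all patterns realizable on X).
patterns = np.unique((X @ rng.normal(size=(d, 50)) >= 0).T, axis=0).astype(float)

V = [cp.Variable(d) for _ in patterns]
W = [cp.Variable(d) for _ in patterns]

pred = sum(cp.multiply(Di, X @ (v - w)) for Di, v, w in zip(patterns, V, W))
reg = sum(cp.norm(v, 2) + cp.norm(w, 2) for v, w in zip(V, W))
constraints = []
for Di, v, w in zip(patterns, V, W):
    s = 2.0 * Di - 1.0               # +1 where the pattern is active, -1 elsewhere
    constraints += [cp.multiply(s, X @ v) >= 0, cp.multiply(s, X @ w) >= 0]

prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(pred - y) + beta * reg),
                  constraints)
prob.solve()
print("optimal convex objective:", prob.value)
```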
arXiv Detail & Related papers (2020-02-24T21:32:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.