Approximation of relation functions and attention mechanisms
- URL: http://arxiv.org/abs/2402.08856v2
- Date: Sat, 15 Jun 2024 20:03:56 GMT
- Title: Approximation of relation functions and attention mechanisms
- Authors: Awni Altabaa, John Lafferty
- Abstract summary: Inner products of neural network feature maps arise as a method of modeling relations between inputs.
This work studies the approximation properties of inner products of neural networks.
- Score: 2.5322020135765464
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inner products of neural network feature maps arise in a wide variety of machine learning frameworks as a method of modeling relations between inputs. This work studies the approximation properties of inner products of neural networks. It is shown that the inner product of a multi-layer perceptron with itself is a universal approximator for symmetric positive-definite relation functions. In the case of asymmetric relation functions, it is shown that the inner product of two different multi-layer perceptrons is a universal approximator. In both cases, a bound is obtained on the number of neurons required to achieve a given accuracy of approximation. In the symmetric case, the function class can be identified with kernels of reproducing kernel Hilbert spaces, whereas in the asymmetric case the function class can be identified with kernels of reproducing kernel Banach spaces. Finally, these approximation results are applied to analyzing the attention mechanism underlying Transformers, showing that any retrieval mechanism defined by an abstract preorder can be approximated by attention through its inner product relations. This result uses the Debreu representation theorem in economics to represent preference relations in terms of utility functions.
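To make the two model classes in the abstract concrete, the sketch below parameterizes a relation function as an inner product of multi-layer perceptron feature maps: a single shared MLP for symmetric positive-definite relations, and two distinct MLPs for asymmetric relations. This is only an illustrative toy (the names `make_mlp`, `phi`, `psi`, the layer widths, and the random, untrained weights are assumptions, not the paper's construction); in practice the feature maps would be trained so that the inner product matches a target relation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(widths):
    """A random, untrained multi-layer perceptron mapping R^widths[0] -> R^widths[-1]."""
    params = [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
              for m, n in zip(widths[:-1], widths[1:])]
    def feature_map(x):
        h = x
        for i, (W, b) in enumerate(params):
            h = h @ W + b
            if i < len(params) - 1:
                h = np.maximum(h, 0.0)  # ReLU on hidden layers
        return h
    return feature_map

d, k = 4, 64                   # input dimension, number of output neurons
phi = make_mlp([d, 128, k])    # shared feature map (symmetric PSD case)
psi = make_mlp([d, 128, k])    # second feature map (asymmetric case)

def r_symmetric(x, y):
    """Symmetric positive-definite relation model: r(x, y) = <phi(x), phi(y)>."""
    return phi(x) @ phi(y)

def r_asymmetric(x, y):
    """Asymmetric relation model: r(x, y) = <phi(x), psi(y)>."""
    return phi(x) @ psi(y)

x, y = rng.standard_normal(d), rng.standard_normal(d)
print(r_symmetric(x, y), r_asymmetric(x, y))
```

Attention fits the same template: the pre-softmax score between a query token x and a key token y is the inner product of learned query and key maps applied to x and y, which is exactly the asymmetric case above.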
Related papers
- Programs as Singularities [0.6906005491572401]
We develop a correspondence between the structure of Turing machines and the structure of singularities of real analytic functions.
Our results point to a more nuanced understanding of Occam's razor and the meaning of simplicity in inductive inference.
arXiv Detail & Related papers (2025-04-10T19:04:31Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observed decision processes (POMDPs) with infinite observation and state spaces.
We make the first attempt at partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z)
- Algebraic function based Banach space valued ordinary and fractional neural network approximations [0.0]
The approximations are pointwise and in the uniform norm.
The related Banach space valued feed-forward neural networks have one hidden layer.
arXiv Detail & Related papers (2022-02-11T20:08:52Z)
- Revisiting Memory Efficient Kernel Approximation: An Indefinite Learning Perspective [0.8594140167290097]
Matrix approximations are a key element in large-scale machine learning approaches.
We extend MEKA to be applicable not only for shift-invariant kernels but also for non-stationary kernels.
We present a Lanczos-based estimation of a spectrum shift to develop a stable positive semi-definite MEKA approximation.
arXiv Detail & Related papers (2021-12-18T10:01:34Z)
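The entry above mentions a Lanczos-based estimate of a spectrum shift that restores positive semi-definiteness. The fragment below is a generic, simplified illustration of that idea rather than the MEKA construction itself: the smallest eigenvalue of a symmetric, possibly indefinite matrix is estimated with ARPACK's Lanczos-type solver, and the diagonal is shifted if that eigenvalue is negative.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def psd_shift(K, tol=1e-10):
    """Shift the spectrum of a symmetric (possibly indefinite) matrix so it is PSD."""
    # estimate the smallest algebraic eigenvalue with a Lanczos iteration
    lam_min = eigsh(K, k=1, which='SA', return_eigenvectors=False)[0]
    if lam_min < -tol:
        K = K + (-lam_min + tol) * np.eye(K.shape[0])
    return K

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
K = (A + A.T) / 2                  # symmetric but generally indefinite "kernel" matrix
K_psd = psd_shift(K)
print(np.linalg.eigvalsh(K_psd).min())   # smallest eigenvalue after the shift
```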
- Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation [58.80806716024701]
We study the global structure of attention scores computed using dot-product based self-attention.
We find that most of the variation among attention scores lies in a low-dimensional eigenspace.
We propose to compute scores only for a partial subset of token pairs, and use them to estimate scores for the remaining pairs.
arXiv Detail & Related papers (2021-06-16T14:38:42Z)
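As a rough illustration of the low-dimensional structure described in the entry above, the toy example below forms dot-product self-attention scores from random, untrained projections and compares them against a reconstruction that keeps only the leading singular directions; it is not the partial-computation estimator proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 128, 32                               # number of tokens, head dimension
X = rng.standard_normal((n, d))              # token representations
Wq, Wk = rng.standard_normal((d, d)), rng.standard_normal((d, d))

Q, K = X @ Wq, X @ Wk
S = Q @ K.T / np.sqrt(d)                     # full n x n dot-product attention scores

r = 8                                        # keep only the leading r directions
U, s, Vt = np.linalg.svd(S, full_matrices=False)
S_r = (U[:, :r] * s[:r]) @ Vt[:r, :]         # rank-r reconstruction

rel_err = np.linalg.norm(S - S_r) / np.linalg.norm(S)
print(f"relative error of the rank-{r} reconstruction: {rel_err:.3f}")
```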
- A supervised learning algorithm for interacting topological insulators based on local curvature [6.281776745576886]
We introduce a supervised machine learning scheme that uses only the curvature function at the high symmetry points as input data.
We show that an artificial neural network trained with the noninteracting data can accurately predict all topological phases in the interacting cases.
Intriguingly, the method uncovers a ubiquitous interaction-induced topological quantum multicriticality.
arXiv Detail & Related papers (2021-04-22T18:00:00Z)
- Representation Theorem for Matrix Product States [1.7894377200944511]
We investigate the universal representation capacity of the Matrix Product States (MPS) from the perspective of boolean functions and continuous functions.
We show that MPS can accurately realize arbitrary functions by providing a construction method of the corresponding MPS structure for an arbitrarily given gate.
We study the relation between MPS and neural networks and show that the MPS with a scale-invariant sigmoidal function is equivalent to a one-hidden-layer neural network.
arXiv Detail & Related papers (2021-03-15T11:06:54Z)
- Adding machine learning within Hamiltonians: Renormalization group transformations, symmetry breaking and restoration [0.0]
We include the predictive function of a neural network, designed for phase classification, as a conjugate variable coupled to an external field within the Hamiltonian of a system.
Results show that the field can induce an order-disorder phase transition by breaking or restoring the symmetry.
We conclude by discussing how the method provides an essential step toward bridging machine learning and physics.
arXiv Detail & Related papers (2020-09-30T18:44:18Z)
- UNIPoint: Universally Approximating Point Processes Intensities [125.08205865536577]
We provide a proof that a class of learnable functions can universally approximate any valid intensity function.
We implement UNIPoint, a novel neural point process model, using recurrent neural networks to parameterise sums of basis functions at each event.
arXiv Detail & Related papers (2020-07-28T09:31:56Z)
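The toy function below illustrates the basic ingredient mentioned above: a conditional intensity obtained by passing a sum of basis functions of the elapsed time through a positivity-enforcing transform. In UNIPoint the per-basis parameters are produced by a recurrent network updated at each event; the fixed constants, the exponential basis, and the softplus transform here are assumptions made purely for illustration.

```python
import numpy as np

def softplus(z):
    """Numerically stable softplus, used to keep the intensity positive."""
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def intensity(t, last_event_time, alphas, betas):
    """Toy conditional intensity: softplus of a sum of exponential basis functions
    of the time elapsed since the most recent event."""
    tau = t - last_event_time
    return softplus(np.sum(alphas * np.exp(-betas * tau)))

alphas = np.array([0.5, -0.2, 0.1])   # hypothetical per-basis weights
betas = np.array([1.0, 2.0, 0.5])     # hypothetical decay rates
print(intensity(t=1.3, last_event_time=1.0, alphas=alphas, betas=betas))
```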
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
We provide the first tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
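To make the min-max formulation in the entry above concrete, the loop below parameterizes both players by small neural networks and alternates a gradient-ascent step for the adversary with a gradient-descent step for the estimator. The quadratic conditional-moment objective is a generic stand-in, not the operator equation from the paper, and all names and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))  # estimator (min player)
g = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))  # adversary (max player)
opt_f = torch.optim.SGD(f.parameters(), lr=1e-2)
opt_g = torch.optim.SGD(g.parameters(), lr=1e-2)

x = torch.randn(256, 1)
y = 2.0 * x + 0.1 * torch.randn_like(x)       # toy data with E[y|x] = 2x

def objective():
    # E[g(x) * (y - f(x))] - 0.5 * E[g(x)^2]: a standard adversarial reformulation
    # of a conditional-moment restriction, used here only as an illustration
    return (g(x) * (y - f(x))).mean() - 0.5 * (g(x) ** 2).mean()

for step in range(200):
    opt_g.zero_grad(); (-objective()).backward(); opt_g.step()  # ascent step for g
    opt_f.zero_grad(); objective().backward(); opt_f.step()     # descent step for f

print(f(torch.tensor([[1.0]])))   # prediction at x = 1; moves toward E[y|x] = 2 with training
```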