Geometry of Lightning Self-Attention: Identifiability and Dimension
- URL: http://arxiv.org/abs/2408.17221v1
- Date: Fri, 30 Aug 2024 12:00:36 GMT
- Title: Geometry of Lightning Self-Attention: Identifiability and Dimension
- Authors: Nathan W. Henry, Giovanni Luca Marchetti, Kathlén Kohn
- Abstract summary: We study the identifiability of deep attention by providing a description of the generic fibers of the parametrization for an arbitrary number of layers.
For a single-layer model, we characterize the singular and boundary points.
Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and numerically verify it in the deep case.
- Score: 2.9816332334719773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider function spaces defined by self-attention networks without normalization, and theoretically analyze their geometry. Since these networks are polynomial, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by providing a description of the generic fibers of the parametrization for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and numerically verify it in the deep case.
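As a concrete illustration of why these networks are polynomial, and why the parametrization has positive-dimensional fibers, the following is a minimal NumPy sketch of a single layer of unnormalized ("lightning") self-attention. The dimensions, variable names, and the reparametrization (Wq, Wk) -> (Wq A, Wk A^{-T}) shown below are illustrative assumptions on our part; they exhibit only one obvious symmetry contained in the fibers, not the paper's full description of the generic fiber.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: token dimension d, query/key dimension d_qk,
# value dimension d_v, sequence length n (all hypothetical choices).
d, d_qk, d_v, n = 4, 3, 4, 5

Wq = rng.standard_normal((d, d_qk))
Wk = rng.standard_normal((d, d_qk))
Wv = rng.standard_normal((d, d_v))

def lightning_attention(X, Wq, Wk, Wv):
    """Single-layer self-attention without softmax normalization.

    The attention scores (X Wq)(X Wk)^T are quadratic in X, so the
    output is a polynomial map of degree 3 in the input X.
    """
    scores = (X @ Wq) @ (X @ Wk).T   # shape (n, n)
    return scores @ (X @ Wv)         # shape (n, d_v)

X = rng.standard_normal((n, d))

# Non-identifiability: the scores depend on the weights only through the
# product Wq @ Wk.T, so for any invertible A the parameters
# (Wq A, Wk A^{-T}, Wv) define exactly the same function.
A = rng.standard_normal((d_qk, d_qk)) + 3.0 * np.eye(d_qk)  # generically invertible
Y_original = lightning_attention(X, Wq, Wk, Wv)
Y_reparam = lightning_attention(X, Wq @ A, Wk @ np.linalg.inv(A).T, Wv)
print(np.allclose(Y_original, Y_reparam))  # True: both parameter tuples lie in one fiber
```

These reparametrizations alone show that the generic fibers are positive-dimensional (of dimension at least d_qk^2 when the weights have full rank); the paper's contribution is a description of the full generic fiber for an arbitrary number of layers, from which the dimension of the function space follows.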
Related papers
- On the Geometry and Optimization of Polynomial Convolutional Networks [2.9816332334719773]
We study convolutional neural networks with monomial activation functions.
We compute the dimension and the degree of the neuromanifold, which measure the expressivity of the model.
For a generic large dataset, we derive an explicit formula that quantifies the number of critical points arising in the optimization of a regression loss.
arXiv Detail & Related papers (2024-10-01T14:13:05Z)
- Data Topology-Dependent Upper Bounds of Neural Network Widths [52.58441144171022]
We first show that a three-layer neural network can be designed to approximate an indicator function over a compact set.
This is then extended to a simplicial complex, deriving width upper bounds based on its topological structure.
We prove the universal approximation property of three-layer ReLU networks using our topological approach.
arXiv Detail & Related papers (2023-05-25T14:17:15Z)
- Differential geometry with extreme eigenvalues in the positive semidefinite cone [1.9116784879310025]
We present a route to a scalable geometric framework for the analysis and processing of SPD-valued data, based on the efficient computation of extreme generalized eigenvalues.
We define a novel iterative mean of SPD matrices based on this geometry and prove its existence and uniqueness for a given finite collection of points.
arXiv Detail & Related papers (2023-04-14T18:37:49Z)
- Function Space and Critical Points of Linear Convolutional Networks [4.483341215742946]
We study the geometry of linear networks with one-dimensional convolutional layers.
We analyze the impact of the network's architecture on the function space's dimension, boundary, and singular points.
arXiv Detail & Related papers (2023-04-12T10:15:17Z)
- Towards a mathematical understanding of learning from few examples with nonlinear feature maps [68.8204255655161]
We consider the problem of data classification where the training set consists of just a few data points.
We reveal key relationships between the geometry of an AI model's feature space, the structure of the underlying data distributions, and the model's generalisation capabilities.
arXiv Detail & Related papers (2022-11-07T14:52:58Z)
- Intrinsic dimension estimation for discrete metrics [65.5438227932088]
In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces.
We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting.
This suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of the sequence space.
arXiv Detail & Related papers (2022-07-20T06:38:36Z)
- Holographic properties of superposed quantum geometries [0.0]
We study the holographic properties of a class of quantum geometry states characterized by a superposition of discrete geometric data.
This class includes spin networks, the kinematic states of lattice gauge theory and discrete quantum gravity.
arXiv Detail & Related papers (2022-07-15T17:37:47Z)
- Superposed Random Spin Tensor Networks and their Holographic Properties [0.0]
We study boundary-to-boundary holography in a class of spin network states defined by analogy to projected entangled pair states (PEPS).
We consider superpositions of states corresponding to well-defined, discrete geometries on a graph.
arXiv Detail & Related papers (2022-05-19T12:24:57Z)
- A singular Riemannian geometry approach to Deep Neural Networks I. Theoretical foundations [77.86290991564829]
Deep Neural Networks are widely used for solving complex problems in several scientific areas, such as speech recognition, machine translation, and image analysis.
We study a particular sequence of maps between manifolds, with the last manifold of the sequence equipped with a Riemannian metric.
We investigate the theoretical properties of the maps in such a sequence, eventually focusing on the case of maps implementing neural networks of practical interest.
arXiv Detail & Related papers (2021-12-17T11:43:30Z)
- Primal-Dual Mesh Convolutional Neural Networks [62.165239866312334]
We propose a primal-dual framework, drawn from the graph-neural-network literature, and apply it to triangle meshes.
Our method takes features for both edges and faces of a 3D mesh as input and dynamically aggregates them.
We provide theoretical insights into our approach using tools from the mesh-simplification literature.
arXiv Detail & Related papers (2020-10-23T14:49:02Z)
- Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
arXiv Detail & Related papers (2020-02-25T23:05:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.