A phase transition between positional and semantic learning in a
solvable model of dot-product attention
- URL: http://arxiv.org/abs/2402.03902v1
- Date: Tue, 6 Feb 2024 11:13:54 GMT
- Title: A phase transition between positional and semantic learning in a
solvable model of dot-product attention
- Authors: Hugo Cui, Freya Behrens, Florent Krzakala, Lenka Zdeborová
- Abstract summary: We show how a dot-product attention layer learns a positional attention matrix and a semantic attention matrix.
For an algorithmic task, we experimentally show how the same simple architecture can learn using either the positional or semantic mechanism.
- Score: 20.83573496458023
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate how a dot-product attention layer learns a positional
attention matrix (with tokens attending to each other based on their respective
positions) and a semantic attention matrix (with tokens attending to each other
based on their meaning). For an algorithmic task, we experimentally show how
the same simple architecture can learn to implement a solution using either the
positional or semantic mechanism. On the theoretical side, we study the
learning of a non-linear self-attention layer with trainable tied and low-rank
query and key matrices. In the asymptotic limit of high-dimensional data and a
comparably large number of training samples, we provide a closed-form
characterization of the global minimum of the non-convex empirical loss
landscape. We show that this minimum corresponds to either a positional or a
semantic mechanism and evidence an emergent phase transition from the former to
the latter with increasing sample complexity. Finally, we compare the
dot-product attention layer to a linear positional baseline, and show that it
outperforms the latter using the semantic mechanism provided it has access to
sufficient data.
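The attention layer described in the abstract (a single non-linear self-attention layer with tied, low-rank query and key matrices) can be sketched as follows. This is a minimal illustration, not the authors' exact parameterization: the function and variable names are hypothetical, the softmax non-linearity and 1/√r scaling are common conventions assumed here, and the paper's precise model may differ in details such as the choice of non-linearity.

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax along the given axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def tied_lowrank_attention(X, Q):
    """Single self-attention layer with tied query/key projections.

    X : (L, d) array of L token embeddings of dimension d
    Q : (d, r) trainable low-rank matrix; tying means K = Q, so the
        score matrix is (XQ)(XQ)^T rather than (XW_Q)(XW_K)^T.

    Returns the (L, L) attention matrix A and the attended output A @ X.
    """
    proj = X @ Q                                  # (L, r) shared query/key projection
    scores = proj @ proj.T / np.sqrt(Q.shape[1])  # (L, L) tied dot-product scores
    A = softmax(scores, axis=-1)                  # rows sum to 1
    return A, A @ X
```

In this picture, a "semantic" solution is one where A genuinely depends on the token contents X, while a "positional" solution corresponds to A collapsing to an (approximately) fixed matrix determined by token positions alone, which is what the linear positional baseline implements directly.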
Related papers
- Connectivity Shapes Implicit Regularization in Matrix Factorization Models for Matrix Completion [2.8948274245812335]
We investigate the implicit regularization of matrix factorization for solving matrix completion problems.
We empirically discover that the connectivity of observed data plays a crucial role in the implicit bias.
Our work reveals the intricate interplay between data connectivity, training dynamics, and implicit regularization in matrix factorization models.
arXiv Detail & Related papers (2024-05-22T15:12:14Z)
- Bayesian Inference of Transition Matrices from Incomplete Graph Data with a Topological Prior [1.2891210250935143]
We derive an analytically tractable Bayesian method that uses repeated interactions and a topological prior to infer transition matrices data-efficiently.
We show that it recovers the transition probabilities with higher accuracy and that it is robust even in cases when the knowledge of the topological constraint is partial.
arXiv Detail & Related papers (2022-10-27T13:17:47Z)
- MARS: Meta-Learning as Score Matching in the Function Space [79.73213540203389]
We present a novel approach to extracting inductive biases from a set of related datasets.
We use functional Bayesian neural network inference, which views the prior as a process and performs inference in the function space.
Our approach can seamlessly acquire and represent complex prior knowledge by meta-learning the score function of the data-generating process.
arXiv Detail & Related papers (2022-10-24T15:14:26Z)
- Deep Equilibrium Assisted Block Sparse Coding of Inter-dependent Signals: Application to Hyperspectral Imaging [71.57324258813675]
A dataset of inter-dependent signals is defined as a matrix whose columns demonstrate strong dependencies.
A neural network is employed to act as structure prior and reveal the underlying signal interdependencies.
Deep unrolling and Deep equilibrium based algorithms are developed, forming highly interpretable and concise deep-learning-based architectures.
arXiv Detail & Related papers (2022-03-29T21:00:39Z)
- Provably End-to-end Label-Noise Learning without Anchor Points [118.97592870124937]
We propose an end-to-end framework for solving label-noise learning without anchor points.
Our proposed framework can identify the transition matrix if the clean class-posterior probabilities are sufficiently scattered.
arXiv Detail & Related papers (2021-02-04T03:59:37Z)
- Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm in semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
- Unsupervised phase discovery with deep anomaly detection [0.0]
We demonstrate how to explore phase diagrams with automated and unsupervised machine learning.
We employ deep neural networks to determine the entire phase diagram in a completely unsupervised and automated fashion.
Our method allows us to reveal a phase-separated region between supersolid and superfluid parts with unexpected properties.
arXiv Detail & Related papers (2020-03-22T14:20:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.