On the Anatomy of Attention
- URL: http://arxiv.org/abs/2407.02423v2
- Date: Sun, 7 Jul 2024 17:03:05 GMT
- Title: On the Anatomy of Attention
- Authors: Nikhil Khatri, Tuomas Laakkonen, Jonathon Liu, Vincent Wang-Maścianica
- Abstract summary: We introduce a category-theoretic diagrammatic formalism in order to systematically relate and reason about machine learning models.
Our diagrams present architectures intuitively but without loss of essential detail, where natural relationships between models are captured by graphical transformations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a category-theoretic diagrammatic formalism in order to systematically relate and reason about machine learning models. Our diagrams present architectures intuitively but without loss of essential detail, where natural relationships between models are captured by graphical transformations, and important differences and similarities can be identified at a glance. In this paper, we focus on attention mechanisms: translating folklore into mathematical derivations, and constructing a taxonomy of attention variants in the literature. As a first example of an empirical investigation underpinned by our formalism, we identify recurring anatomical components of attention, which we exhaustively recombine to explore a space of variations on the attention mechanism.
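For readers who want a concrete baseline for the "anatomical components" mentioned in the abstract, the sketch below splits standard scaled dot-product attention into separable score / normalise / mix stages. This is an illustrative NumPy stand-in, not code from the paper: the paper's decomposition is given diagrammatically, and the staging and function names here are assumptions made for illustration.

```python
# Minimal sketch of scaled dot-product attention, split into the kind of
# reusable stages ("anatomical components") that the paper's taxonomy
# recombines. Illustrative only; not the paper's own decomposition.
import numpy as np

def score(Q, K):
    # Compare queries against keys (scaled dot product).
    return Q @ K.T / np.sqrt(K.shape[-1])

def normalise(S):
    # Turn raw scores into attention weights (row-wise softmax).
    S = S - S.max(axis=-1, keepdims=True)  # numerical stability
    E = np.exp(S)
    return E / E.sum(axis=-1, keepdims=True)

def mix(A, V):
    # Aggregate values with the attention weights.
    return A @ V

def attention(Q, K, V):
    return mix(normalise(score(Q, K)), V)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
    print(attention(Q, K, V).shape)  # (4, 8)
```

On this reading, the attention variants surveyed in the paper can be seen as swapping or rewiring individual stages (for example, replacing the softmax normaliser or the dot-product score) while keeping the rest of the pipeline fixed.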
Related papers
- Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient [0.49478969093606673]
We introduce refined variants of the Local Learning Coefficient (LLC), a measure of model complexity grounded in singular learning theory.
We study the development of internal structure in transformer language models during training.
arXiv Detail & Related papers (2024-10-03T20:51:02Z)
- From Cognition to Computation: A Comparative Review of Human Attention and Transformer Architectures [1.5266118210763295]
Recent developments in artificial intelligence, such as the Transformer architecture, incorporate the idea of attention into model design.
Our review aims to provide a comparative analysis of these mechanisms from a cognitive-functional perspective.
arXiv Detail & Related papers (2024-04-25T05:13:38Z)
- Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement [58.9768112704998]
Disentangled representation learning strives to extract the intrinsic factors within observed data.
We introduce a new perspective and framework, demonstrating that diffusion models with cross-attention can serve as a powerful inductive bias.
This is the first work to reveal the potent disentanglement capability of diffusion models with cross-attention, requiring no complex designs.
arXiv Detail & Related papers (2024-02-15T05:07:54Z)
- Binding Dynamics in Rotating Features [72.80071820194273]
We propose an alternative "cosine binding" mechanism, which explicitly computes the alignment between features and adjusts weights accordingly.
This allows us to draw direct connections to self-attention and biological neural processes, and to shed light on the fundamental dynamics by which object-centric representations emerge in Rotating Features.
arXiv Detail & Related papers (2024-02-08T12:31:08Z)
- Discrete, compositional, and symbolic representations through attractor dynamics [51.20712945239422]
We introduce a novel neural systems model that integrates attractor dynamics with symbolic representations to model cognitive processes akin to the probabilistic language of thought (PLoT).
Our model segments the continuous representational space into discrete basins, with attractor states corresponding to symbolic sequences that reflect the semanticity and compositionality characteristic of symbolic systems, learned without supervision rather than relying on pre-defined primitives.
This approach establishes a unified framework that integrates symbolic and sub-symbolic processing through neural dynamics, a neuroplausible substrate with proven expressivity in AI, offering a more comprehensive model that mirrors the complex duality of cognitive operations.
arXiv Detail & Related papers (2023-10-03T05:40:56Z)
- AttentionViz: A Global View of Transformer Attention [60.82904477362676]
We present a new visualization technique designed to help researchers understand the self-attention mechanism in transformers.
The main idea behind our method is to visualize a joint embedding of the query and key vectors used by transformer models to compute attention.
We create an interactive visualization tool, AttentionViz, based on these joint query-key embeddings.
arXiv Detail & Related papers (2023-05-04T23:46:49Z)
- A General Survey on Attention Mechanisms in Deep Learning [7.5537115673774275]
This survey provides an overview of the most important attention mechanisms proposed in the literature.
The various attention mechanisms are explained by means of a framework consisting of a general attention model, uniform notation, and a comprehensive taxonomy of attention mechanisms.
arXiv Detail & Related papers (2022-03-27T10:06:23Z)
- Self-Supervised Graph Representation Learning for Neuronal Morphologies [75.38832711445421]
We present GraphDINO, a data-driven approach to learn low-dimensional representations of 3D neuronal morphologies from unlabeled datasets.
We show, in two different species and across multiple brain areas, that this method yields morphological cell type clusterings on par with manual feature-based classification by experts.
Our method could potentially enable data-driven discovery of novel morphological features and cell types in large-scale datasets.
arXiv Detail & Related papers (2021-12-23T12:17:47Z)
- On the Dynamics of Training Attention Models [30.85940880569692]
We study the dynamics of training a simple attention-based classification model using gradient descent.
We prove that, when the attention output is classified by a linear classifier, training converges to attending to the discriminative words.
arXiv Detail & Related papers (2020-11-19T18:55:30Z)
- How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention [20.191319097826266]
Using unsupervised clustering, we group attention heatmaps into significantly different patterns.
Our proposed features can be used to explain and calibrate different attention heads in Transformer models.
arXiv Detail & Related papers (2020-11-02T12:52:31Z)
- Explain and Improve: LRP-Inference Fine-Tuning for Image Captioning Models [82.3793660091354]
This paper analyzes the predictions of image captioning models with attention mechanisms beyond visualizing the attention itself.
We develop variants of layer-wise relevance propagation (LRP) and gradient-based explanation methods, tailored to image captioning models with attention mechanisms.
arXiv Detail & Related papers (2020-01-04T05:15:11Z)