Related papers: Geometric Attention: A Regime-Explicit Operator Semantics for Transformer Attention

URL: http://arxiv.org/abs/2601.11618v1
Date: Sat, 10 Jan 2026 13:43:01 GMT
Title: Geometric Attention: A Regime-Explicit Operator Semantics for Transformer Attention
Authors: Luis Rosario Freytes
Abstract summary: Geometric Attention (GA) specifies an attention layer by four independent inputs. GA supports multihead/mixed kernels, plan-based anchors, and unary operators as explicit regime choices.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Geometric Attention (GA) specifies an attention layer by four independent inputs: a finite carrier (what indices are addressable), an evidence-kernel rule (how masked proto-scores and a link induce nonnegative weights), a probe family (which observables are treated as admissible), and an anchor/update rule (which representative kernel is selected and how it is applied). Probe families induce an operational equivalence relation on kernels and therefore a gauge; anchors select representatives relative to that probe. Under a scalar relational-work representation and a multiplicative compositionality law for evidence, the admissible link family is exponential, yielding Gibbs weights; with row anchoring this includes the softmax kernel family as a subregime. After quotienting unary row/column score fields, the remaining interaction component admits a canonical rank-r normal form (Eckart-Young/SVD); dot-product score charts implement the corresponding low-rank interaction regime. Fixing the carrier and extensionalizing the update yields the standard fixed-token Transformer attention operator; allowing carrier updates yields adaptive-carrier and staged-depth regimes. The operator language also supports multihead/mixed kernels, plan-based anchors (e.g., entropic OT/Sinkhorn), and unary operators (e.g., FFN-style fields) as explicit regime choices. This separates invariant structure from modeling choice, enabling principled comparison and extension of attention mechanisms and attention-based architectures.
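
The two concrete constructions named in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the paper: the function names are hypothetical, and reading "quotienting unary row/column score fields" as double centering (subtracting row and column means) is an assumption made here for illustration. The sketch shows an exponential link with row anchoring, which gives softmax rows, and a rank-r truncation of the centered scores via SVD, which is the best rank-r approximation in Frobenius norm by the Eckart-Young theorem.

```python
import numpy as np

def gibbs_row_kernel(scores, mask=None):
    """Exponential link on (optionally masked) scores, then row anchoring.

    Assumes every row keeps at least one unmasked entry.
    """
    s = np.array(scores, dtype=float)
    if mask is not None:
        s = np.where(mask, s, -np.inf)      # masked entries get zero weight
    s = s - s.max(axis=1, keepdims=True)    # per-row shift; leaves the result unchanged
    w = np.exp(s)                           # nonnegative Gibbs weights
    return w / w.sum(axis=1, keepdims=True) # row anchoring: each row sums to 1

def remove_unary_fields(scores):
    """Assumed reading of the quotient: double-center the score matrix."""
    s = np.asarray(scores, dtype=float)
    return (s - s.mean(axis=1, keepdims=True)
              - s.mean(axis=0, keepdims=True) + s.mean())

def rank_r_interaction(scores, r):
    """Best rank-r approximation (Frobenius norm) of the centered scores."""
    u, sv, vt = np.linalg.svd(remove_unary_fields(scores), full_matrices=False)
    return (u[:, :r] * sv[:r]) @ vt[:r]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.normal(size=(4, 5))
    K = gibbs_row_kernel(S)
    row_field = rng.normal(size=(4, 1))
    # Adding a per-row offset to the scores does not change the anchored kernel.
    print(np.allclose(gibbs_row_kernel(S + row_field), K))
    print(np.linalg.matrix_rank(rank_r_interaction(S, 2)))
```

Under row anchoring only the per-row offset is invisible to the kernel; a per-column offset does change softmax rows, so the double-centering step should be read as a separate decomposition of the scores rather than as an invariance of the softmax output.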

Related papers

Variational Bayesian Flow Network for Graph Generation [54.94088904387278]
We propose Variational Bayesian Flow Network (VBFN) for graph generation. VBFN performs variational lifting to a tractable joint Gaussian variational belief family governed by structured precisions. On synthetic and molecular graph datasets, VBFN improves fidelity and diversity, and surpasses baseline methods.
arXiv Detail & Related papers (2026-01-30T03:59:38Z)
SMKC: Sketch Based Kernel Correlation Images for Variable Cardinality Time Series Anomaly Detection [0.0]
In operational environments, monitoring systems frequently experience sensor churn. We propose SMKC, a framework that decouples the dynamic input structure from the anomaly detector. We find that a detector using random projections and nearest neighbors on the SMKC representation performs competitively with fully trained baselines.
arXiv Detail & Related papers (2026-01-28T21:15:11Z)
Operationally induced preferred basis in unitary quantum mechanics [0.0]
The preferred-basis problem and the definite-outcome aspect of the measurement problem persist even if the detector is modeled unitarily. A change of mathematical type constitutes the core of the 'cut': a structurally necessary interface from group-based kinematics to set-based counting.
arXiv Detail & Related papers (2026-01-26T17:22:03Z)
Gradient Dynamics of Attention: How Cross-Entropy Sculpts Bayesian Manifolds [0.4779196219827507]
We show how cross-entropy training reshapes attention scores and value vectors in a transformer attention head. Our core result is an advantage-based routing law for attention scores. We show that this coupled specialization behaves like a two-timescale EM procedure.
arXiv Detail & Related papers (2025-12-27T05:31:44Z)
Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space. We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
Operator Systems Generated by Projections [3.8073142980733]
We construct a family of operator systems and $k$-AOU spaces generated by a finite number of projections satisfying a set of linear relations. By choosing the linear relations to be the nonsignalling relations from quantum correlation theory, we obtain a hierarchy of ordered vector spaces dual to the hierarchy of quantum correlation sets.
arXiv Detail & Related papers (2023-02-25T01:33:39Z)
Transformer for Partial Differential Equations' Operator Learning [0.0]
We present an attention-based framework for data-driven operator learning, which we term Operator Transformer (OFormer). Our framework is built upon self-attention, cross-attention, and a set of point-wise multilayer perceptrons (MLPs).
arXiv Detail & Related papers (2022-05-26T23:17:53Z)
Frame Averaging for Invariant and Equivariant Network Design [50.87023773850824]
We introduce Frame Averaging (FA), a framework for adapting known (backbone) architectures to become invariant or equivariant to new symmetry types. We show that FA-based models have maximal expressive power in a broad setting. We propose a new class of universal Graph Neural Networks (GNNs), universal Euclidean motion invariant point cloud networks, and Euclidean motion invariant Message Passing (MP) GNNs.
arXiv Detail & Related papers (2021-10-07T11:05:23Z)
Tensor Representations for Action Recognition [54.710267354274194]
Human actions in sequences are characterized by the complex interplay between spatial features and their temporal dynamics. We propose novel tensor representations for capturing higher-order relationships between visual features for the task of action recognition. We use higher-order tensors and so-called Eigenvalue Power Normalization (EPN), which have long been speculated to perform spectral detection of higher-order occurrences.
arXiv Detail & Related papers (2020-12-28T17:27:18Z)
Assignment Flows for Data Labeling on Graphs: Convergence and Stability [69.68068088508505]
This paper establishes conditions on the weight parameters that guarantee convergence of the continuous-time assignment flow to integral assignments (labelings). Several counter-examples illustrate that violating the conditions may entail unfavorable behavior of the assignment flow regarding contextual data classification.
arXiv Detail & Related papers (2020-02-26T15:45:38Z)
Conditional Self-Attention for Query-based Summarization [49.616774159367516]
We propose conditional self-attention (CSA), a neural network module designed for conditional dependency modeling. Experiments on Debatepedia and HotpotQA benchmark datasets show CSA consistently outperforms vanilla Transformer.
arXiv Detail & Related papers (2020-02-18T02:22:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.