Related papers: Group Representational Position Encoding

URL: http://arxiv.org/abs/2512.07805v1
Date: Mon, 08 Dec 2025 18:39:13 GMT
Title: Group Representational Position Encoding
Authors: Yifan Zhang, Zixiang Chen, Yifeng Liu, Zhen Qin, Huizhuo Yuan, Kangping Xu, Yang Yuan, Quanquan Gu, Andrew Chi-Chih Yao
Abstract summary: We present GRAPE, a unified framework for positional encoding based on group actions. GRAPE brings together two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in $\mathrm{SO}(d)$ and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group $\mathrm{GL}$.
Score: 66.33026480082025
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present GRAPE (Group RepresentAtional Position Encoding), a unified framework for positional encoding based on group actions. GRAPE brings together two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in $\mathrm{SO}(d)$ and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group $\mathrm{GL}$. In Multiplicative GRAPE, a position $n \in \mathbb{Z}$ (or $t \in \mathbb{R}$) acts as $\mathbf{G}(n)=\exp(n\,\omega\,\mathbf{L})$ with a rank-2 skew generator $\mathbf{L} \in \mathbb{R}^{d \times d}$, yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the $d/2$ planes are the canonical coordinate pairs with log-uniform spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to capture cross-subspace feature coupling at $O(d)$ and $O(r d)$ cost per head, respectively. In Additive GRAPE, additive logits arise as rank-1 (or low-rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Altogether, GRAPE supplies a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases. Project Page: https://github.com/model-architectures/GRAPE.
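The closed-form exponential and the relative law mentioned in the abstract can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the rank-2 skew generator is written as $\mathbf{L} = \mathbf{a}\mathbf{b}^\top - \mathbf{b}\mathbf{a}^\top$ with orthonormal $\mathbf{a}, \mathbf{b}$ (a parameterization chosen here for illustration), in which case $\mathbf{L}^3 = -\mathbf{L}$ and $\exp(\theta\mathbf{L}) = \mathbf{I} + \sin\theta\,\mathbf{L} + (1-\cos\theta)\,\mathbf{L}^2$. The function name `grape_rotate`, the vectors, and the frequency value are all hypothetical.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def grape_rotate(x, n, omega, a, b):
    """Apply G(n) = exp(n * omega * L) to x, with L = a b^T - b a^T.

    Assumption (not from the source): a and b are orthonormal, so that
    exp(theta L) = I + sin(theta) L + (1 - cos(theta)) L^2.
    Costs O(d) per vector; the d x d matrix is never formed.
    """
    theta = n * omega
    ax, bx = dot(a, x), dot(b, x)
    s, c = math.sin(theta), math.cos(theta)
    # L x = a (b.x) - b (a.x)   and   L^2 x = -(a (a.x) + b (b.x))
    return [xi + s * (ai * bx - bi * ax) - (1.0 - c) * (ai * ax + bi * bx)
            for xi, ai, bi in zip(x, a, b)]

# Illustrative values: a non-canonical rotation plane in d = 4.
a = [0.5, 0.5, 0.5, 0.5]
b = [0.5, -0.5, 0.5, -0.5]
omega = 0.1
q = [1.0, 2.0, -1.0, 0.5]
k = [0.3, -0.7, 2.0, 1.0]

def score(m, n):
    """Attention logit between a query at position m and a key at position n."""
    return dot(grape_rotate(q, m, omega, a, b), grape_rotate(k, n, omega, a, b))

# Relative law: the logit depends only on the offset n - m.
print(score(3, 10), score(0, 7), score(20, 27))
# Norm preservation: rotating q leaves its squared norm unchanged.
print(dot(q, q), dot(grape_rotate(q, 5, omega, a, b), grape_rotate(q, 5, omega, a, b)))
```

Choosing `a` and `b` as a canonical coordinate pair reduces this to a single 2D rotation of the kind RoPE applies per plane, up to the sign convention for the rotation direction.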

Related papers

Group theoretic quantization of punctured plane [0.0]
We establish an algebra homomorphism between the Lie algebra corresponding to the canonical group, $\mathscr{G} = \mathbb{R}^2 \rtimes (SO(2) \times \mathbb{R}^+)$. We deduce a quantization map that maps a subspace of classical observables, $f \in C^\infty(M)$, to self-adjoint operators on the Hilbert space, $\mathscr{H}$.
arXiv Detail & Related papers (2025-10-28T21:54:37Z)
Learning Lie Group Generators from Trajectories [0.0]
This work investigates the inverse problem of generator recovery in matrix Lie groups from discretized trajectories. A feedforward neural network is trained to learn this mapping across several groups. It demonstrates strong empirical accuracy under both clean and noisy conditions.
arXiv Detail & Related papers (2025-04-04T07:08:59Z)
Global law of conjugate kernel random matrices with heavy-tailed weights [1.8416014644193066]
We study the spectral behavior of the conjugate kernel random matrix $YY^\top$, where $Y= f(WX)$ arises from a two-layer neural network model. We show that heavy-tailed weights induce strong correlations between the entries of $Y$, leading to richer and fundamentally different spectral behavior compared to models with light-tailed weights.
arXiv Detail & Related papers (2025-02-25T18:22:58Z)
Learning Orthogonal Multi-Index Models: A Fine-Grained Information Exponent Analysis [54.57279006229212]
Information exponent has played an important role in predicting the sample complexity of online gradient descent. In this work, we show that by considering both second- and higher-order terms, we can first learn the relevant space using the second-order terms. The overall sample and complexity of online SGD is $\tilde{O}(d P^{L-1})$.
arXiv Detail & Related papers (2024-10-13T00:14:08Z)
Quantum geometric Wigner construction for $D(G)$ and braided racks [0.0]
A quantum double $D(G)=\mathbb{C}(G)\rtimes \mathbb{C} G$ of a finite group plays an important role in the Kitaev model for quantum computing. We interpret the known construction of its irreps, which are quasiparticles for the model, in a geometric manner strictly analogous to the Wigner construction for the usual Poincaré group of $\mathbb{R}^{1,3}$.
arXiv Detail & Related papers (2024-07-16T15:21:28Z)
Provably learning a multi-head attention layer [55.2904547651831]
Multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models. In this work, we initiate the study of provably learning a multi-head attention layer from random examples. We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously. Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples. We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
The Hurwitz-Hopf Map and Harmonic Wave Functions for Integer and Half-Integer Angular Momentum [0.0]
Harmonic wave functions for integer and half-integer angular momentum are given in terms of the angles $(\theta,\phi,\psi)$ that define a rotation in $SO(3)$. A new nonrelativistic quantum (Schrödinger-like) equation for the hydrogen atom that takes into account the electron spin is introduced.
arXiv Detail & Related papers (2022-11-19T19:13:07Z)
Algebraic Aspects of Boundaries in the Kitaev Quantum Double Model [77.34726150561087]
We provide a systematic treatment of boundaries based on subgroups $K\subseteq G$ with the Kitaev quantum double $D(G)$ model in the bulk. The boundary sites are representations of a $*$-subalgebra $\Xi\subseteq D(G)$ and we explicate its structure as a strong $*$-quasi-Hopf algebra. As an application of our treatment, we study patches with boundaries based on $K=G$ horizontally and $K=\{e\}$ vertically and show how these could be used in a quantum computer.
arXiv Detail & Related papers (2022-08-12T15:05:07Z)
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations. The goal of the learner is to output a hypothesis vector $\mathbf{w}$ that satisfies $F(\mathbf{w})=C\,\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.