Group Representational Position Encoding
- URL: http://arxiv.org/abs/2512.07805v1
- Date: Mon, 08 Dec 2025 18:39:13 GMT
- Title: Group Representational Position Encoding
- Authors: Yifan Zhang, Zixiang Chen, Yifeng Liu, Zhen Qin, Huizhuo Yuan, Kangping Xu, Yang Yuan, Quanquan Gu, Andrew Chi-Chih Yao
- Abstract summary: We present GRAPE, a unified framework for positional encoding based on group actions. GRAPE brings together two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in $\mathrm{SO}(d)$ and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group $\mathrm{GL}$.
- Score: 66.33026480082025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present GRAPE (Group RepresentAtional Position Encoding), a unified framework for positional encoding based on group actions. GRAPE brings together two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in $\mathrm{SO}(d)$ and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group $\mathrm{GL}$. In Multiplicative GRAPE, a position $n \in \mathbb{Z}$ (or $t \in \mathbb{R}$) acts as $\mathbf{G}(n)=\exp(n\,\omega\,\mathbf{L})$ with a rank-2 skew generator $\mathbf{L} \in \mathbb{R}^{d \times d}$, yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the $d/2$ planes are the canonical coordinate pairs with log-uniform spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to capture cross-subspace feature coupling at $O(d)$ and $O(r d)$ cost per head, respectively. In Additive GRAPE, additive logits arise as rank-1 (or low-rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Altogether, GRAPE supplies a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases. Project Page: https://github.com/model-architectures/GRAPE.
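The multiplicative action described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes the rank-2 skew generator is written as $\mathbf{L} = \mathbf{a}\mathbf{b}^\top - \mathbf{b}\mathbf{a}^\top$ with orthonormal $\mathbf{a}, \mathbf{b}$, in which case $\mathbf{L}^3 = -\mathbf{L}$ and the matrix exponential has the closed form $\exp(\theta\mathbf{L}) = \mathbf{I} + \sin\theta\,\mathbf{L} + (1-\cos\theta)\,\mathbf{L}^2$. The names `dot` and `grape_rotate` are hypothetical, chosen only for this example.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def grape_rotate(x, n, omega, a, b):
    """Apply G(n) = exp(n * omega * L) to x, where L = a b^T - b a^T.

    Assumption (not from the source): a and b are orthonormal, so
    L^3 = -L and exp(t L) = I + sin(t) L + (1 - cos(t)) L^2.
    The matrix is never formed, so the cost is O(d) per vector.
    """
    t = n * omega
    ax, bx = dot(a, x), dot(b, x)
    s, c = math.sin(t), math.cos(t)
    # L x   = a (b.x) - b (a.x)
    # L^2 x = -(a (a.x) + b (b.x))
    return [
        xi + s * (ai * bx - bi * ax) - (1.0 - c) * (ai * ax + bi * bx)
        for xi, ai, bi in zip(x, a, b)
    ]


# Illustrative values only: one non-canonical plane in d = 4.
a = [0.5, 0.5, 0.5, 0.5]
b = [0.5, -0.5, 0.5, -0.5]
q = [1.0, -2.0, 0.5, 3.0]
k = [0.3, 0.7, -1.2, 2.0]
omega, m, n = 0.1, 3, 10

# Relative law: <G(m) q, G(n) k> depends only on the offset n - m.
lhs = dot(grape_rotate(q, m, omega, a, b), grape_rotate(k, n, omega, a, b))
rhs = dot(q, grape_rotate(k, n - m, omega, a, b))
print("relative law gap:", abs(lhs - rhs))

# Norm preservation: G(n) is a rotation, so the norm of k is unchanged.
rotated = grape_rotate(k, n, omega, a, b)
print("norm gap:", abs(math.sqrt(dot(rotated, rotated)) - math.sqrt(dot(k, k))))
```

Choosing `a` and `b` as two standard basis vectors reduces this to an ordinary 2D rotation of one coordinate pair, which is the per-plane building block that the abstract says recovers RoPE when repeated over the $d/2$ canonical pairs.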
Related papers
- Group theoretic quantization of punctured plane [0.0]
We establish an algebra homomorphism between the Lie algebra corresponding to the canonical group, $\mathscr{G} = \mathbb{R}^2 \rtimes (SO(2) \times \mathbb{R}^+)$. We deduce a quantization map that maps a subspace of classical observables, $f \in C^\infty(M)$, to self-adjoint operators on the Hilbert space, $\mathscr{H}$.
arXiv Detail & Related papers (2025-10-28T21:54:37Z) - Learning Lie Group Generators from Trajectories [0.0]
This work investigates the inverse problem of generator recovery in matrix Lie groups from discretized trajectories. A feedforward neural network is trained to learn this mapping across several groups. It demonstrates strong empirical accuracy under both clean and noisy conditions.
arXiv Detail & Related papers (2025-04-04T07:08:59Z) - Global law of conjugate kernel random matrices with heavy-tailed weights [1.8416014644193066]
We study the spectral behavior of the conjugate kernel random matrix $YY^\top$, where $Y = f(WX)$ arises from a two-layer neural network model. We show that heavy-tailed weights induce strong correlations between the entries of $Y$, leading to richer and fundamentally different spectral behavior compared to models with light-tailed weights.
arXiv Detail & Related papers (2025-02-25T18:22:58Z) - Learning Orthogonal Multi-Index Models: A Fine-Grained Information Exponent Analysis [54.57279006229212]
Information exponent has played an important role in predicting the sample complexity of online gradient descent. In this work, we show that by considering both second- and higher-order terms, we can first learn the relevant space using the second-order terms. The overall sample and complexity of online SGD is $\tilde{O}(d P^{L-1})$.
arXiv Detail & Related papers (2024-10-13T00:14:08Z) - Quantum geometric Wigner construction for $D(G)$ and braided racks [0.0]
A quantum double $D(G)=\mathbb{C}(G)\rtimes \mathbb{C} G$ of a finite group plays an important role in the Kitaev model for quantum computing.
We interpret the known construction of its irreps, which are quasiparticles for the model, in a geometric manner strictly analogous to the Wigner construction for the usual Poincaré group of $\mathbb{R}^{1,3}$.
arXiv Detail & Related papers (2024-07-16T15:21:28Z) - Provably learning a multi-head attention layer [55.2904547651831]
Multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - The Hurwitz-Hopf Map and Harmonic Wave Functions for Integer and Half-Integer Angular Momentum [0.0]
Harmonic wave functions for integer and half-integer angular momentum are given in terms of the angles $(\theta,\phi,\psi)$ that define a rotation in $SO(3)$.
A new nonrelativistic quantum (Schrödinger-like) equation for the hydrogen atom that takes into account the electron spin is introduced.
arXiv Detail & Related papers (2022-11-19T19:13:07Z) - Algebraic Aspects of Boundaries in the Kitaev Quantum Double Model [77.34726150561087]
We provide a systematic treatment of boundaries based on subgroups $K\subseteq G$ with the Kitaev quantum double $D(G)$ model in the bulk.
The boundary sites are representations of a $*$-subalgebra $\Xi\subseteq D(G)$ and we explicate its structure as a strong $*$-quasi-Hopf algebra.
As an application of our treatment, we study patches with boundaries based on $K=G$ horizontally and $K=e$ vertically and show how these could be used in a quantum computer.
arXiv Detail & Related papers (2022-08-12T15:05:07Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.