Grokking Finite-Dimensional Algebra
- URL: http://arxiv.org/abs/2602.19533v1
- Date: Mon, 23 Feb 2026 05:55:52 GMT
- Title: Grokking Finite-Dimensional Algebra
- Authors: Pascal Jr Tikeng Notsawo, Guillaume Dumas, Guillaume Rabusseau
- Abstract summary: Grokking refers to the sudden transition from a long memorization phase to generalization observed during neural network training. We show that grokking emerges naturally as models must learn discrete representations of algebraic elements.
- Score: 5.471648649900293
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the grokking phenomenon, the sudden transition from a long memorization phase to generalization observed during neural network training, in the context of learning multiplication in finite-dimensional algebras (FDA). While prior work on grokking has focused mainly on group operations, we extend the analysis to more general algebraic structures, including non-associative, non-commutative, and non-unital algebras. We show that learning group operations is a special case of learning FDA, and that learning multiplication in an FDA amounts to learning a bilinear product specified by the algebra's structure tensor. For algebras over the reals, we connect the learning problem to matrix factorization with an implicit low-rank bias, and for algebras over finite fields, we show that grokking emerges naturally as models must learn discrete representations of algebraic elements. This leads us to experimentally investigate the following core questions: (i) how do algebraic properties such as commutativity, associativity, and unitality influence both the emergence and timing of grokking; (ii) how do structural properties of the FDA's structure tensor, such as sparsity and rank, influence generalization; and (iii) to what extent does generalization correlate with the model learning latent embeddings aligned with the algebra's representation. Our work provides a unified framework for grokking across algebraic structures and new insights into how mathematical structure governs neural network generalization dynamics.
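The bilinear-product framing is easy to make concrete. Below is a minimal NumPy sketch (illustrative, not the paper's code) of multiplication in an FDA via its structure tensor T, where (xy)_k = sum_{i,j} T[i,j,k] x_i y_j; the choice of the quaternion algebra and all naming conventions are my own for the example.

```python
import numpy as np

# Minimal sketch (not the paper's code): multiplication in a finite-dimensional
# algebra is the bilinear map given by its structure tensor T, with
# (x * y)_k = sum_{i,j} T[i, j, k] * x[i] * y[j].

def fda_multiply(T, x, y):
    """Bilinear product of coordinate vectors x and y under structure tensor T."""
    return np.einsum('ijk,i,j->k', T, x, y)

# Example algebra: the quaternions with basis (1, i, j, k).
# T[a, b, :] holds the coordinates of e_a * e_b in that basis.
T = np.zeros((4, 4, 4))
for a in range(4):                    # e_0 = 1 is the unit
    T[0, a, a] = 1.0
    T[a, 0, a] = 1.0
for a in (1, 2, 3):                   # i^2 = j^2 = k^2 = -1
    T[a, a, 0] = -1.0
T[1, 2, 3], T[2, 1, 3] = 1.0, -1.0    # ij = k,  ji = -k
T[2, 3, 1], T[3, 2, 1] = 1.0, -1.0    # jk = i,  kj = -i
T[3, 1, 2], T[1, 3, 2] = 1.0, -1.0    # ki = j,  ik = -j

x = np.array([1.0, 2.0, 0.0, 0.0])    # 1 + 2i
y = np.array([0.0, 0.0, 3.0, 0.0])    # 3j
print(fda_multiply(T, x, y))          # (1 + 2i)(3j) = 3j + 6k -> [0. 0. 3. 6.]
```

Learning the multiplication then amounts to recovering this bilinear map from input-output examples, which is presumably where the abstract's connection to matrix factorization with an implicit low-rank bias enters.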
Related papers
- Structural Disentanglement in Bilinear MLPs via Architectural Inductive Bias [0.0]
We argue that failures arise from how models structure their internal representations during training.
We show analytically that bilinear parameterizations possess a 'non-mixing' property under gradient flow conditions.
Unlike pointwise nonlinear networks, multiplicative architectures are able to recover true operators aligned with the underlying algebraic structure.
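As a rough illustration of the contrast drawn above (the layer forms below are assumptions for the sketch, not the paper's exact architecture), a bilinear layer is an elementwise product of two linear maps, so each output unit is the pure quadratic form x^T (w_k v_k^T) x, with no pointwise nonlinearity mixing coordinates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer forms (assumed for this sketch). In the bilinear layer,
# output unit k equals (w_k . x) * (v_k . x) = x^T (w_k v_k^T) x: a rank-1
# quadratic form, with no pointwise nonlinearity mixing coordinates.

def bilinear_layer(x, W, V):
    return (x @ W.T) * (x @ V.T)     # elementwise product of two linear maps

def pointwise_layer(x, W):
    return np.maximum(x @ W.T, 0.0)  # standard ReLU layer, for contrast

d_in, d_out = 6, 4
W = rng.normal(size=(d_out, d_in))
V = rng.normal(size=(d_out, d_in))
x = rng.normal(size=(8, d_in))       # batch of 8 inputs
print(bilinear_layer(x, W, V).shape, pointwise_layer(x, W).shape)  # (8, 4) (8, 4)
```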
arXiv Detail & Related papers (2026-02-05T13:14:01Z)
- Product Interaction: An Algebraic Formalism for Deep Learning Architectures [1.1885785138453553]
Product interactions are a formalism in which neural network layers are constructed from compositions of a multiplication operator defined over suitable algebras.
Our central observation is that algebraic expressions in modern neural networks admit a unified construction in terms of linear, quadratic, and higher-order product interactions.
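A hedged sketch of the formalism as I read it (names and layer forms are mine, not the paper's notation): fix a multiplication on R^d and compose it with linear maps; familiar constructions such as gating and attention logits then appear as quadratic product interactions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two familiar layers written as quadratic product interactions: compositions
# of linear maps with a multiplication (Hadamard or inner product).

def gate(x, W, V):
    return (x @ W.T) * (x @ V.T)        # Hadamard product of two linear maps

def attn_logits(q, k, Wq, Wk):
    return (q @ Wq.T) @ (k @ Wk.T).T    # inner products of two linear maps

d = 8
Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
q, k = rng.normal(size=(4, d)), rng.normal(size=(6, d))
print(gate(q, Wq, Wk).shape)            # (4, 8)
print(attn_logits(q, k, Wq, Wk).shape)  # (4, 6): one logit per query-key pair
```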
arXiv Detail & Related papers (2026-01-31T07:14:01Z)
- On the structural properties of Lie algebras via associated labeled directed graphs [0.0]
We present a method for associating labeled directed graphs to finite-dimensional Lie algebras.
We analyze properties of valid graphs given the antisymmetry property of the Lie bracket as well as the Jacobi identity.
We develop graph-theoretic criteria for solvability, nilpotency, presence of ideals, simplicity, semisimplicity, and reductiveness of an algebra.
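For concreteness, here is a small self-contained check (illustrative only, not the paper's graph construction): a Lie algebra is determined by structure constants C with [x_i, x_j] = sum_k C[i,j,k] x_k, and the two properties the graphs must respect, antisymmetry and the Jacobi identity, can be verified directly on them, here for sl(2, R).

```python
import numpy as np

# Structure constants of sl(2, R) with basis (h, e, f):
# [h, e] = 2e, [h, f] = -2f, [e, f] = h.
C = np.zeros((3, 3, 3))
C[0, 1, 1], C[1, 0, 1] = 2.0, -2.0   # [h, e] = 2e
C[0, 2, 2], C[2, 0, 2] = -2.0, 2.0   # [h, f] = -2f
C[1, 2, 0], C[2, 1, 0] = 1.0, -1.0   # [e, f] = h

# Antisymmetry: C[i, j, k] = -C[j, i, k].
assert np.allclose(C, -C.transpose(1, 0, 2))

# Jacobi identity: the sum over the three cyclic permutations vanishes,
# sum_m (C[i,j,m] C[m,k,l] + C[j,k,m] C[m,i,l] + C[k,i,m] C[m,j,l]) = 0.
jac = (np.einsum('ijm,mkl->ijkl', C, C)
       + np.einsum('jkm,mil->ijkl', C, C)
       + np.einsum('kim,mjl->ijkl', C, C))
assert np.allclose(jac, 0.0)
print("sl(2) structure constants satisfy antisymmetry and the Jacobi identity")
```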
arXiv Detail & Related papers (2026-01-22T18:09:16Z)
- Multiary gradings [0.0]
We introduce the notion of grading by multiary groups and investigate various compatibility conditions between the arity of algebra operations and grading group operations.
The theory reveals fundamentally new phenomena not present in the binary case, such as the existence of higher power gradings and nontrivial constraints on arity compatibility.
arXiv Detail & Related papers (2026-01-16T19:44:27Z)
- Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning [73.18052192964349]
We develop a theoretical framework that explains how discrete symbolic structures can emerge naturally from continuous neural network training dynamics.
By lifting neural parameters to a measure space and modeling training as Wasserstein gradient flow, we show that under geometric constraints, the parameter measure $\mu_t$ undergoes two concurrent phenomena.
arXiv Detail & Related papers (2025-06-26T22:40:30Z)
- Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods [45.94152084965753]
We establish a novel connection between the attention mechanism and classical kernel methods.
We derive generalization error bounds in terms of the prompt length and the number of training tasks.
Our result characterizes how the generalization error scales with the number of training tasks.
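The background correspondence can be made concrete in a few lines: softmax attention computes exactly a Nadaraya-Watson kernel smoother with the exponential kernel k(q, x) = exp(<q, x>). The sketch below checks this numerically; it illustrates only the basic attention-kernel connection, not the paper's manifold setting or its bounds.

```python
import numpy as np

# Softmax attention as kernel smoothing: outputs are kernel-weighted
# averages of the values, with the exponential kernel on query-key scores.

def softmax_attention(q, K, V):
    logits = q @ K.T
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ V

def kernel_smoother(q, K, V, kernel):
    w = kernel(q @ K.T)                     # unnormalized kernel weights
    return (w @ V) / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(2)
q = rng.normal(size=(3, 5))
K, V = rng.normal(size=(7, 5)), rng.normal(size=(7, 2))
assert np.allclose(softmax_attention(q, K, V),
                   kernel_smoother(q, K, V, np.exp))
```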
arXiv Detail & Related papers (2025-06-12T17:56:26Z)
- Knowledgebra: An Algebraic Learning Framework for Knowledge Graph [15.235089177507897]
Knowledge graph (KG) representation learning aims to encode entities and relations into dense continuous vector spaces such that knowledge contained in a dataset can be consistently represented.
We developed a mathematical language for KGs based on an observation of their inherent algebraic structure, which we termed Knowledgebra.
We implemented an instantiation model, SemE, using simple matrix semigroups, which exhibits state-of-the-art performance on standard datasets.
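A hypothetical sketch of the semigroup idea (the dimensions and the scoring function below are my assumptions, not SemE's published details): relations act on entity vectors as matrices, so composing relations is matrix multiplication, which is associative by construction.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative shapes (assumed): entities as vectors, relations as square
# matrices, so the set of relations forms a semigroup under composition.

d, n_ent, n_rel = 8, 100, 12
E = rng.normal(size=(n_ent, d))          # entity embeddings
R = rng.normal(size=(n_rel, d, d))       # relation matrices

def score(h, r, t):
    """Plausibility of triple (h, r, t): higher is better (assumed scoring)."""
    return -np.linalg.norm(R[r] @ E[h] - E[t])

# Composing relation 0 then relation 1 yields another relation matrix.
r_composed = R[1] @ R[0]
print(score(0, 2, 5), r_composed.shape)  # scalar score, (8, 8)
```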
arXiv Detail & Related papers (2022-04-15T04:53:47Z)
- Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning [109.21780441933164]
We propose a hybrid approach to improve systematic generalization in reasoning.
We showcase a prototype with algebraic representation for the abstract spatial-temporal task of Raven's Progressive Matrices (RPM).
We show that the algebraic representation learned can be decoded by isomorphism to generate an answer.
arXiv Detail & Related papers (2021-11-25T09:56:30Z)
- Learning Algebraic Recombination for Compositional Generalization [71.78771157219428]
We propose LeAR, an end-to-end neural model to learn algebraic recombination for compositional generalization.
The key insight is to model the semantic parsing task as a homomorphism between a latent syntactic algebra and a semantic algebra.
Experiments on two realistic and comprehensive compositional generalization benchmarks demonstrate the effectiveness of our model.
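A toy example of the homomorphism insight (illustrative only, not the LeAR model): evaluation maps a syntactic algebra of expression trees to a semantic algebra of integers, and because it is a homomorphism, meanings recombine exactly as the syntax does.

```python
# Toy homomorphism between a syntactic algebra (expression trees with an
# "Add" operation) and a semantic algebra (integers with +):
# evaluate(Add(a, b)) == evaluate(a) + evaluate(b).

from dataclasses import dataclass
from typing import Union

@dataclass
class Lit:
    value: int

@dataclass
class Add:
    left: "Expr"
    right: "Expr"

Expr = Union[Lit, Add]

def evaluate(e: Expr) -> int:
    if isinstance(e, Lit):
        return e.value
    return evaluate(e.left) + evaluate(e.right)   # homomorphism condition

a, b = Add(Lit(1), Lit(2)), Lit(4)
assert evaluate(Add(a, b)) == evaluate(a) + evaluate(b)  # structure preserved
```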
arXiv Detail & Related papers (2021-07-14T07:23:46Z)
- LieTransformer: Equivariant self-attention for Lie Groups [49.9625160479096]
Group equivariant neural networks are used as building blocks of group invariant neural networks.
We extend the scope of the literature to self-attention, which is emerging as a prominent building block of deep learning models.
We propose the LieTransformer, an architecture composed of LieSelfAttention layers that are equivariant to arbitrary Lie groups and their discrete subgroups.
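As a minimal, hypothetical illustration of group-invariant attention (this is not the LieSelfAttention layer): if attention logits depend only on pairwise distances, then outputs over scalar features are unchanged by rotations of the input point cloud.

```python
import numpy as np

rng = np.random.default_rng(4)

# Distance-based attention: logits depend only on rotation-invariant
# pairwise distances, so scalar-feature outputs are rotation-invariant.

def distance_attention(points, values):
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2)
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ values

pts = rng.normal(size=(10, 3))
vals = rng.normal(size=(10, 2))            # scalar channels (trivial rep)
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])           # a rotation in SO(3)
assert np.allclose(distance_attention(pts, vals),
                   distance_attention(pts @ Rz.T, vals))
```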
arXiv Detail & Related papers (2020-12-20T11:02:49Z)
- Stability of Algebraic Neural Networks to Small Perturbations [179.55535781816343]
Algebraic neural networks (AlgNNs) are composed of a cascade of layers, each one associated with an algebraic signal model.
We show how any architecture that uses a formal notion of convolution can be stable beyond particular choices of the shift operator.
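A short sketch of that formal notion of convolution (illustrative; conventions are assumed): an algebraic filter is a polynomial in a shift operator S, y = sum_k h_k S^k x. Taking S to be a graph adjacency recovers graph convolution; a cyclic shift matrix recovers classical convolution.

```python
import numpy as np

rng = np.random.default_rng(5)

# Algebraic convolution: apply a polynomial in the shift operator S to the
# signal x, i.e. y = sum_k h[k] * S^k x.

def algebraic_conv(S, h, x):
    y = np.zeros_like(x)
    Skx = x.copy()
    for hk in h:
        y += hk * Skx        # accumulate h[k] * S^k x
        Skx = S @ Skx        # advance to S^(k+1) x
    return y

n = 6
A = rng.integers(0, 2, size=(n, n)).astype(float)  # random graph shift operator
A = np.triu(A, 1)
A = A + A.T                                        # symmetric, no self-loops
x = rng.normal(size=n)                             # graph signal
h = np.array([1.0, 0.5, 0.25])                     # filter taps
print(algebraic_conv(A, h, x))
```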
arXiv Detail & Related papers (2020-10-22T09:10:16Z)
- Algebraic Neural Networks: Stability to Deformations [179.55535781816343]
We study algebraic neural networks (AlgNNs) with commutative algebras.
AlgNNs unify diverse architectures such as Euclidean convolutional neural networks, graph neural networks, and group neural networks.
arXiv Detail & Related papers (2020-09-03T03:41:38Z)