Machine learning modularity
- URL: http://arxiv.org/abs/2601.01779v1
- Date: Mon, 05 Jan 2026 04:17:55 GMT
- Title: Machine learning modularity
- Authors: Yi Fan, Vishnu Jejjala, Yang Lei
- Abstract summary: This work introduces a machine learning framework for automatically simplifying complex expressions involving multiple elliptic Gamma functions. The model learns to apply algebraic identities, particularly the SL$(2,\mathbb{Z})$ and SL$(3,\mathbb{Z})$ modular transformations, to reduce heavily scrambled expressions to their canonical forms.
- Score: 6.316250090403085
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Based on a transformer-based sequence-to-sequence architecture combined with a dynamic batching algorithm, this work introduces a machine learning framework for automatically simplifying complex expressions involving multiple elliptic Gamma functions, including the $q$-$\theta$ function and the elliptic Gamma function. The model learns to apply algebraic identities, particularly the SL$(2,\mathbb{Z})$ and SL$(3,\mathbb{Z})$ modular transformations, to reduce heavily scrambled expressions to their canonical forms. Experimental results show that the model achieves over 99\% accuracy on in-distribution tests and maintains robust performance (exceeding 90\% accuracy) under significant extrapolation, such as with deeper scrambling depths. This demonstrates that the model has internalized the underlying algebraic rules of modular transformations rather than merely memorizing training patterns. Our work presents the first successful application of machine learning to perform symbolic simplification using modular identities, offering a new automated tool for computations with special functions in quantum field theory and string theory.
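As a concrete illustration of the special functions and identities involved (this sketch is not from the paper; the truncated-product definitions, truncation depth, and sample points are assumptions made for illustration), the Python snippet below defines the $q$-$\theta$ function and the elliptic Gamma function and numerically checks the elementary shift identity $\Gamma(qz; p, q) = \theta(z; p)\,\Gamma(z; p, q)$, one of the basic algebraic relations that sit alongside the SL$(2,\mathbb{Z})$ and SL$(3,\mathbb{Z})$ modular transformations the model learns to apply.

```python
# Minimal numerical sketch (not the paper's code): truncated-product definitions
# of the q-theta function and the elliptic Gamma function, plus a check of the
# elementary shift identity Gamma(q*z; p, q) = theta(z; p) * Gamma(z; p, q).
N = 60  # truncation depth for the infinite products (assumption)

def q_theta(z, q):
    """q-theta function: prod_{j>=0} (1 - z q^j)(1 - q^(j+1)/z)."""
    out = 1 + 0j
    for j in range(N):
        out *= (1 - z * q**j) * (1 - q**(j + 1) / z)
    return out

def elliptic_gamma(z, p, q):
    """Elliptic Gamma function: prod_{j,k>=0} (1 - p^(j+1) q^(k+1)/z) / (1 - z p^j q^k)."""
    out = 1 + 0j
    for j in range(N):
        for k in range(N):
            out *= (1 - p**(j + 1) * q**(k + 1) / z) / (1 - z * p**j * q**k)
    return out

if __name__ == "__main__":
    p, q = 0.21 + 0.05j, 0.13 - 0.07j   # |p|, |q| < 1 so the products converge
    z = 0.4 + 0.3j
    lhs = elliptic_gamma(q * z, p, q)
    rhs = q_theta(z, p) * elliptic_gamma(z, p, q)
    print(abs(lhs - rhs))               # ~0 up to truncation error
```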
Related papers
- Block encoding of sparse matrices with a periodic diagonal structure [67.45502291821956]
We provide an explicit quantum circuit for block encoding a sparse matrix with a periodic diagonal structure. Various applications for the presented methodology are discussed in the context of solving differential problems.
arXiv Detail & Related papers (2026-02-11T07:24:33Z) - Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability [10.75037955193936]
We study the ability of Transformer models to learn sequences generated by Permuted Congruential Generators (PCGs). PCGs introduce substantial additional difficulty over linear congruential generators (LCGs) by applying a series of bit-wise shifts, XORs, rotations, and truncations to the hidden state. We show that Transformers can nevertheless successfully perform in-context prediction on unseen sequences from diverse PCG variants.
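For concreteness, the following is a minimal sketch of one widely used PCG variant (PCG32, the XSH-RR output function): a 64-bit LCG hidden state whose output is permuted by an xorshift, a truncation to 32 bits, and a data-dependent rotation. The constants follow the PCG reference design; the snippet is an illustration, not code from the paper.

```python
# PCG32 (XSH-RR) sketch: 64-bit LCG state; the output permutation applies a
# xorshift, truncates to 32 bits, then rotates by the state's top 5 bits.
MASK64 = (1 << 64) - 1
MULT = 6364136223846793005            # LCG multiplier from the PCG reference design

def pcg32(seed: int, seq: int = 54):
    state = seed & MASK64
    inc = ((seq << 1) | 1) & MASK64    # stream increment must be odd
    while True:
        old = state
        state = (old * MULT + inc) & MASK64                      # LCG state update
        xorshifted = (((old >> 18) ^ old) >> 27) & 0xFFFFFFFF    # xorshift + truncate
        rot = old >> 59                                          # top 5 bits pick the rotation
        yield ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & 0xFFFFFFFF

gen = pcg32(seed=42)
print([next(gen) for _ in range(5)])   # five 32-bit pseudorandom outputs
```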
arXiv Detail & Related papers (2025-10-30T17:59:09Z) - Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls [54.57326125204404]
Language models are increasingly capable, yet still fail at the seemingly simple task of multi-digit multiplication. We study why, by reverse-engineering a model that successfully learns multiplication via implicit chain-of-thought.
arXiv Detail & Related papers (2025-09-30T19:03:26Z) - Beyond Softmax: A Natural Parameterization for Categorical Random Variables [61.709831225296305]
We introduce the $\textit{catnat}$ function, a function composed of a sequence of hierarchical binary splits. A rich set of experiments shows that the proposed function improves learning efficiency and yields models characterized by consistently higher test performance.
arXiv Detail & Related papers (2025-09-29T12:55:50Z) - Learning Modular Exponentiation with Transformers [0.38887448816036313]
We train a 4-layer encoder-decoder Transformer model to perform modular exponentiation. We find that reciprocal training leads to strong performance gains, with sudden generalization across related moduli. These results suggest that transformer models learn modular arithmetic through specialized computational circuits.
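As a rough illustration of the underlying task (the paper's exact tokenization is not reproduced here; the text format below is an assumption), modular-exponentiation training pairs can be generated with Python's built-in three-argument pow:

```python
# Hypothetical data generator for a modular-exponentiation seq2seq task:
# the source string encodes (base, exponent, modulus) and the target is
# base**exponent mod modulus.  The textual format is an assumption.
import random

def make_example(max_base=100, max_exp=20, max_mod=97):
    a = random.randint(2, max_base)
    b = random.randint(1, max_exp)
    m = random.randint(2, max_mod)
    src = f"{a} ^ {b} mod {m}"
    tgt = str(pow(a, b, m))            # fast modular exponentiation
    return src, tgt

random.seed(0)
for _ in range(3):
    print(make_example())
```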
arXiv Detail & Related papers (2025-06-30T10:00:44Z) - Learning Compositional Functions with Transformers from Easy-to-Hard Data [63.96562216704653]
We study the learnability of the $k$-fold composition task, which requires computing an interleaved composition of $k$ input permutations and $k$ hidden permutations. We show that this function class can be efficiently learned, with runtime and sample complexity polynomial in $k$, by gradient descent on an $O(\log k)$-depth transformer.
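To make the task concrete, here is a small sketch of the $k$-fold composition function class: $k$ hidden permutations are fixed, $k$ input permutations vary per example, and the label is their interleaved composition. The exact interleaving convention shown is an assumption for illustration.

```python
# Sketch of the k-fold composition task: the label is the interleaved
# composition h_k . x_k . ... . h_1 . x_1 of k hidden permutations h_i
# (fixed) and k input permutations x_i (given per example).
# The interleaving order shown is an assumption for illustration.
import random

n, k = 5, 3                                   # alphabet size and composition depth

def compose(p, q):
    """(p o q)(i) = p[q[i]] -- apply q first, then p."""
    return [p[q[i]] for i in range(len(q))]

random.seed(1)
hidden = [random.sample(range(n), n) for _ in range(k)]   # fixed across examples
inputs = [random.sample(range(n), n) for _ in range(k)]   # one example's inputs

target = list(range(n))                       # start from the identity permutation
for x, h in zip(inputs, hidden):
    target = compose(h, compose(x, target))   # apply x_i, then h_i
print(target)
```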
arXiv Detail & Related papers (2025-05-29T17:22:00Z) - Machine learning automorphic forms for black holes [0.0]
We show that machine learning can accurately predict modular weights from truncated expansions. This study establishes a proof of concept for using machine learning to identify how data is organized in terms of modular symmetries in gravitational systems.
arXiv Detail & Related papers (2025-05-08T18:00:00Z) - A Hybrid Transformer Architecture with a Quantized Self-Attention Mechanism Applied to Molecular Generation [0.0]
We propose a hybrid quantum-classical self-attention mechanism as part of a transformer decoder. We show that the time complexity of the query-key dot product is reduced from $\mathcal{O}(n^2 d)$ in a classical model to $\mathcal{O}(n^2 \log d)$ in our quantum model. This work provides a promising avenue for quantum-enhanced natural language processing (NLP).
arXiv Detail & Related papers (2025-02-26T15:15:01Z) - Towards Empirical Interpretation of Internal Circuits and Properties in Grokked Transformers on Modular Polynomials [29.09237503747052]
Grokking on modular addition is known to be implemented in Transformers via Fourier representations and calculation circuits based on trigonometric identities. We show that transferability among the models grokked on each operation is limited to specific combinations. Some multi-task mixtures may lead to co-grokking, where grokking happens simultaneously for all the tasks.
arXiv Detail & Related papers (2024-02-26T16:48:12Z) - Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL.
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A single transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z) - Transformers Learn Shortcuts to Automata [52.015990420075944]
We find that a low-depth Transformer can represent the computations of any finite-state automaton.
We show that a Transformer with $O(\log T)$ layers can exactly replicate the computation of an automaton on an input sequence of length $T$.
We further investigate the brittleness of these solutions and propose potential mitigations.
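The log-depth shortcut can be understood through the associativity of transition-function composition: each input symbol is mapped to a function on states, and a balanced tree of compositions simulates $T$ automaton steps in $O(\log T)$ parallel rounds rather than $T$ sequential ones. The sketch below illustrates this idea on a toy parity automaton (the automaton and input word are assumptions; this is not the paper's construction).

```python
# Illustration of the O(log T)-depth shortcut: map each input symbol to its
# state-transition function, then combine the functions with a balanced tree
# of pairwise compositions instead of T sequential steps.
# The toy parity automaton and the input word are assumptions for the example.
delta = {
    'a': (0, 1),   # reading 'a' keeps the state
    'b': (1, 0),   # reading 'b' flips the state
}

def compose(f, g):
    """Apply f first, then g (both encoded as tuples: state -> next state)."""
    return tuple(g[f[s]] for s in range(len(f)))

def shortcut_run(word, start=0):
    fns = [delta[c] for c in word]            # one transition function per symbol
    while len(fns) > 1:                       # O(log T) rounds of pairwise composition
        nxt = [compose(fns[i], fns[i + 1]) for i in range(0, len(fns) - 1, 2)]
        if len(fns) % 2:
            nxt.append(fns[-1])
        fns = nxt
    return fns[0][start]

word = "abbab"
assert shortcut_run(word) == sum(c == 'b' for c in word) % 2   # agrees with a sequential run
print(shortcut_run(word))
```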
arXiv Detail & Related papers (2022-10-19T17:45:48Z) - Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers [50.85524803885483]
This work proposes a formal definition of statistically meaningful (SM) approximation which requires the approximating network to exhibit good statistical learnability.
We study SM approximation for two function classes: circuits and Turing machines.
arXiv Detail & Related papers (2021-07-28T04:28:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.