Geometric Generality of Transformer-Based Gröbner Basis Computation
- URL: http://arxiv.org/abs/2504.12465v1
- Date: Wed, 16 Apr 2025 20:01:00 GMT
- Title: Geometric Generality of Transformer-Based Gröbner Basis Computation
- Authors: Yuta Kambe, Yota Maeda, Tristan Vaccon,
- Abstract summary: In this paper, we address the computation of Gröbner bases using Transformers. We prove that datasets generated by the previously proposed algorithm are sufficiently general. We also propose an extended and generalized algorithm to systematically construct datasets of ideal generators.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The intersection of deep learning and symbolic mathematics has seen rapid progress in recent years, exemplified by the work of Lample and Charton. They demonstrated that effective training of machine learning models for solving mathematical problems critically depends on high-quality, domain-specific datasets. In this paper, we address the computation of Gröbner bases using Transformers. While a dataset generation method tailored to Transformer-based Gröbner basis computation has previously been proposed, it lacked theoretical guarantees regarding the generality or quality of the generated datasets. In this work, we prove that datasets generated by the previously proposed algorithm are sufficiently general, ensuring that Transformers can learn a sufficiently diverse range of Gröbner bases. Moreover, we propose an extended and generalized algorithm to systematically construct datasets of ideal generators, further enhancing the training effectiveness of Transformers. Our results provide a rigorous geometric foundation for Transformers to address a mathematical problem, answering Lample and Charton's idea of training on diverse or representative inputs.
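For readers unfamiliar with the object being computed: a Gröbner basis is a canonical generating set of a polynomial ideal with respect to a chosen monomial order. A minimal sketch using SymPy's classical (non-neural) algorithm illustrates the input/output pairs such a dataset consists of; the specific generators below are an illustrative choice, not from the paper:

```python
# Illustrative only: computing a Gröbner basis with SymPy, i.e. producing
# the kind of (generators -> basis) pair a Transformer would be trained on.
from sympy import symbols, groebner

x, y = symbols("x y")

# Ideal generators: a circle and a line.
F = [x**2 + y**2 - 1, x - y]

# Gröbner basis with respect to the lexicographic order x > y.
G = groebner(F, x, y, order="lex")
print(G.exprs)

# Any element of the ideal reduces to zero modulo the basis.
_, remainder = G.reduce((x - y) * (x + 3))
print(remainder)  # 0
```

The lex-order basis eliminates variables, which is why Gröbner bases are the standard route to solving polynomial systems exactly.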
Related papers
- HATSolver: Learning Groebner Bases with Hierarchical Attention Transformers [0.9722250595763385]
At NeurIPS 2024, Kera et al. introduced the use of transformers for computing Groebner bases. In this paper, we improve this approach by applying Hierarchical Attention Transformers (HATs) to solve systems of equations via Groebner bases.
arXiv Detail & Related papers (2025-12-09T11:34:28Z) - Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO [0.7885422274206872]
First, we develop a synthetic data generation pipeline providing fine-grained control over problem complexity. Second, we train transformer models via supervised learning and evaluate them across four key dimensions involving scaling behavior and generalizability. Third, we propose Beam Grouped Relative Policy Optimization (BGRPO), a rank-aware reinforcement learning method suitable for hard algebraic problems.
arXiv Detail & Related papers (2025-08-21T17:58:50Z) - Graded Transformers: A Symbolic-Geometric Approach to Structured Learning [0.0]
We introduce a novel class of sequence models that embed inductive biases through grading transformations on vector spaces. The Graded Transformer holds transformative potential for hierarchical learning and neurosymbolic reasoning. This work advances structured deep learning by fusing geometric and algebraic principles with attention mechanisms.
arXiv Detail & Related papers (2025-07-27T02:34:08Z) - Universal Approximation Theorem for a Single-Layer Transformer [0.0]
Deep learning employs multi-layer neural networks trained via the backpropagation algorithm. Transformers have achieved state-of-the-art performance in natural language processing. We prove that a single-layer Transformer, comprising one self-attention layer followed by a position-wise feed-forward network with ReLU activation, can approximate any continuous sequence-to-sequence mapping on a compact domain to arbitrary precision.
arXiv Detail & Related papers (2025-07-11T11:37:39Z) - Geometry-Informed Neural Operator Transformer [0.8906214436849201]
This work introduces the Geometry-Informed Neural Operator Transformer (GINOT), which integrates the transformer architecture with the neural operator framework to enable forward predictions on arbitrary geometries. The performance of GINOT is validated on multiple challenging datasets, showcasing its high accuracy and strong generalization capabilities for complex and arbitrary 2D and 3D geometries.
arXiv Detail & Related papers (2025-04-28T03:39:27Z) - Graph Transformers Dream of Electric Flow [72.06286909236827]
We show that the linear Transformer, when applied to graph data, can implement algorithms that solve canonical problems. We present explicit weight configurations for implementing each algorithm, and we bound the constructed Transformers' errors by the errors of the underlying algorithms. Our work is an initial step towards elucidating the inner workings of the Transformer for graph data.
arXiv Detail & Related papers (2024-10-22T05:11:45Z) - Learning Linear Attention in Polynomial Time [115.68795790532289]
We provide the first results on learnability of single-layer Transformers with linear attention.
We show that linear attention may be viewed as a linear predictor in a suitably defined RKHS.
We show how to efficiently identify training datasets for which every empirical risk minimizer is equivalent to the linear Transformer.
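The linearity underlying this entry can be seen directly: dropping the softmax makes attention associative, so the output is a fixed linear function of the query features and the quadratic attention matrix never needs to be formed. A minimal NumPy sketch (illustrative, not the paper's construction):

```python
# Illustrative sketch: linear (softmax-free) attention lets (Q K^T) V be
# regrouped as Q (K^T V), replacing O(n^2 d) work with O(n d^2).
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4                      # sequence length, feature dimension
Q = rng.normal(size=(n, d))      # queries
K = rng.normal(size=(n, d))      # keys
V = rng.normal(size=(n, d))      # values

out_quadratic = (Q @ K.T) @ V    # attention-matrix form
out_linear = Q @ (K.T @ V)       # kernelized / linear-predictor form

print(np.allclose(out_quadratic, out_linear))  # True
```

The regrouped form is also what makes the RKHS view natural: each output row is a linear predictor applied to its query's feature vector.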
arXiv Detail & Related papers (2024-10-14T02:41:01Z) - Algorithmic Capabilities of Random Transformers [49.73113518329544]
We investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized.
We find that these random transformers can perform a wide range of meaningful algorithmic tasks.
Our results indicate that some algorithmic capabilities are present in transformers even before these models are trained.
arXiv Detail & Related papers (2024-10-06T06:04:23Z) - Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization [88.5582111768376]
We study the optimization of a Transformer composed of a self-attention layer with softmax followed by a fully connected layer under gradient descent on a certain data distribution model.
Our results establish a sharp condition that can distinguish between the small test error phase and the large test error regime, based on the signal-to-noise ratio in the data model.
arXiv Detail & Related papers (2024-09-28T13:24:11Z) - Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms based on low-rank computation achieve impressive performance for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z) - AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures [80.28359222380733]
We design a novel transformer framework, dubbed AlgoFormer, to empower transformers with algorithmic capabilities. In particular, inspired by the structure of human-designed learning algorithms, our transformer framework consists of a pre-transformer that is responsible for task preprocessing. Some theoretical and empirical results are presented to show that the designed transformer has the potential to perform algorithm representation and learning.
arXiv Detail & Related papers (2024-02-21T07:07:54Z) - Looped Transformers are Better at Learning Learning Algorithms [16.98720552888865]
We propose the utilization of looped transformer architecture and its associated training methodology.
Experimental results suggest that the looped transformer achieves performance comparable to the standard transformer.
arXiv Detail & Related papers (2023-11-21T08:32:38Z) - A Symbolic Framework for Evaluating Mathematical Reasoning and Generalisation with Transformers [17.075558137261986]
We evaluate the generalisability of Transformers to out-of-distribution mathematical reasoning problems.
We compare the capabilities of GPT-4, GPT-3.5, and a canon of fine-tuned BERT models.
Surprisingly, our evaluation reveals that the average in-distribution performance of fine-tuned models surpasses GPT-3.5, and rivals GPT-4.
arXiv Detail & Related papers (2023-05-21T20:40:37Z) - Transformer-based Planning for Symbolic Regression [18.90700817248397]
We propose TPSR, a Transformer-based Planning strategy for Symbolic Regression.
Unlike conventional decoding strategies, TPSR enables the integration of non-differentiable feedback, such as fitting accuracy and complexity.
Our approach outperforms state-of-the-art methods, enhancing the model's fitting-complexity trade-off, symbolic abilities, and robustness to noise.
arXiv Detail & Related papers (2023-03-13T03:29:58Z) - Transformers learn in-context by gradient descent [58.24152335931036]
Training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations.
We show how trained Transformers become mesa-optimizers, i.e., they learn models by gradient descent in their forward pass.
arXiv Detail & Related papers (2022-12-15T09:21:21Z) - Linear algebra with transformers [0.0]
We show that transformers can be trained to perform numerical calculations with high accuracy.
We consider problems of linear algebra: matrix transposition, addition, multiplication, eigenvalues and vectors, singular value decomposition, and inversion.
arXiv Detail & Related papers (2021-12-03T13:21:57Z) - THG: Transformer with Hyperbolic Geometry [8.895324519034057]
"X-former" models make changes mainly around the quadratic time and memory complexity of self-attention.
We propose a novel Transformer with Hyperbolic Geometry (THG) model, which takes advantage of both Euclidean space and Hyperbolic space.
arXiv Detail & Related papers (2021-06-01T14:09:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.