HATSolver: Learning Groebner Bases with Hierarchical Attention Transformers
- URL: http://arxiv.org/abs/2512.14722v1
- Date: Tue, 09 Dec 2025 11:34:28 GMT
- Title: HATSolver: Learning Groebner Bases with Hierarchical Attention Transformers
- Authors: Mohamed Malhou, Ludovic Perret, Kristin Lauter
- Abstract summary: At NeurIPS 2024, Kera et al. introduced the use of transformers for computing Groebner bases. In this paper, we improve this approach by applying Hierarchical Attention Transformers (HATs) to solve systems of equations via Groebner bases.
- Score: 0.9722250595763385
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: At NeurIPS 2024, Kera et al. introduced the use of transformers for computing Groebner bases, a central object in computer algebra with numerous practical applications. In this paper, we improve this approach by applying Hierarchical Attention Transformers (HATs) to solve systems of multivariate polynomial equations via Groebner basis computation. The HAT architecture incorporates a tree-structured inductive bias that enables the modeling of hierarchical relationships present in the data and thus achieves significant computational savings compared to conventional flat-attention models. We generalize to arbitrary depths and include a detailed computational cost analysis. Combined with curriculum learning, our method solves instances that are much larger than those in Kera et al. (2024, Learning to Compute Groebner Bases).
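To illustrate where the savings of a tree-structured attention pattern come from (a generic sketch, not the HATSolver architecture; the chunk size, mean pooling, and absence of learned projections are assumptions), the following compares flat self-attention over n tokens with a two-level scheme that attends within fixed-size chunks and then across chunk summaries:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def flat_attention(X):
    # Full self-attention: one O(n^2) score matrix over all n tokens.
    scores = X @ X.T / np.sqrt(X.shape[1])
    return softmax(scores) @ X

def two_level_attention(X, chunk=16):
    # Level 1: attention restricted to each chunk of `chunk` tokens.
    n, d = X.shape
    chunks = X.reshape(n // chunk, chunk, d)
    local = np.stack([flat_attention(c) for c in chunks])
    # Level 2: attention across chunk summaries (mean-pooled chunks).
    summaries = local.mean(axis=1)            # (n/chunk, d)
    global_ctx = flat_attention(summaries)    # (n/chunk, d)
    # Broadcast each chunk's global context back to its tokens.
    return (local + global_ctx[:, None, :]).reshape(n, d)

X = np.random.randn(64, 32)
print(flat_attention(X).shape, two_level_attention(X).shape)
```

For 64 tokens the flat version builds one 64x64 score matrix, while the two-level version builds four 16x16 matrices plus one 4x4 matrix; this is the kind of saving the abstract attributes to hierarchical attention, here shown for depth two only.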
Related papers
- DaCe AD: Unifying High-Performance Automatic Differentiation for Machine Learning and Scientific Computing [54.73410106410609]
This work presents DaCe AD, a general, efficient automatic differentiation engine that requires no code modifications. DaCe AD uses a novel ILP-based algorithm to optimize the trade-off between storing and recomputing to achieve maximum performance within a given memory constraint.
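The summary does not give the ILP formulation; as a toy illustration of the store-versus-recompute trade-off it optimizes, the sketch below picks which intermediates of a backward pass to store under a memory budget using a 0/1 knapsack dynamic program (the memory sizes, recompute costs, and budget are made-up numbers, and DaCe AD itself uses an ILP, not this DP):

```python
# Each intermediate i costs mem[i] units to store and cost[i] FLOPs to
# recompute in the backward pass if it was not stored.  Choose which
# intermediates to store under a memory budget so that the total
# recomputation cost of the unstored ones is minimal.
def choose_stored(mem, cost, budget):
    n = len(mem)
    best = [[0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]                      # recompute i
            if mem[i - 1] <= b:                              # or store i
                best[i][b] = max(best[i][b],
                                 best[i - 1][b - mem[i - 1]] + cost[i - 1])
    # Backtrack to recover the stored set.
    stored, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            stored.append(i - 1)
            b -= mem[i - 1]
    return set(stored), sum(cost) - best[n][budget]   # total recompute cost

stored, recompute_cost = choose_stored(mem=[4, 2, 3, 1], cost=[9, 5, 6, 2], budget=5)
print(stored, recompute_cost)
```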
arXiv Detail & Related papers (2025-09-02T11:09:45Z) - Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO [0.7885422274206872]
First, we develop a synthetic data generation pipeline providing fine-grained control over problem complexity. Second, we train transformer models via supervised learning and evaluate them across four key dimensions involving scaling behavior and generalizability. Third, we propose Beam Grouped Relative Policy Optimization (BGRPO), a rank-aware reinforcement learning method suitable for hard algebraic problems.
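As a hedged sketch of the "group relative" ingredient only (the rewards, group size, and absence of BGRPO's rank-aware weighting are assumptions), the snippet computes GRPO-style advantages over one group of beam candidates by normalizing each candidate's reward against the group mean and spread:

```python
import numpy as np

def group_relative_advantages(rewards):
    # GRPO-style advantage: how much better each candidate is than the
    # average of its own group, in units of the group's spread.
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# One prompt, one group of beam-search candidates scored by a verifier
# (hypothetical rewards: 1.0 = correct answer, 0.0 = incorrect).
beam_rewards = [1.0, 0.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(beam_rewards))
```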
arXiv Detail & Related papers (2025-08-21T17:58:50Z) - The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm [46.167267094420644]
We show that GPTQ is mathematically identical to Babai's nearest plane algorithm for the classical closest vector problem. We design post-training quantization methods that avoid clipping and outperform the original GPTQ.
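Babai's nearest plane algorithm itself is classical; a minimal numpy version (basis given as rows, no LLL preprocessing, illustrative only) makes concrete the sequential rounding procedure the abstract identifies with GPTQ's column-by-column quantization:

```python
import numpy as np

def gram_schmidt(B):
    # Gram-Schmidt orthogonalization of the basis rows of B.
    Bstar = B.astype(float).copy()
    for i in range(len(B)):
        for j in range(i):
            mu = np.dot(B[i], Bstar[j]) / np.dot(Bstar[j], Bstar[j])
            Bstar[i] -= mu * Bstar[j]
    return Bstar

def babai_nearest_plane(B, t):
    # Round the target onto successive hyperplanes, last basis vector first;
    # the returned point is a lattice vector close to t.
    Bstar = gram_schmidt(B)
    r = np.asarray(t, dtype=float).copy()
    for i in reversed(range(len(B))):
        c = round(np.dot(r, Bstar[i]) / np.dot(Bstar[i], Bstar[i]))
        r -= c * B[i]
    return np.asarray(t, dtype=float) - r

B = np.array([[2, 0], [1, 3]])       # lattice basis (rows)
t = np.array([3.3, 2.2])             # target point
print(babai_nearest_plane(B, t))     # -> [3. 3.]
```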
arXiv Detail & Related papers (2025-07-24T16:22:18Z) - Geometric Generality of Transformer-Based Gröbner Basis Computation [0.0]
In this paper, we address the computation of Gröbner bases using Transformers. We prove that datasets generated by the previously proposed algorithm are sufficiently general. We also propose an extended and generalized algorithm to systematically construct datasets of ideal generators.
arXiv Detail & Related papers (2025-04-16T20:01:00Z) - Learning to Compute Gröbner Bases [3.8214695776749013]
This paper is the first to address the learning of Gröbner bases with Transformers.
The training requires many pairs of a polynomial system and the associated Gröbner basis.
We propose a hybrid input embedding to handle coefficient tokens with a continuity bias and to avoid growth of the vocabulary set.
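One way to read the hybrid input (a sketch under my own assumptions about names and dimensions, not the paper's implementation) is a lookup table for the fixed symbolic vocabulary combined with a continuous linear map applied to coefficient values, so coefficients never enlarge the vocabulary and nearby values get nearby embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
SYMBOLS = ["+", "*", "^", "x0", "x1", "<sep>"]        # fixed symbolic vocabulary
TABLE = {s: rng.normal(size=DIM) for s in SYMBOLS}    # lookup embeddings
W_COEFF = rng.normal(size=(DIM, 1))                   # continuous map for numbers

def embed(token):
    # Symbolic tokens use the lookup table; numeric tokens are projected
    # from their value, so any coefficient maps to a vector without being
    # added to the vocabulary (continuity bias: nearby values, nearby vectors).
    if isinstance(token, str):
        return TABLE[token]
    return (W_COEFF * float(token)).ravel()

# Tokenized fragment of "3*x0^2 + 5*x1":
sequence = [3, "*", "x0", "^", 2, "+", 5, "*", "x1"]
X = np.stack([embed(t) for t in sequence])
print(X.shape)   # (9, 8)
```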
arXiv Detail & Related papers (2023-11-21T11:54:21Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute the covariance matrices required by the existing approach.
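The abstract attributes the speed-up to never forming the covariance matrices explicitly; a generic illustration of that matrix-free trick (not the paper's algorithm; the sensing matrix, prior precision, and dimensions below are arbitrary) is to solve the inner linear systems with conjugate gradients, supplying only matrix-vector products:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n, d = 50, 200
A = rng.normal(size=(n, d))          # sensing matrix
alpha = 0.5                          # prior precision (illustrative value)
y = rng.normal(size=n)

def matvec(v):
    # Apply (A^T A + alpha*I) to v using only products with A and A^T,
    # so the d x d matrix is never formed explicitly.
    return A.T @ (A @ v) + alpha * v

op = LinearOperator((d, d), matvec=matvec)
x, info = cg(op, A.T @ y)            # info == 0 on convergence
print(info, x.shape)
```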
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - CORE: Common Random Reconstruction for Distributed Optimization with
Provable Low Communication Complexity [110.50364486645852]
Communication complexity has become a major bottleneck for speeding up training and scaling up the number of machines.
We propose CORE (Common Random Reconstruction), which can be used to compress the information transmitted between machines.
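The summary does not spell out the mechanism; one hedged reading of "common random reconstruction" is that sender and receiver draw the same random directions from a shared seed, so only low-dimensional projection coefficients need to be transmitted (the dimensions, seed handling, and plain Gaussian projection below are assumptions, not the paper's scheme):

```python
import numpy as np

def compress(g, seed, k):
    # Sender: project the d-dim gradient onto k shared random directions.
    R = np.random.default_rng(seed).normal(size=(k, g.size)) / np.sqrt(k)
    return R @ g                      # only k numbers are transmitted

def reconstruct(coeffs, seed, d):
    # Receiver: regenerate the same random directions from the common seed
    # and rebuild an estimate of g (unbiased in expectation).
    k = coeffs.size
    R = np.random.default_rng(seed).normal(size=(k, d)) / np.sqrt(k)
    return R.T @ coeffs

d, k, seed = 1000, 50, 1234
g = np.random.default_rng(0).normal(size=d)
g_hat = reconstruct(compress(g, seed, k), seed, d)
print(g.size, "->", k, "numbers sent; relative error",
      np.linalg.norm(g - g_hat) / np.linalg.norm(g))
```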
arXiv Detail & Related papers (2023-09-23T08:45:27Z) - A Recursively Recurrent Neural Network (R2N2) Architecture for Learning
Iterative Algorithms [64.3064050603721]
We generalize the Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
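To make the superstructure idea concrete (a minimal sketch under my own reading, not the R2N2 architecture), the update below parameterizes a generic s-stage iteration by a Butcher-like tableau; fixing the coefficients to Heun's values recovers a classical two-stage Runge-Kutta step, while in an R2N2-style model the coefficients would be trained:

```python
import numpy as np

def parameterized_step(f, x, h, A, b):
    # Generic s-stage explicit update: stage i evaluates f at the current
    # state shifted by earlier stages, then the stages are combined with
    # weights b.  With (A, b) taken from a Butcher tableau this is an
    # explicit Runge-Kutta step; here (A, b) are free parameters.
    s = len(b)
    k = []
    for i in range(s):
        shift = sum(A[i][j] * k[j] for j in range(i))
        k.append(f(x + h * shift))
    return x + h * sum(bi * ki for bi, ki in zip(b, k))

f = lambda x: -x                      # dx/dt = -x, exact solution exp(-t)
A_heun = [[0.0], [1.0, 0.0]]          # Heun's method (2-stage Runge-Kutta)
b_heun = [0.5, 0.5]
x, h = np.array([1.0]), 0.1
print(parameterized_step(f, x, h, A_heun, b_heun), np.exp(-h))
```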
arXiv Detail & Related papers (2022-11-22T16:30:33Z) - Characterizing Intrinsic Compositionality in Transformers with Tree
Projections [72.45375959893218]
Neural models like transformers can route information arbitrarily between different parts of their input.
We show that transformers for three different tasks become more treelike over the course of training.
These trees are predictive of model behavior, with more tree-like models generalizing better on tests of compositional generalization.
arXiv Detail & Related papers (2022-11-02T17:10:07Z) - Multiparameter Persistent Homology-Generic Structures and Quantum
Computing [0.0]
This article is an application of commutative algebra to the study of persistent homology in topological data analysis.
The generic structure of such resolutions and the classifying spaces are studied using results spanning several decades of research.
arXiv Detail & Related papers (2022-10-20T17:30:20Z) - On Algebraic Constructions of Neural Networks with Small Weights [21.915057426589748]
We study the trade-offs among the weight sizes, circuit size and depth of neural gates.
Specifically, given a single linear equation with arbitrary coefficients, we would like to express it using a system of linear equations with smaller (even constant) coefficients.
We explicitly construct a constant weight, optimal size matrix to compute the EQUALITY function.
We prove the existence of matrices with the best-known (linear) weight size for computing the COMPARISON function.
arXiv Detail & Related papers (2022-05-17T00:09:23Z) - Generalized Matrix Factorization: efficient algorithms for fitting
generalized linear latent variable models to large data arrays [62.997667081978825]
Generalized Linear Latent Variable Models (GLLVMs) generalize classical Gaussian factor models to non-Gaussian responses.
Current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets.
We propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood.
arXiv Detail & Related papers (2020-10-06T04:28:19Z)