Transformers to Predict the Applicability of Symbolic Integration Routines
- URL: http://arxiv.org/abs/2410.23948v1
- Date: Thu, 31 Oct 2024 14:03:37 GMT
- Title: Transformers to Predict the Applicability of Symbolic Integration Routines
- Authors: Rashid Barket, Uzma Shafiq, Matthew England, Juergen Gerhard
- Abstract summary: We consider how machine learning may be used to optimise this task in a Computer Algebra System (CAS).
We train transformers that predict whether a particular integration method will be successful, and compare against the existing human-made heuristics (called guards).
We find the transformer can outperform these guards, gaining up to 30% accuracy and 70% precision.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Symbolic integration is a fundamental problem in mathematics: we consider how machine learning may be used to optimise this task in a Computer Algebra System (CAS). We train transformers that predict whether a particular integration method will be successful, and compare against the existing human-made heuristics (called guards) that perform this task in a leading CAS. We find the transformer can outperform these guards, gaining up to 30% accuracy and 70% precision. We further show that the inference time of the transformer is inconsequential which shows that it is well-suited to include as a guard in a CAS. Furthermore, we use Layer Integrated Gradients to interpret the decisions that the transformer is making. If guided by a subject-matter expert, the technique can explain some of the predictions based on the input tokens, which can lead to further optimisations.
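A minimal sketch of the guard idea described in the abstract, assuming a character-level encoding and a small PyTorch encoder-classifier; the vocabulary, architecture, and the `guarded_integrate` helper are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): a small transformer that predicts
# whether a given integration method will succeed on an integrand, used as a
# learned "guard" before invoking the (possibly expensive) symbolic routine.
import torch
import torch.nn as nn

# Character-level vocabulary over expression strings -- an illustrative choice.
VOCAB = dict.fromkeys("0123456789+-*/^() xexplnsincostan")
CHAR2ID = {c: i + 1 for i, c in enumerate(VOCAB)}  # 0 is reserved for padding

def encode(expr: str, max_len: int = 64) -> torch.Tensor:
    ids = [CHAR2ID.get(c, 0) for c in expr][:max_len]
    return torch.tensor(ids + [0] * (max_len - len(ids)))

class GuardTransformer(nn.Module):
    """Binary classifier: will integration method M succeed on this expression?"""
    def __init__(self, vocab_size: int = 64, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.embed(tokens))          # (batch, seq, d_model)
        return self.head(h.mean(dim=1)).squeeze(-1)   # success logit

def guarded_integrate(expr: str, model: GuardTransformer, routine, threshold: float = 0.5):
    """Call the symbolic routine only if the learned guard predicts success."""
    with torch.no_grad():
        p_success = torch.sigmoid(model(encode(expr).unsqueeze(0))).item()
    return routine(expr) if p_success >= threshold else None
```

Token-level attributions of the kind the abstract mentions could then be computed over the embedding layer with a Layer Integrated Gradients implementation (for example Captum's `LayerIntegratedGradients`), although the paper does not prescribe a particular library.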
Related papers
- Transformers Don't In-Context Learn Least Squares Regression [5.648229654902264]
In-context learning (ICL) has emerged as a powerful capability of large pretrained transformers. We study how transformers implement learning at inference time. We highlight the role of the pretraining corpus in shaping ICL behaviour.
arXiv Detail & Related papers (2025-07-13T01:09:26Z) - SAT Strikes Back: Parameter and Path Relations in Quantum Toolchains [3.0189109720302207]
It is crucial to find (multiple) transformation paths that are optimised for (hardware-specific) metrics. We zoom into this pictured tree of transformations by focussing on k-SAT instances as input and their transformation to QUBO. Our results can be used to rate valid paths of transformation in advance -- also in automated (quantum) toolchains.
arXiv Detail & Related papers (2025-05-28T07:32:37Z) - Born a Transformer -- Always a Transformer? [57.37263095476691]
We study a family of $\textit{retrieval}$ and $\textit{copying}$ tasks inspired by Liu et al. We observe an $\textit{induction-versus-anti-induction}$ asymmetry, where pretrained models are better at retrieving tokens to the right (induction) than the left (anti-induction) of a query token. Mechanistic analysis reveals that this asymmetry is connected to the differences in the strength of induction versus anti-induction circuits within pretrained transformers.
arXiv Detail & Related papers (2025-05-27T21:36:50Z) - Continuum Transformers Perform In-Context Learning by Operator Gradient Descent [18.928543069018865]
We show that continuum transformers can perform in-context operator learning by performing gradient descent in an operator RKHS. We provide empirical validations of this optimality result and demonstrate that the parameters under which such gradient descent is performed are recovered through the continuum transformer training.
arXiv Detail & Related papers (2025-05-23T12:52:54Z) - One-Layer Transformer Provably Learns One-Nearest Neighbor In Context [48.4979348643494]
We study the capability of one-layer transformers to learn the one-nearest neighbor rule.
A single softmax attention layer can successfully learn to behave like a one-nearest-neighbor predictor.
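As a toy illustration of that claim (not the paper's construction), sharp softmax attention over in-context (input, label) pairs reduces to a nearest-neighbour lookup; the distance-based scores and the `beta` temperature below are illustrative assumptions.

```python
# Illustrative only: softmax attention with a large inverse temperature `beta`
# behaves like one-nearest-neighbour prediction over the context pairs.
import torch

def soft_1nn(xs: torch.Tensor, ys: torch.Tensor, query: torch.Tensor,
             beta: float = 50.0) -> torch.Tensor:
    # Scores are negative squared distances; for unit-norm inputs this matches
    # dot-product attention up to an additive constant.
    scores = -((xs - query) ** 2).sum(dim=-1)
    weights = torch.softmax(beta * scores, dim=0)  # beta -> infinity gives a hard argmax
    return weights @ ys                            # approximately the nearest label

xs, ys, q = torch.randn(8, 2), torch.randn(8, 1), torch.randn(2)
print(soft_1nn(xs, ys, q))  # approaches ys[argmin ||xs - q||] as beta grows
```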
arXiv Detail & Related papers (2024-11-16T16:12:42Z) - Interpreting Affine Recurrence Learning in GPT-style Transformers [54.01174470722201]
In-context learning allows GPT-style transformers to generalize during inference without modifying their weights.
This paper focuses specifically on their ability to learn and predict affine recurrences as an ICL task.
We analyze the model's internal operations using both empirical and theoretical approaches.
arXiv Detail & Related papers (2024-10-22T21:30:01Z) - On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent [51.50999191584981]
Sign Gradient Descent (SignGD) serves as an effective surrogate for Adam.
We study how SignGD optimizes a two-layer transformer on a noisy dataset.
We find that the poor generalization of SignGD is not solely due to data noise, suggesting that both SignGD and Adam require high-quality data for real-world tasks.
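For reference, a minimal sketch of the SignGD update discussed above, written in its standard form rather than taken from the paper.

```python
# Sign Gradient Descent: each parameter moves by a fixed step in the direction
# of the sign of its gradient, ignoring the gradient's magnitude (unlike SGD/Adam).
import torch

def signgd_step(params, lr: float = 1e-2) -> None:
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p -= lr * p.grad.sign()
                p.grad.zero_()

# Usage after loss.backward():  signgd_step(model.parameters())
```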
arXiv Detail & Related papers (2024-10-07T09:36:43Z) - Are Transformers in Pre-trained LM A Good ASR Encoder? An Empirical Study [52.91899050612153]
We study transformers within pre-trained language models (PLMs) when repurposed as encoders for Automatic Speech Recognition (ASR).
Our findings reveal a notable improvement in Character Error Rate (CER) and Word Error Rate (WER) across diverse ASR tasks when transformers from pre-trained LMs are incorporated.
This underscores the potential of leveraging the semantic prowess embedded within pre-trained transformers to advance ASR systems' capabilities.
arXiv Detail & Related papers (2024-09-26T11:31:18Z) - Dissecting Multiplication in Transformers: Insights into LLMs [23.109124772063574]
We focus on a typical arithmetic task, integer multiplication, to explore and explain the imperfection of transformers in this domain.
We provide comprehensive analysis of a vanilla transformer trained to perform n-digit integer multiplication.
We propose improvements to enhance transformers' performance on multiplication tasks.
arXiv Detail & Related papers (2024-07-22T04:07:26Z) - Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms, such as low-rank computation, achieve impressive performance for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z) - Understanding the Training and Generalization of Pretrained Transformer for Sequential Decision Making [7.8816327398541635]
We consider the supervised pre-trained transformer for a class of sequential decision-making problems.
Such a structure enables the use of optimal actions/decisions in the pre-training phase.
arXiv Detail & Related papers (2024-05-23T06:28:44Z) - DoT: An efficient Double Transformer for NLP tasks with tables [3.0079490585515343]
DoT is a double transformer model that decomposes the problem into two sub-tasks.
We show that, for a small drop in accuracy, DoT improves training and inference time by at least 50%.
arXiv Detail & Related papers (2021-06-01T13:33:53Z) - Finetuning Pretrained Transformers into RNNs [81.72974646901136]
Transformers have outperformed recurrent neural networks (RNNs) in natural language generation.
A linear-complexity recurrent variant has proven well suited for autoregressive generation.
This work aims to convert a pretrained transformer into its efficient recurrent counterpart.
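A generic sketch of why such a recurrent counterpart exists: replacing the softmax with a positive feature map lets attention be computed with a constant-size running state. The feature map `phi` and single-head shapes are illustrative assumptions, not the paper's specific conversion procedure.

```python
# Linear (kernelised) attention evaluated as a recurrence over tokens.
import torch

def phi(x: torch.Tensor) -> torch.Tensor:
    return torch.relu(x) + 1e-6  # a simple positive feature map, chosen for illustration

def recurrent_attention(qs: torch.Tensor, ks: torch.Tensor, vs: torch.Tensor) -> torch.Tensor:
    d_k, d_v = qs.shape[-1], vs.shape[-1]
    S = torch.zeros(d_k, d_v)  # running sum of phi(k) v^T
    z = torch.zeros(d_k)       # running sum of phi(k), used for normalisation
    outs = []
    for q, k, v in zip(qs, ks, vs):  # one token at a time, never re-attending to the past
        S = S + torch.outer(phi(k), v)
        z = z + phi(k)
        outs.append((phi(q) @ S) / (phi(q) @ z))
    return torch.stack(outs)

out = recurrent_attention(torch.randn(5, 8), torch.randn(5, 8), torch.randn(5, 8))
```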
arXiv Detail & Related papers (2021-03-24T10:50:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.