Pretrained Language Models are Symbolic Mathematics Solvers too!
- URL: http://arxiv.org/abs/2110.03501v1
- Date: Thu, 7 Oct 2021 14:37:06 GMT
- Title: Pretrained Language Models are Symbolic Mathematics Solvers too!
- Authors: Kimia Noorbakhsh, Modar Sulaiman, Mahdi Sharifi, Kallol Roy, Pooyan
Jamshidi
- Abstract summary: Large-scale language models such as transformers are surprisingly universal and can be trained, as a sequence-to-sequence task, to solve complex equations.
We present a sample-efficient way of solving symbolic tasks by first pretraining the transformer model on language translation and then fine-tuning it to solve the downstream task of symbolic mathematics.
- Score: 1.9240537487954366
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Solving symbolic mathematics has always been in the arena of human ingenuity, requiring compositional reasoning and recurrence. However, recent studies have shown that large-scale language models such as transformers are surprisingly universal and can be trained, as a sequence-to-sequence task, to solve complex mathematical equations. These large transformer models need enormous amounts of training data to generalize to unseen symbolic mathematics problems. In this paper, we present a sample-efficient way of solving symbolic tasks by first pretraining the transformer model on language translation and then fine-tuning it on the downstream task of symbolic mathematics. With our pretrained model, we achieve accuracy on the integration task comparable to the state-of-the-art deep learning for symbolic mathematics while using roughly $1.5$ orders of magnitude fewer training samples. The test accuracy on differential equation tasks is considerably lower than on integration, as these tasks require higher-order recursions that are not present in language translation. We pretrain our model with different pairs of language translations, and our results show a language bias in solving symbolic mathematics tasks. Finally, we study the robustness of the fine-tuned model on symbolic math tasks against distribution shift; our approach generalizes better in distribution-shift scenarios for function integration.
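As a rough illustration of the recipe described in the abstract (pretrain a seq2seq transformer on language translation, then fine-tune it on symbolic mathematics posed as sequence-to-sequence generation), the sketch below fine-tunes a publicly available translation checkpoint on a few toy integration pairs. The checkpoint name, tokenizer, optimizer settings, and prefix-notation examples are illustrative assumptions for brevity, not the authors' actual data pipeline or training configuration.

```python
# Minimal sketch (not the authors' released code) of the two-stage recipe:
# start from a translation-pretrained seq2seq transformer and fine-tune it on
# (integrand, antiderivative) pairs written as token sequences.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Stage 1 (pretraining on translation) is inherited from the checkpoint.
model_name = "Helsinki-NLP/opus-mt-en-fr"  # illustrative English-French translation model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Stage 2: fine-tune on symbolic integration framed as sequence-to-sequence.
# Toy prefix-notation pairs; real training data would be generated symbolically
# and be far larger.
pairs = [
    ("mul 2 x", "pow x 2"),   # d/dx x^2 = 2x
    ("cos x", "sin x"),       # d/dx sin(x) = cos(x)
    ("exp x", "exp x"),       # d/dx e^x = e^x
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):
    for src, tgt in pairs:
        batch = tokenizer(src, text_target=tgt, return_tensors="pt")
        loss = model(**batch).loss  # teacher-forced cross-entropy on the target
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# After fine-tuning, decode an antiderivative for an unseen expression.
model.eval()
inputs = tokenizer("sin x", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

In the paper, the fine-tuning data are generated symbolic expressions, and the reported gain is that roughly $1.5$ orders of magnitude fewer such samples are needed once the model starts from translation pretraining; the loop above only shows the general plumbing.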
Related papers
- Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models [12.424072830053445]
We present a model merging methodology that addresses the difficulty of fine-tuning Large Language Models (LLMs) for target tasks in non-English languages.
We fine-tune separate "experts" on math instruction data in English and on generic instruction data in the target language.
We replace the top and bottom transformer layers of the math expert directly with layers from the language expert, which consequently enhances math performance in the target language.
arXiv Detail & Related papers (2024-10-02T08:53:07Z) - A Hybrid System for Systematic Generalization in Simple Arithmetic
Problems [70.91780996370326]
We propose a hybrid system capable of solving arithmetic problems that require compositional and systematic reasoning over sequences of symbols.
We show that the proposed system can accurately solve nested arithmetical expressions even when trained only on a subset including the simplest cases.
arXiv Detail & Related papers (2023-06-29T18:35:41Z) - Learning to Reason With Relational Abstractions [65.89553417442049]
We study how to build stronger reasoning capability in language models using the idea of relational abstractions.
We find that models that are supplied with such sequences as prompts can solve tasks with a significantly higher accuracy.
arXiv Detail & Related papers (2022-10-06T00:27:50Z) - Heterogeneous Line Graph Transformer for Math Word Problems [21.4761673982334]
This paper describes the design and implementation of a new machine learning model for online learning systems.
We aim to improve the intelligence of such systems by enabling an automated math word problem solver.
arXiv Detail & Related papers (2022-08-11T05:27:05Z) - JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem
Understanding [74.12405417718054]
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model (PLM).
Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols and formulas in the problem statement.
We design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
arXiv Detail & Related papers (2022-06-13T17:03:52Z) - Tackling Math Word Problems with Fine-to-Coarse Abstracting and
Reasoning [22.127301797950572]
We propose to model a math word problem in a fine-to-coarse manner to capture both its local fine-grained information and its global logical structure.
Our model is naturally sensitive to local variations and can better generalize to unseen problem types.
arXiv Detail & Related papers (2022-05-17T12:14:44Z) - Recognizing and Verifying Mathematical Equations using Multiplicative
Differential Neural Units [86.9207811656179]
We show that memory-augmented neural networks (NNs) can achieve higher-order extrapolation, stable performance, and faster convergence.
Our models achieve a 1.53% average improvement over current state-of-the-art methods in equation verification and achieve a 2.22% Top-1 average accuracy and 2.96% Top-5 average accuracy for equation completion.
arXiv Detail & Related papers (2021-04-07T03:50:11Z) - Measuring Mathematical Problem Solving With the MATH Dataset [55.4376028963537]
We introduce MATH, a dataset of 12,500 challenging competition mathematics problems.
Each problem has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations.
We also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics.
arXiv Detail & Related papers (2021-03-05T18:59:39Z) - SMART: A Situation Model for Algebra Story Problems via Attributed
Grammar [74.1315776256292]
We introduce the concept of a situation model, which originates from psychology studies to represent the mental states of humans in problem-solving.
We show that the proposed model outperforms all previous neural solvers by a large margin while preserving much better interpretability.
arXiv Detail & Related papers (2020-12-27T21:03:40Z) - A Mathematical Exploration of Why Language Models Help Solve Downstream
Tasks [35.046596668631615]
Autoregressive language models, pretrained using large text corpora to do well on next word prediction, have been successful at solving many downstream tasks.
This paper initiates a mathematical study of this phenomenon for the downstream task of text classification.
arXiv Detail & Related papers (2020-10-07T20:56:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.