Reverse That Number! Decoding Order Matters in Arithmetic Learning
- URL: http://arxiv.org/abs/2403.05845v1
- Date: Sat, 9 Mar 2024 09:04:53 GMT
- Title: Reverse That Number! Decoding Order Matters in Arithmetic Learning
- Authors: Daniel Zhang-Li, Nianyi Lin, Jifan Yu, Zheyuan Zhang, Zijun Yao,
Xiaokang Zhang, Lei Hou, Jing Zhang, Juanzi Li
- Abstract summary: Our work introduces a novel strategy that reevaluates the digit order by prioritizing output from the least significant digit.
Compared to the previous state-of-the-art (SOTA) method, our findings reveal an overall improvement in accuracy while requiring only a third of the tokens typically used during training.
- Score: 49.5504492920404
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advancements in pretraining have demonstrated that modern Large
Language Models (LLMs) possess the capability to effectively learn arithmetic
operations. However, despite acknowledging the significance of digit order in
arithmetic computation, current methodologies predominantly rely on sequential,
step-by-step approaches for teaching LLMs arithmetic, leading to the
conclusion that better performance requires ever finer-grained,
step-by-step computation.
Diverging from this conventional path, our work introduces a novel strategy
that not only reevaluates the digit order by prioritizing output from the least
significant digit but also incorporates a step-by-step methodology to
substantially reduce complexity. We have developed and applied this method in a
comprehensive set of experiments. Compared to the previous state-of-the-art
(SOTA) method, our findings reveal an overall improvement in accuracy while
requiring only a third of the tokens typically used during training. For the
purpose of facilitating replication and further research, we have made our code
and dataset publicly available at
https://anonymous.4open.science/r/RAIT-9FB7/.
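Since the abstract does not spell out the training format, here is a minimal Python sketch of the least-significant-digit-first (LSD-first) idea described above; the function names and the exact "a+b=digits" layout are invented for illustration and are not the paper's actual format.

```python
# Minimal sketch of LSD-first formatting for arithmetic training data.
# The helper names and the "a+b=digits" layout are illustrative
# assumptions, not the format used in the paper.

def reverse_digits(n: int) -> str:
    """Render n with its digits reversed, e.g. 512 -> '215'."""
    return str(n)[::-1]

def make_example(a: int, b: int) -> str:
    """Format 'a+b=' with the answer emitted least significant digit first.

    Emitting the units digit first means each output digit is fully
    determined by the corresponding input digits and the running carry,
    so the model never has to look ahead to resolve a carry chain.
    """
    return f"{a}+{b}={reverse_digits(a + b)}"

if __name__ == "__main__":
    print(make_example(17, 25))   # 17+25=24    (42 written units-first)
    print(make_example(999, 1))   # 999+1=0001  (1000 written units-first)
```

At inference time, the emitted digit string is simply reversed again to recover the conventional most-significant-digit-first answer.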
Related papers
- Improve Mathematical Reasoning in Language Models by Automated Process Supervision [22.72856086318912]
We propose a novel Monte Carlo Tree Search (MCTS) algorithm named OmegaPRM for the efficient collection of high-quality process supervision data.
We are able to collect over 1.5 million process supervision annotations to train a Process Reward Model (PRM).
We have enhanced the instruction tuned Gemini Pro model's math reasoning performance, achieving a 69.4% success rate on the MATH benchmark.
arXiv Detail & Related papers (2024-06-05T19:25:40Z)
- RevOrder: A Novel Method for Enhanced Arithmetic in Language Models [0.9043578619916238]
RevOrder reverses the output digits in addition, subtraction, and n-digit by 1-digit (nD by 1D) multiplication tasks.
Our method significantly reduces the Count of Sequential Intermediate Digits (CSID) to $\mathcal{O}(1)$.
arXiv Detail & Related papers (2024-02-06T09:10:35Z)
- Teaching Arithmetic to Small Transformers [39.72665384986095]
This study investigates how small transformers can efficiently learn arithmetic operations.
We first demonstrate that conventional training data is not the most effective for arithmetic learning.
We then train on chain-of-thought style data that includes intermediate step results (a toy generator for such data is sketched after this list).
arXiv Detail & Related papers (2023-07-07T04:33:31Z)
- Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning [75.74103236299477]
Chain-of-thought prompting(CoT) and tool augmentation have been validated as effective practices for improving large language models.
We propose a new approach that can deliberate the reasoning steps with tool interfaces, namely DELI.
Experimental results on CARP and six other datasets show that the proposed DELI mostly outperforms competitive baselines.
arXiv Detail & Related papers (2023-06-04T17:02:59Z)
- Arithmetic-Based Pretraining -- Improving Numeracy of Pretrained Language Models [67.48894919842576]
State-of-the-art pretrained language models tend to perform below their capabilities when applied out-of-the-box on tasks that require numeracy.
We propose a new extended pretraining approach called Arithmetic-Based Pretraining that jointly addresses both in one extended pretraining step.
Our experiments show the effectiveness of Arithmetic-Based Pretraining in three different tasks that require improved numeracy.
arXiv Detail & Related papers (2022-05-13T16:10:13Z)
- Few-shot Sequence Learning with Transformers [79.87875859408955]
Few-shot algorithms aim at learning new tasks provided only a handful of training examples.
In this work we investigate few-shot learning in the setting where the data points are sequences of tokens.
We propose an efficient learning algorithm based on Transformers.
arXiv Detail & Related papers (2020-12-17T12:30:38Z)
- Second-order Neural Network Training Using Complex-step Directional Derivative [41.4333906662624]
We introduce a numerical algorithm for second-order neural network training.
We tackle the practical obstacle of Hessian calculation by using the complex-step finite difference.
We believe our method will inspire a wide range of new algorithms for deep learning and numerical optimization (a minimal sketch of the complex-step trick appears after this list).
arXiv Detail & Related papers (2020-09-15T13:46:57Z)
- Process Discovery for Structured Program Synthesis [70.29027202357385]
A core task in process mining is process discovery which aims to learn an accurate process model from event log data.
In this paper, we propose to use (block-) structured programs directly as target process models.
We develop a novel bottom-up agglomerative approach to the discovery of such structured program process models.
arXiv Detail & Related papers (2020-08-13T10:33:10Z)
- Continual Deep Learning by Functional Regularisation of Memorable Past [95.97578574330934]
Continually learning new skills is important for intelligent systems, yet standard deep learning methods suffer from catastrophic forgetting of the past.
We propose a new functional-regularisation approach that utilises a few memorable past examples crucial to avoid forgetting.
Our method achieves state-of-the-art performance on standard benchmarks and opens a new direction for life-long learning where regularisation and memory-based methods are naturally combined.
arXiv Detail & Related papers (2020-04-29T10:47:54Z)
- Stacked Generalizations in Imbalanced Fraud Data Sets using Resampling Methods [2.741266294612776]
This study uses stacked generalization, a two-step process that combines machine learning methods (called meta- or super-learners) to improve the performance of algorithms.
Building a test harness that accounts for all permutations of algorithm and sample-set pairs demonstrates that the complex, intrinsic data structures are all thoroughly tested.
arXiv Detail & Related papers (2020-04-03T20:38:22Z)
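The "Teaching Arithmetic to Small Transformers" entry above trains on chain-of-thought style data with intermediate step results. As a toy illustration only (the textual template below is invented, not the paper's dataset format), the following Python sketch spells out column-by-column addition:

```python
# Toy generator for chain-of-thought style addition data with
# intermediate step results. The template is an invented illustration,
# not the dataset format used in the paper.

def cot_addition(a: int, b: int) -> str:
    """Spell out column-by-column addition, least significant column first."""
    da, db = str(a)[::-1], str(b)[::-1]            # digits, units place first
    steps, result, carry = [], [], 0
    for i in range(max(len(da), len(db))):
        x = int(da[i]) if i < len(da) else 0
        y = int(db[i]) if i < len(db) else 0
        s = x + y + carry
        steps.append(f"{x}+{y}+{carry}={s}, write {s % 10} carry {s // 10}")
        result.append(str(s % 10))
        carry = s // 10
    if carry:
        result.append(str(carry))
    answer = "".join(reversed(result))
    return f"{a}+{b}: " + "; ".join(steps) + f"; answer {answer}"

if __name__ == "__main__":
    print(cot_addition(17, 25))
    # 17+25: 7+5+0=12, write 2 carry 1; 1+2+1=4, write 4 carry 0; answer 42
```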
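The "Second-order Neural Network Training Using Complex-step Directional Derivative" entry above relies on the complex-step finite difference. The sketch below shows only that standard numerical trick, not the paper's training pipeline; the test function and step size are arbitrary choices. For a real-analytic f, f'(x) ≈ Im(f(x + ih)) / h with no subtractive cancellation, so h can be made extremely small.

```python
# Sketch of the complex-step finite difference: the derivative is read
# off the imaginary part of a single complex-valued evaluation, with no
# subtractive cancellation (unlike (f(x+h) - f(x)) / h).

import numpy as np

def complex_step_derivative(f, x: float, h: float = 1e-20) -> float:
    """Approximate f'(x) via Im(f(x + i*h)) / h."""
    return float(np.imag(f(x + 1j * h)) / h)

if __name__ == "__main__":
    f = lambda x: np.exp(x) * np.sin(x)                 # analytic test function
    x0 = 0.7
    exact = np.exp(x0) * (np.sin(x0) + np.cos(x0))      # closed-form derivative
    print(abs(exact - complex_step_derivative(f, x0)))  # ~ machine epsilon
```

The same idea extends to directional derivatives by taking the complex step along a direction vector, which matches the "directional derivative" use named in the paper's title.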