Reverse That Number! Decoding Order Matters in Arithmetic Learning
- URL: http://arxiv.org/abs/2403.05845v1
- Date: Sat, 9 Mar 2024 09:04:53 GMT
- Title: Reverse That Number! Decoding Order Matters in Arithmetic Learning
- Authors: Daniel Zhang-Li, Nianyi Lin, Jifan Yu, Zheyuan Zhang, Zijun Yao,
Xiaokang Zhang, Lei Hou, Jing Zhang, Juanzi Li
- Abstract summary: Our work introduces a novel strategy that reevaluates the digit order by prioritizing output from the least significant digit.
Compared to the previous state-of-the-art (SOTA) method, our findings reveal an overall improvement in accuracy while requiring only a third of the tokens typically used during training.
- Score: 49.5504492920404
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advancements in pretraining have demonstrated that modern Large
Language Models (LLMs) possess the capability to effectively learn arithmetic
operations. However, despite acknowledging the significance of digit order in
arithmetic computation, current methodologies predominantly rely on sequential,
step-by-step approaches for teaching LLMs arithmetic, leading to the
conclusion that better performance requires ever finer-grained,
step-by-step computation.
Diverging from this conventional path, our work introduces a novel strategy
that not only reevaluates the digit order by prioritizing output from the least
significant digit but also incorporates a step-by-step methodology to
substantially reduce complexity. We have developed and applied this method in a
comprehensive set of experiments. Compared to the previous state-of-the-art
(SOTA) method, our findings reveal an overall improvement in accuracy while
requiring only a third of the tokens typically used during training. For the
purpose of facilitating replication and further research, we have made our code
and dataset publicly available at
https://anonymous.4open.science/r/RAIT-9FB7/.
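Since the abstract does not spell out the training format, here is a minimal Python sketch of the least-significant-digit-first (LSD-first) idea described above; the function names and the exact "a+b=digits" layout are invented for illustration and are not the paper's actual format.

```python
# Minimal sketch of LSD-first formatting for arithmetic training data.
# The helper names and the "a+b=digits" layout are illustrative
# assumptions, not the format used in the paper.

def reverse_digits(n: int) -> str:
    """Render n with its digits reversed, e.g. 512 -> '215'."""
    return str(n)[::-1]

def make_example(a: int, b: int) -> str:
    """Format 'a+b=' with the answer emitted least significant digit first.

    Emitting the units digit first means each output digit is fully
    determined by the corresponding input digits and the running carry,
    so the model never has to look ahead to resolve a carry chain.
    """
    return f"{a}+{b}={reverse_digits(a + b)}"

if __name__ == "__main__":
    print(make_example(17, 25))   # 17+25=24    (42 written units-first)
    print(make_example(999, 1))   # 999+1=0001  (1000 written units-first)
```

At inference time, the emitted digit string is simply reversed again to recover the conventional most-significant-digit-first answer.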
Related papers
- Improve Mathematical Reasoning in Language Models by Automated Process Supervision [22.72856086318912]
We propose a novel Monte Carlo Tree Search (MCTS) algorithm named OmegaPRM for the efficient collection of high-quality process supervision data.
We are able to collect over 1.5 million process supervision annotations to train a Process Reward Model (PRM).
We have enhanced the instruction tuned Gemini Pro model's math reasoning performance, achieving a 69.4% success rate on the MATH benchmark.
arXiv Detail & Related papers (2024-06-05T19:25:40Z)
- RevOrder: A Novel Method for Enhanced Arithmetic in Language Models [0.9043578619916238]
RevOrder reverses the output digits in addition, subtraction, and n-digit by 1-digit (nD by 1D) multiplication tasks.
Our method significantly reduces the Count of Sequential Intermediate Digits (CSID) to $\mathcal{O}(1)$.
arXiv Detail & Related papers (2024-02-06T09:10:35Z)
- Teaching Arithmetic to Small Transformers [39.72665384986095]
This study investigates how small transformers can efficiently learn arithmetic operations.
We first demonstrate that conventional training data is not the most effective for arithmetic learning.
We then train on chain-of-thought style data that includes intermediate step results (a toy generator for such data is sketched after this list).
arXiv Detail & Related papers (2023-07-07T04:33:31Z)
- Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning [75.74103236299477]
Chain-of-thought prompting(CoT) and tool augmentation have been validated as effective practices for improving large language models.
We propose a new approach that can deliberate the reasoning steps with tool interfaces, namely DELI.
Experimental results on CARP and six other datasets show that the proposed DELI mostly outperforms competitive baselines.
arXiv Detail & Related papers (2023-06-04T17:02:59Z)
- Arithmetic-Based Pretraining -- Improving Numeracy of Pretrained Language Models [67.48894919842576]
State-of-the-art pretrained language models tend to perform below their capabilities when applied out-of-the-box on tasks that require numeracy.
We propose a new extended pretraining approach called Arithmetic-Based Pretraining that jointly addresses both in one extended pretraining step.
Our experiments show the effectiveness of Arithmetic-Based Pretraining in three different tasks that require improved numeracy.
arXiv Detail & Related papers (2022-05-13T16:10:13Z)
- Few-shot Sequence Learning with Transformers [79.87875859408955]
Few-shot algorithms aim at learning new tasks provided only a handful of training examples.
In this work we investigate few-shot learning in the setting where the data points are sequences of tokens.
We propose an efficient learning algorithm based on Transformers.
arXiv Detail & Related papers (2020-12-17T12:30:38Z)
- Second-order Neural Network Training Using Complex-step Directional Derivative [41.4333906662624]
We introduce a numerical algorithm for second-order neural network training.
We tackle the practical obstacle of Hessian calculation by using the complex-step finite difference.
We believe our method will inspire a wide range of new algorithms for deep learning and numerical optimization (a minimal sketch of the complex-step trick appears after this list).
arXiv Detail & Related papers (2020-09-15T13:46:57Z)
- Process Discovery for Structured Program Synthesis [70.29027202357385]
A core task in process mining is process discovery which aims to learn an accurate process model from event log data.
In this paper, we propose to use (block-) structured programs directly as target process models.
We develop a novel bottom-up agglomerative approach to the discovery of such structured program process models.
arXiv Detail & Related papers (2020-08-13T10:33:10Z)
- Continual Deep Learning by Functional Regularisation of Memorable Past [95.97578574330934]
Continually learning new skills is important for intelligent systems, yet standard deep learning methods suffer from catastrophic forgetting of the past.
We propose a new functional-regularisation approach that utilises a few memorable past examples crucial to avoid forgetting.
Our method achieves state-of-the-art performance on standard benchmarks and opens a new direction for life-long learning where regularisation and memory-based methods are naturally combined.
arXiv Detail & Related papers (2020-04-29T10:47:54Z)
- Stacked Generalizations in Imbalanced Fraud Data Sets using Resampling Methods [2.741266294612776]
This study uses stacked generalization, a two-step process that combines machine learning methods (called meta- or super-learners) to improve the performance of algorithms.
Building a test harness that accounts for all permutations of algorithm and sample-set pairs demonstrates that the complex, intrinsic data structures are all thoroughly tested.
arXiv Detail & Related papers (2020-04-03T20:38:22Z)
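The "Teaching Arithmetic to Small Transformers" entry above trains on chain-of-thought style data with intermediate step results. As a toy illustration only (the textual template below is invented, not the paper's dataset format), the following Python sketch spells out column-by-column addition:

```python
# Toy generator for chain-of-thought style addition data with
# intermediate step results. The template is an invented illustration,
# not the dataset format used in the paper.

def cot_addition(a: int, b: int) -> str:
    """Spell out column-by-column addition, least significant column first."""
    da, db = str(a)[::-1], str(b)[::-1]            # digits, units place first
    steps, result, carry = [], [], 0
    for i in range(max(len(da), len(db))):
        x = int(da[i]) if i < len(da) else 0
        y = int(db[i]) if i < len(db) else 0
        s = x + y + carry
        steps.append(f"{x}+{y}+{carry}={s}, write {s % 10} carry {s // 10}")
        result.append(str(s % 10))
        carry = s // 10
    if carry:
        result.append(str(carry))
    answer = "".join(reversed(result))
    return f"{a}+{b}: " + "; ".join(steps) + f"; answer {answer}"

if __name__ == "__main__":
    print(cot_addition(17, 25))
    # 17+25: 7+5+0=12, write 2 carry 1; 1+2+1=4, write 4 carry 0; answer 42
```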
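The "Second-order Neural Network Training Using Complex-step Directional Derivative" entry above relies on the complex-step finite difference. The sketch below shows only that standard numerical trick, not the paper's training pipeline; the test function and step size are arbitrary choices. For a real-analytic f, f'(x) ≈ Im(f(x + ih)) / h with no subtractive cancellation, so h can be made extremely small.

```python
# Sketch of the complex-step finite difference: the derivative is read
# off the imaginary part of a single complex-valued evaluation, with no
# subtractive cancellation (unlike (f(x+h) - f(x)) / h).

import numpy as np

def complex_step_derivative(f, x: float, h: float = 1e-20) -> float:
    """Approximate f'(x) via Im(f(x + i*h)) / h."""
    return float(np.imag(f(x + 1j * h)) / h)

if __name__ == "__main__":
    f = lambda x: np.exp(x) * np.sin(x)                 # analytic test function
    x0 = 0.7
    exact = np.exp(x0) * (np.sin(x0) + np.cos(x0))      # closed-form derivative
    print(abs(exact - complex_step_derivative(f, x0)))  # ~ machine epsilon
```

The same idea extends to directional derivatives by taking the complex step along a direction vector, which matches the "directional derivative" use named in the paper's title.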