Learning Multi-Step Reasoning by Solving Arithmetic Tasks
- URL: http://arxiv.org/abs/2306.01707v3
- Date: Wed, 7 Jun 2023 03:45:15 GMT
- Title: Learning Multi-Step Reasoning by Solving Arithmetic Tasks
- Authors: Tianduo Wang and Wei Lu
- Abstract summary: This work investigates how to incorporate relatively small Language Models with the capabilities of multi-step reasoning.
We propose to inject such abilities by continually pre-training LMs on a synthetic dataset MsAT.
Our experiments on four math word problem datasets show the effectiveness of the proposed method.
- Score: 6.398022050054328
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mathematical reasoning is regarded as a necessary ability for Language Models
(LMs). Recent works demonstrate large LMs' impressive performance in solving
math problems. The success is attributed to their Chain-of-Thought (CoT)
reasoning abilities, i.e., the ability to decompose complex questions into
step-by-step reasoning chains, but such ability seems only to emerge from
models with abundant parameters. This work investigates how to incorporate
relatively small LMs with the capabilities of multi-step reasoning. We propose
to inject such abilities by continually pre-training LMs on a synthetic dataset
MsAT which is composed of Multi-step Arithmetic Tasks. Our experiments on four
math word problem datasets show the effectiveness of the proposed method in
enhancing LMs' math reasoning abilities.
Related papers
- GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers [68.77382332826167]
Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks.
One essential and frequently occurring evidence is that when the math questions are slightly changed, LLMs can behave incorrectly.
This motivates us to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations.
arXiv Detail & Related papers (2024-02-29T15:26:14Z) - MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning [2.9104279358536647]
We present MathSensei, a tool-augmented large language model for mathematical reasoning.
We study the complementary benefits of the tools - knowledge retriever (Bing Web Search), program generator + executor (Python), and symbolic equation solver (Wolfram-Alpha API)
arXiv Detail & Related papers (2024-02-27T05:50:35Z) - Evaluating LLMs' Mathematical Reasoning in Financial Document Question
Answering [53.56653281752486]
This study explores Large Language Models' mathematical reasoning on four financial question-answering datasets.
We focus on sensitivity to table complexity and performance variations with an increasing number of arithmetic reasoning steps.
We introduce a novel prompting technique tailored to semi-structured documents, matching or outperforming other baselines in performance.
arXiv Detail & Related papers (2024-02-17T05:10:18Z) - InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning [98.53491178426492]
We open-source our math reasoning LLMs InternLM-Math which is continue pre-trained from InternLM2.
We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and code interpreter in a unified seq2seq format.
Our pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning.
arXiv Detail & Related papers (2024-02-09T11:22:08Z) - Frugal LMs Trained to Invoke Symbolic Solvers Achieve
Parameter-Efficient Arithmetic Reasoning [36.8749786658624]
Large Language Models (LLM) exhibit zero-shot mathematical reasoning capacity as a behavior emergent with scale.
We show that small LMs can achieve reasonable arithmetic reasoning if arithmetic word problems are posed as a formalize-then-solve task.
arXiv Detail & Related papers (2023-12-09T13:20:49Z) - No Train Still Gain. Unleash Mathematical Reasoning of Large Language
Models with Monte Carlo Tree Search Guided by Energy Function [3.0299876288833345]
Large language models (LLMs) demonstrate impressive language understanding and contextual learning abilities.
LLMs often struggle to generate correct reasoning steps and answers despite having high probabilities for the solutions.
We propose a method that incorporates Monte Carlo Tree Search (MCTS) and a lightweight energy function to rank decision steps.
arXiv Detail & Related papers (2023-09-01T13:10:54Z) - Faith and Fate: Limits of Transformers on Compositionality [109.79516190693415]
We investigate the limits of transformer large language models across three representative compositional tasks.
These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer.
Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching.
arXiv Detail & Related papers (2023-05-29T23:24:14Z) - SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs)
We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer.
We evaluate SATLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
arXiv Detail & Related papers (2023-05-16T17:55:51Z) - Limitations of Language Models in Arithmetic and Symbolic Induction [20.49118435604774]
Large pretrained Language Models (LMs) can perform remarkably well on a range of Natural Language Processing (NLP) tasks.
We find that these models have limitations on certain basic symbolic manipulation tasks such as copy, reverse, and addition.
We investigate the potential causes behind this phenomenon and examine a set of possible methods, including explicit positional markers, fine-grained computation steps, and LMs with callable programs.
arXiv Detail & Related papers (2022-08-09T21:47:01Z) - oLMpics -- On what Language Model Pre-training Captures [84.60594612120173]
We propose eight reasoning tasks, which require operations such as comparison, conjunction, and composition.
A fundamental challenge is to understand whether the performance of a LM on a task should be attributed to the pre-trained representations or to the process of fine-tuning on the task data.
arXiv Detail & Related papers (2019-12-31T12:11:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.