Small Language Models are Equation Reasoners
- URL: http://arxiv.org/abs/2409.12393v1
- Date: Thu, 19 Sep 2024 01:34:43 GMT
- Title: Small Language Models are Equation Reasoners
- Authors: Bumjun Kim, Kunha Lee, Juyeon Kim, Sangam Lee
- Abstract summary: Chain-of-Thought (CoT) reasoning has enabled Large Language Models (LLMs) to achieve remarkable performance on various NLP tasks, including arithmetic problem-solving.
However, this success does not generalize to small language models (sLMs) like T5, due to their limited capacity and the absence of the emergent abilities associated with larger models.
Recent works that enhance sLMs through knowledge distillation have yielded some improvements but still face significant limitations.
- Score: 2.424910201171407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chain-of-Thought (CoT) reasoning has enabled Large Language Models (LLMs) to achieve remarkable performance on various NLP tasks, including arithmetic problem-solving. However, this success does not generalize to small language models (sLMs) like T5, due to their limited capacity and the absence of the emergent abilities associated with larger models. Recent works that enhance sLMs through knowledge distillation have yielded some improvements but still face significant limitations, particularly the high ambiguity arising from variability in natural language expressions and substantial computational costs. In this paper, we investigate why sLMs perform poorly on arithmetic reasoning tasks and hypothesize that the variability of natural language formats introduces high ambiguity for these smaller models. Based on this hypothesis, we conduct experiments with an equation-only format: a reasoning format that unifies arithmetic reasoning previously expressed in natural language into mathematical equations. Experimental results demonstrate that the equation-only format effectively boosts the arithmetic reasoning abilities of sLMs, especially for very small models like T5-Tiny.
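To make the contrast concrete, here is a minimal sketch of the two rationale formats. The example problem and both rationales are invented for illustration; the paper's exact formatting may differ.

```python
# Illustrative contrast between a natural-language CoT rationale and the
# equation-only format the paper proposes. The problem and both rationales
# are invented for illustration, not taken from the paper.

question = "Tom has 3 boxes with 4 apples each. He eats 2 apples. How many are left?"

# Natural-language CoT: many valid surface forms express the same reasoning,
# which the paper hypothesizes is ambiguous for small models like T5.
cot_rationale = (
    "Tom has 3 boxes of 4 apples, so he has 3 * 4 = 12 apples. "
    "After eating 2, he has 12 - 2 = 10 apples left."
)

# Equation-only format: the same reasoning collapsed to bare equations,
# removing the variability of natural language.
equation_only_rationale = "3 * 4 = 12; 12 - 2 = 10"

# A distillation pair for fine-tuning a small model would then look like:
training_pair = {"input": question, "target": equation_only_rationale}
print(training_pair)
```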
Related papers
- Large Language Models and Mathematical Reasoning Failures [1.6114012813668932]
This paper investigates the mathematical reasoning capabilities of large language models (LLMs) using 50 newly constructed high-school-level word problems.
We rigorously analyze both final answers and solution steps to identify reasoning failures.
We find that while newer models (e.g., o3-mini, deepseek-r1) achieve higher accuracy, all models exhibit errors in spatial reasoning, strategic planning, and arithmetic.
arXiv Detail & Related papers (2025-02-17T09:07:32Z)
- How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs [69.55103380185612]
We identify numerical precision as a key factor that influences Transformer-based Large Language Models' effectiveness in mathematical tasks.
Our results show that Transformers operating with low numerical precision fail to address arithmetic tasks, such as iterated addition and integer multiplication.
In contrast, Transformers with standard numerical precision can efficiently handle these tasks with significantly smaller model sizes.
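The claim concerns precision inside the Transformer, but the failure mode is easy to see in plain floating-point arithmetic. A minimal numerical analogy (not the paper's experiment), assuming numpy's float16 as the low-precision format:

```python
import numpy as np

# Minimal analogy: iterated addition breaks down in float16 because its
# unit of least precision at 2048 is 2, so adding 1 no longer changes
# the accumulator.
total_fp16 = np.float16(0.0)
total_fp64 = np.float64(0.0)
for _ in range(3000):
    total_fp16 = np.float16(total_fp16 + np.float16(1.0))
    total_fp64 += 1.0

print(total_fp16)  # 2048.0 -- the sum stalls once 2049 is unrepresentable
print(total_fp64)  # 3000.0 -- standard precision handles it easily
```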
arXiv Detail & Related papers (2024-10-17T17:59:35Z)
- Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model [15.542737858152053]
We propose Key-Point-Driven Mathematical Reasoning Distillation (KPDD) to mitigate misunderstanding errors.
KPDD enhances the reasoning performance of SLMs by breaking down the problem-solving process into three stages.
Experiments show KPDD-CoT significantly improves reasoning abilities, while KPDD-PoT achieves state-of-the-art performance in mathematical reasoning tasks.
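The summary does not name the three stages. Assuming PoT here stands for program-of-thought, as is standard, here is a minimal sketch of the PoT idea behind KPDD-PoT: the distilled student emits a small program instead of free-form text, and executing it yields the answer.

```python
# Sketch of the program-of-thought (PoT) idea: the rationale below is a
# hypothetical stand-in for an actual model generation; arithmetic is
# delegated to the interpreter rather than done in natural language.
generated_pot_rationale = """
boxes = 3
apples_per_box = 4
eaten = 2
answer = boxes * apples_per_box - eaten
"""

namespace = {}
exec(generated_pot_rationale, namespace)  # run the model-written program
print(namespace["answer"])  # 10
```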
arXiv Detail & Related papers (2024-07-14T11:41:03Z)
- LLMs for Mathematical Modeling: Towards Bridging the Gap between Natural and Mathematical Languages [14.04286044600141]
Large Language Models (LLMs) have demonstrated strong performance across various natural language processing tasks, but their proficiency in mathematical reasoning remains a key challenge.
We propose a process-oriented framework to evaluate LLMs' ability to construct mathematical models.
arXiv Detail & Related papers (2024-05-21T18:29:54Z)
- Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models [100.53662473219806]
Diffusion-of-Thought (DoT) is a novel approach that integrates diffusion models with Chain-of-Thought.
DoT allows reasoning steps to diffuse over time through a diffusion language model.
Our results demonstrate the effectiveness of DoT in multi-digit multiplication, logic, and grade school math problems.
arXiv Detail & Related papers (2024-02-12T16:23:28Z)
- Small Models are Valuable Plug-ins for Large Language Models [65.29370906766997]
Large language models (LLMs) such as GPT-3 and GPT-4 are powerful but their weights are often publicly unavailable.
We propose Super In-Context Learning (SuperICL), which allows black-box LLMs to work with locally fine-tuned smaller models.
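A minimal sketch of SuperICL-style prompt construction: the small model's prediction and confidence are injected next to each in-context example. The plug-in model and prompt template below are hypothetical stand-ins, not the paper's exact setup.

```python
# `small_model_predict` is a hypothetical stand-in for a locally
# fine-tuned classifier (e.g., a fine-tuned RoBERTa in the real method).
def small_model_predict(text: str) -> tuple[str, float]:
    return ("positive", 0.92) if "great" in text else ("negative", 0.71)

examples = [("The movie was great fun.", "positive"),
            ("A dull, lifeless script.", "negative")]
test_input = "A great cast wasted on a thin plot."

lines = []
for text, gold in examples:
    pred, conf = small_model_predict(text)
    lines.append(f"Input: {text}\nSmall model: {pred} (confidence {conf:.2f})\nLabel: {gold}")
pred, conf = small_model_predict(test_input)
lines.append(f"Input: {test_input}\nSmall model: {pred} (confidence {conf:.2f})\nLabel:")

prompt = "\n\n".join(lines)
print(prompt)  # sent to the black-box LLM, which produces the final label
```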
arXiv Detail & Related papers (2023-05-15T17:59:01Z)
- Reflection of Thought: Inversely Eliciting Numerical Reasoning in Language Models via Solving Linear Systems [42.782260686177395]
We propose a novel method to elicit and exploit the numerical reasoning knowledge hidden in pre-trained language models.
We first leverage simple numbers as anchors to probe the implicitly inferred arithmetic expressions from language models.
We then formulate the probing task as an analytically solvable linear system.
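A sketch of only the linear-algebra core, under a strong simplifying assumption: the black box below stands in for a language model that implicitly computes an unknown linear expression, which anchor queries recover analytically.

```python
import numpy as np

# Toy stand-in for a model's implicitly inferred arithmetic; its
# coefficients are unknown to the prober.
def black_box(x1: float, x2: float) -> float:
    return 3.0 * x1 - 2.0 * x2 + 5.0

# Probe with simple anchor inputs; the unknowns are the coefficients
# (a, b, c) of a*x1 + b*x2 + c.
anchors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
A = np.array([[x1, x2, 1.0] for x1, x2 in anchors])
y = np.array([black_box(x1, x2) for x1, x2 in anchors])

coeffs = np.linalg.solve(A, y)
print(coeffs)  # [ 3. -2.  5.] -- the hidden expression, recovered analytically
```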
arXiv Detail & Related papers (2022-10-11T00:57:19Z)
- Limitations of Language Models in Arithmetic and Symbolic Induction [20.49118435604774]
Large pretrained Language Models (LMs) can perform remarkably well on a range of Natural Language Processing (NLP) tasks.
We find that these models have limitations on certain basic symbolic manipulation tasks such as copy, reverse, and addition.
We investigate the potential causes behind this phenomenon and examine a set of possible methods, including explicit positional markers, fine-grained computation steps, and LMs with callable programs.
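A minimal sketch of the explicit-positional-markers idea: each digit is tagged with its place value so the model no longer has to infer alignment from raw position. The marker syntax here is hypothetical; the paper's exact scheme may differ.

```python
def add_positional_markers(number: str) -> str:
    # Tag each digit with an explicit place-value marker, e.g. p0 = ones.
    digits = list(number)
    n = len(digits)
    return " ".join(f"{d}[p{n - 1 - i}]" for i, d in enumerate(digits))

print(add_positional_markers("1234"))
# 1[p3] 2[p2] 3[p1] 4[p0]  -- '1' is explicitly the thousands digit
```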
arXiv Detail & Related papers (2022-08-09T21:47:01Z)
- Chain of Thought Prompting Elicits Reasoning in Large Language Models [56.811278668446825]
This paper explores the ability of language models to generate a coherent chain of thought.
Experiments show that inducing a chain of thought via prompting can enable sufficiently large language models to better perform reasoning tasks.
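A minimal sketch of few-shot CoT prompt construction; the exemplar paraphrases the well-known tennis-ball example from this line of work.

```python
# Few-shot chain-of-thought prompting: the exemplar includes a worked
# rationale, so a sufficiently large model imitates the step-by-step
# style when answering the new question.
exemplar = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)
new_question = ("Q: A cafeteria had 23 apples. They used 20 and bought 6 more. "
                "How many now?\nA:")

prompt = exemplar + new_question
print(prompt)  # the model is expected to continue with intermediate steps
```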
arXiv Detail & Related papers (2022-01-28T02:33:07Z)
- Efficient Large Scale Language Modeling with Mixtures of Experts [61.45159383372181]
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation.
This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings.
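A minimal numpy sketch of the conditional computation behind MoE layers, using top-1 gating; real MoE layers typically add softmax gate weighting and load balancing.

```python
import numpy as np

# Top-1-gated mixture-of-experts feed-forward layer: a router scores experts
# per token, and only the winning expert's weights are applied, so compute
# stays roughly constant as the number of experts grows.
rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 8, 4, 5

router_w = rng.normal(size=(d_model, n_experts))           # gating network
expert_w = rng.normal(size=(n_experts, d_model, d_model))  # one FFN per expert

x = rng.normal(size=(n_tokens, d_model))
scores = x @ router_w            # (n_tokens, n_experts) routing scores
chosen = scores.argmax(axis=1)   # top-1 expert per token

out = np.empty_like(x)
for e in range(n_experts):
    mask = chosen == e
    if mask.any():
        out[mask] = x[mask] @ expert_w[e]  # only selected tokens hit expert e

print(chosen)  # routing decisions; each token used exactly one expert
```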
arXiv Detail & Related papers (2021-12-20T17:05:11Z)
- Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
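A sketch of the kind of word-order perturbation behind such studies: shuffling words within a sentence destroys syntax while preserving which words co-occur. The paper's actual procedure (e.g., n-gram-level shuffling) may differ.

```python
import random

def shuffle_sentence(sentence: str, seed: int = 0) -> str:
    # Permute word order within the sentence: unigram co-occurrence
    # statistics survive, but syntactic structure does not.
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

print(shuffle_sentence("the cat sat on the mat"))
# e.g. "on the mat cat the sat" -- same words, no syntax
```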
arXiv Detail & Related papers (2021-04-14T06:30:36Z)