Small Language Models are Equation Reasoners
- URL: http://arxiv.org/abs/2409.12393v1
- Date: Thu, 19 Sep 2024 01:34:43 GMT
- Title: Small Language Models are Equation Reasoners
- Authors: Bumjun Kim, Kunha Lee, Juyeon Kim, Sangam Lee
- Abstract summary: Chain-of-Thought (CoT) reasoning has enabled Large Language Models (LLMs) to achieve remarkable performance on various NLP tasks, including arithmetic problem-solving.
However, this success does not generalize to small language models (sLMs) like T5, due to their limited capacity and the absence of the emergent abilities associated with larger models.
Recent works that enhance sLMs through knowledge distillation have yielded some improvements but still face significant limitations.
- Score: 2.424910201171407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chain-of-Thought (CoT) reasoning has enabled Large Language Models (LLMs) to achieve remarkable performance on various NLP tasks, including arithmetic problem-solving. However, this success does not generalize to small language models (sLMs) like T5, due to their limited capacity and the absence of the emergent abilities associated with larger models. Recent works that enhance sLMs through knowledge distillation have yielded some improvements but still face significant limitations, particularly the high ambiguity arising from variability in natural language expressions and substantial computational costs. In this paper, we investigate why sLMs perform poorly on arithmetic reasoning tasks and hypothesize that the variability of natural language formats introduces high ambiguity for these smaller models. Based on this hypothesis, we conduct experiments with an equation-only format: a reasoning format that unifies arithmetic reasoning previously expressed in natural language into mathematical equations. Experimental results demonstrate that the equation-only format effectively boosts the arithmetic reasoning abilities of sLMs, especially for very small models like T5-Tiny.
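To make the contrast concrete, here is a minimal sketch of the two rationale formats. The example problem and both rationales are invented for illustration; the paper's exact formatting may differ.

```python
# Illustrative contrast between a natural-language CoT rationale and the
# equation-only format the paper proposes. The problem and both rationales
# are invented for illustration, not taken from the paper.

question = "Tom has 3 boxes with 4 apples each. He eats 2 apples. How many are left?"

# Natural-language CoT: many valid surface forms express the same reasoning,
# which the paper hypothesizes is ambiguous for small models like T5.
cot_rationale = (
    "Tom has 3 boxes of 4 apples, so he has 3 * 4 = 12 apples. "
    "After eating 2, he has 12 - 2 = 10 apples left."
)

# Equation-only format: the same reasoning collapsed to bare equations,
# removing the variability of natural language.
equation_only_rationale = "3 * 4 = 12; 12 - 2 = 10"

# A distillation pair for fine-tuning a small model would then look like:
training_pair = {"input": question, "target": equation_only_rationale}
print(training_pair)
```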
Related papers
- Large Language Models and Mathematical Reasoning Failures [1.6114012813668932]
This paper investigates the mathematical reasoning capabilities of large language models (LLMs) using 50 newly constructed high-school-level word problems.
We rigorously analyze both final answers and solution steps to identify reasoning failures.
We find that while newer models (e.g., o3-mini, deepseek-r1) achieve higher accuracy, all models exhibit errors in spatial reasoning, strategic planning, and arithmetic.
arXiv Detail & Related papers (2025-02-17T09:07:32Z)
- How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs [69.55103380185612]
We identify numerical precision as a key factor that influences Transformer-based Large Language Models' effectiveness in mathematical tasks.
Our results show that Transformers operating with low numerical precision fail to address arithmetic tasks, such as iterated addition and integer multiplication.
In contrast, Transformers with standard numerical precision can efficiently handle these tasks with significantly smaller model sizes.
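The claim concerns precision inside the Transformer, but the failure mode is easy to see in plain floating-point arithmetic. A minimal numerical analogy (not the paper's experiment), assuming numpy's float16 as the low-precision format:

```python
import numpy as np

# Minimal analogy: iterated addition breaks down in float16 because its
# unit of least precision at 2048 is 2, so adding 1 no longer changes
# the accumulator.
total_fp16 = np.float16(0.0)
total_fp64 = np.float64(0.0)
for _ in range(3000):
    total_fp16 = np.float16(total_fp16 + np.float16(1.0))
    total_fp64 += 1.0

print(total_fp16)  # 2048.0 -- the sum stalls once 2049 is unrepresentable
print(total_fp64)  # 3000.0 -- standard precision handles it easily
```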
arXiv Detail & Related papers (2024-10-17T17:59:35Z)
- Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model [15.542737858152053]
We propose Key-Point-Driven Mathematical Reasoning Distillation (KPDD) to mitigate misunderstanding errors.
KPDD enhances the reasoning performance of SLMs by breaking down the problem-solving process into three stages.
Experiments show KPDD-CoT significantly improves reasoning abilities, while KPDD-PoT achieves state-of-the-art performance in mathematical reasoning tasks.
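The summary does not name the three stages. Assuming PoT here stands for program-of-thought, as is standard, here is a minimal sketch of the PoT idea behind KPDD-PoT: the distilled student emits a small program instead of free-form text, and executing it yields the answer.

```python
# Sketch of the program-of-thought (PoT) idea: the rationale below is a
# hypothetical stand-in for an actual model generation; arithmetic is
# delegated to the interpreter rather than done in natural language.
generated_pot_rationale = """
boxes = 3
apples_per_box = 4
eaten = 2
answer = boxes * apples_per_box - eaten
"""

namespace = {}
exec(generated_pot_rationale, namespace)  # run the model-written program
print(namespace["answer"])  # 10
```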
arXiv Detail & Related papers (2024-07-14T11:41:03Z)
- LLMs for Mathematical Modeling: Towards Bridging the Gap between Natural and Mathematical Languages [14.04286044600141]
Large Language Models (LLMs) have demonstrated strong performance across various natural language processing tasks, but their proficiency in mathematical reasoning remains a key challenge.
We propose a process-oriented framework to evaluate LLMs' ability to construct mathematical models.
arXiv Detail & Related papers (2024-05-21T18:29:54Z)
- Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models [100.53662473219806]
Diffusion-of-Thought (DoT) is a novel approach that integrates diffusion models with Chain-of-Thought.
DoT allows reasoning steps to diffuse over time through a diffusion language model.
Our results demonstrate the effectiveness of DoT in multi-digit multiplication, logic, and grade school math problems.
arXiv Detail & Related papers (2024-02-12T16:23:28Z)
- Small Models are Valuable Plug-ins for Large Language Models [65.29370906766997]
Large language models (LLMs) such as GPT-3 and GPT-4 are powerful but their weights are often publicly unavailable.
We propose Super In-Context Learning (SuperICL), which allows black-box LLMs to work with locally fine-tuned smaller models.
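A minimal sketch of SuperICL-style prompt construction: the small model's prediction and confidence are injected next to each in-context example. The plug-in model and prompt template below are hypothetical stand-ins, not the paper's exact setup.

```python
# `small_model_predict` is a hypothetical stand-in for a locally
# fine-tuned classifier (e.g., a fine-tuned RoBERTa in the real method).
def small_model_predict(text: str) -> tuple[str, float]:
    return ("positive", 0.92) if "great" in text else ("negative", 0.71)

examples = [("The movie was great fun.", "positive"),
            ("A dull, lifeless script.", "negative")]
test_input = "A great cast wasted on a thin plot."

lines = []
for text, gold in examples:
    pred, conf = small_model_predict(text)
    lines.append(f"Input: {text}\nSmall model: {pred} (confidence {conf:.2f})\nLabel: {gold}")
pred, conf = small_model_predict(test_input)
lines.append(f"Input: {test_input}\nSmall model: {pred} (confidence {conf:.2f})\nLabel:")

prompt = "\n\n".join(lines)
print(prompt)  # sent to the black-box LLM, which produces the final label
```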
arXiv Detail & Related papers (2023-05-15T17:59:01Z)
- Reflection of Thought: Inversely Eliciting Numerical Reasoning in Language Models via Solving Linear Systems [42.782260686177395]
We propose a novel method to elicit and exploit the numerical reasoning knowledge hidden in pre-trained language models.
We first leverage simple numbers as anchors to probe the implicitly inferred arithmetic expressions from language models.
We then formulate the probing task as an analytically solvable linear system.
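A sketch of only the linear-algebra core, under a strong simplifying assumption: the black box below stands in for a language model that implicitly computes an unknown linear expression, which anchor queries recover analytically.

```python
import numpy as np

# Toy stand-in for a model's implicitly inferred arithmetic; its
# coefficients are unknown to the prober.
def black_box(x1: float, x2: float) -> float:
    return 3.0 * x1 - 2.0 * x2 + 5.0

# Probe with simple anchor inputs; the unknowns are the coefficients
# (a, b, c) of a*x1 + b*x2 + c.
anchors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
A = np.array([[x1, x2, 1.0] for x1, x2 in anchors])
y = np.array([black_box(x1, x2) for x1, x2 in anchors])

coeffs = np.linalg.solve(A, y)
print(coeffs)  # [ 3. -2.  5.] -- the hidden expression, recovered analytically
```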
arXiv Detail & Related papers (2022-10-11T00:57:19Z)
- Limitations of Language Models in Arithmetic and Symbolic Induction [20.49118435604774]
Large pretrained Language Models (LMs) can perform remarkably well on a range of Natural Language Processing (NLP) tasks.
We find that these models have limitations on certain basic symbolic manipulation tasks such as copy, reverse, and addition.
We investigate the potential causes behind this phenomenon and examine a set of possible methods, including explicit positional markers, fine-grained computation steps, and LMs with callable programs.
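A minimal sketch of the explicit-positional-markers idea: each digit is tagged with its place value so the model no longer has to infer alignment from raw position. The marker syntax here is hypothetical; the paper's exact scheme may differ.

```python
def add_positional_markers(number: str) -> str:
    # Tag each digit with an explicit place-value marker, e.g. p0 = ones.
    digits = list(number)
    n = len(digits)
    return " ".join(f"{d}[p{n - 1 - i}]" for i, d in enumerate(digits))

print(add_positional_markers("1234"))
# 1[p3] 2[p2] 3[p1] 4[p0]  -- '1' is explicitly the thousands digit
```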
arXiv Detail & Related papers (2022-08-09T21:47:01Z)
- Chain of Thought Prompting Elicits Reasoning in Large Language Models [56.811278668446825]
This paper explores the ability of language models to generate a coherent chain of thought.
Experiments show that inducing a chain of thought via prompting can enable sufficiently large language models to better perform reasoning tasks.
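A minimal sketch of few-shot CoT prompt construction; the exemplar paraphrases the well-known tennis-ball example from this line of work.

```python
# Few-shot chain-of-thought prompting: the exemplar includes a worked
# rationale, so a sufficiently large model imitates the step-by-step
# style when answering the new question.
exemplar = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)
new_question = ("Q: A cafeteria had 23 apples. They used 20 and bought 6 more. "
                "How many now?\nA:")

prompt = exemplar + new_question
print(prompt)  # the model is expected to continue with intermediate steps
```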
arXiv Detail & Related papers (2022-01-28T02:33:07Z)
- Efficient Large Scale Language Modeling with Mixtures of Experts [61.45159383372181]
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation.
This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings.
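A minimal numpy sketch of the conditional computation behind MoE layers, using top-1 gating; real MoE layers typically add softmax gate weighting and load balancing.

```python
import numpy as np

# Top-1-gated mixture-of-experts feed-forward layer: a router scores experts
# per token, and only the winning expert's weights are applied, so compute
# stays roughly constant as the number of experts grows.
rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 8, 4, 5

router_w = rng.normal(size=(d_model, n_experts))           # gating network
expert_w = rng.normal(size=(n_experts, d_model, d_model))  # one FFN per expert

x = rng.normal(size=(n_tokens, d_model))
scores = x @ router_w            # (n_tokens, n_experts) routing scores
chosen = scores.argmax(axis=1)   # top-1 expert per token

out = np.empty_like(x)
for e in range(n_experts):
    mask = chosen == e
    if mask.any():
        out[mask] = x[mask] @ expert_w[e]  # only selected tokens hit expert e

print(chosen)  # routing decisions; each token used exactly one expert
```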
arXiv Detail & Related papers (2021-12-20T17:05:11Z)
- Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
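A sketch of the kind of word-order perturbation behind such studies: shuffling words within a sentence destroys syntax while preserving which words co-occur. The paper's actual procedure (e.g., n-gram-level shuffling) may differ.

```python
import random

def shuffle_sentence(sentence: str, seed: int = 0) -> str:
    # Permute word order within the sentence: unigram co-occurrence
    # statistics survive, but syntactic structure does not.
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

print(shuffle_sentence("the cat sat on the mat"))
# e.g. "on the mat cat the sat" -- same words, no syntax
```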
arXiv Detail & Related papers (2021-04-14T06:30:36Z)