Related papers: Language Models are Symbolic Learners in Arithmetic

Language Models are Symbolic Learners in Arithmetic

URL: http://arxiv.org/abs/2410.15580v1
Date: Mon, 21 Oct 2024 01:57:16 GMT
Title: Language Models are Symbolic Learners in Arithmetic
Authors: Chunyuan Deng, Zhiqi Li, Roy Xie, Ruidi Chang, Hanjie Chen,
Abstract summary: Large Language Models (LLMs) are thought to struggle with arithmetic learning due to inherent differences between language modeling and numerical computation. We first investigate whether LLMs leverage partial products during arithmetic learning. We find that although LLMs can identify some partial products after learning, they fail to leverage them for arithmetic tasks, conversely.
Score: 8.34588487873447
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) are thought to struggle with arithmetic learning due to the inherent differences between language modeling and numerical computation, but concrete evidence has been lacking. This work responds to this claim through a two-side experiment. We first investigate whether LLMs leverage partial products during arithmetic learning. We find that although LLMs can identify some partial products after learning, they fail to leverage them for arithmetic tasks, conversely. We then explore how LLMs approach arithmetic symbolically by breaking tasks into subgroups, hypothesizing that difficulties arise from subgroup complexity and selection. Our results show that when subgroup complexity is fixed, LLMs treat a collection of different arithmetic operations similarly. By analyzing position-level accuracy across different training sizes, we further observe that it follows a U-shaped pattern: LLMs quickly learn the easiest patterns at the first and last positions, while progressively learning the more difficult patterns in the middle positions. This suggests that LLMs select subgroup following an easy-to-hard paradigm during learning. Our work confirms that LLMs are pure symbolic learners in arithmetic tasks and underscores the importance of understanding them deeply through subgroup-level quantification.

Related papers

Logical Consistency of Large Language Models in Fact-checking [6.286017217366497]
Large language models (LLMs) have demonstrated significant success in performing varied natural language tasks. Despite their impressive ability to generate human-like texts, LLMs are infamous for their inconsistent responses.
arXiv Detail & Related papers (2024-12-20T17:42:25Z)
Unraveling Arithmetic in Large Language Models: The Role of Algebraic Structures [3.181878085746691]
Large language models (LLMs) have demonstrated remarkable mathematical capabilities, largely driven by chain-of-thought (CoT) prompting. We propose that LLMs learn arithmetic by capturing algebraic structures, such as emphCommutativity and emphIdentity properties. Our findings indicate that leveraging algebraic structures can enhance the LLMs' arithmetic capabilities, offering insights into improving their arithmetic performance.
arXiv Detail & Related papers (2024-11-25T10:23:11Z)
Investigating Large Language Models for Complex Word Identification in Multilingual and Multidomain Setups [1.8377902806196766]
Complex Word Identification (CWI) is an essential step in the lexical simplification task and has recently become a task on its own. Large language models (LLMs) recently became popular in the Natural Language Processing community because of their versatility and capability to solve unseen tasks in zero/few-shot settings. Our work investigates LLM usage, specifically open-source models such as Llama 2, Llama 3, and Vicuna v1.5, and closed-source, such as ChatGPT-3.5-turbo and GPT-4o, in the CWI, LCP, and MWE settings.
arXiv Detail & Related papers (2024-11-03T22:31:02Z)
Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models [7.115323364355489]
In-context learning, the ability to adapt based on a few examples in the input prompt, is a ubiquitous feature of large language models (LLMs) We first show that Llama $3$ $70$B can solve simple RL problems in-context. We then analyze the residual stream of Llama using Sparse Autoencoders (SAEs) and find representations that closely match temporal difference (TD) errors.
arXiv Detail & Related papers (2024-10-02T06:51:12Z)
Interpreting and Improving Large Language Models in Arithmetic Calculation [72.19753146621429]
Large language models (LLMs) have demonstrated remarkable potential across numerous applications. In this work, we delve into uncovering a specific mechanism by which LLMs execute calculations. We investigate the potential benefits of selectively fine-tuning these essential heads/MLPs to boost the LLMs' computational performance.
arXiv Detail & Related papers (2024-09-03T07:01:46Z)
LLMs' Understanding of Natural Language Revealed [0.0]
Large language models (LLMs) are the result of a massive experiment in bottom-up, data-driven reverse engineering of language at scale. We will focus on testing LLMs for their language understanding capabilities, their supposed forte.
arXiv Detail & Related papers (2024-07-29T01:21:11Z)
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers [68.77382332826167]
Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks. One essential and frequently occurring evidence is that when the math questions are slightly changed, LLMs can behave incorrectly. This motivates us to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations.
arXiv Detail & Related papers (2024-02-29T15:26:14Z)
Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering [53.56653281752486]
This study explores Large Language Models' mathematical reasoning on four financial question-answering datasets. We focus on sensitivity to table complexity and performance variations with an increasing number of arithmetic reasoning steps. We introduce a novel prompting technique tailored to semi-structured documents, matching or outperforming other baselines in performance.
arXiv Detail & Related papers (2024-02-17T05:10:18Z)
How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering [52.86931192259096]
Knowledge Base Question Answering (KBQA) aims to answer natural language questions based on facts in knowledge bases. Recent works leverage the capabilities of large language models (LLMs) for logical form generation to improve performance.
arXiv Detail & Related papers (2024-01-11T09:27:50Z)
Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning [36.8749786658624]
Large Language Models (LLM) exhibit zero-shot mathematical reasoning capacity as a behavior emergent with scale. We show that small LMs can achieve reasonable arithmetic reasoning if arithmetic word problems are posed as a formalize-then-solve task.
arXiv Detail & Related papers (2023-12-09T13:20:49Z)
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety. Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs. We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PaL) to understand natural language problems. PaL offloads the solution step to a programmatic runtime such as a Python interpreter. We set new state-of-the-art results in all 12 benchmarks.
arXiv Detail & Related papers (2022-11-18T18:56:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.