OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step
- URL: http://arxiv.org/abs/2406.06576v3
- Date: Sat, 29 Jun 2024 19:13:23 GMT
- Title: OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step
- Authors: Owen Dugan, Donato Manuel Jimenez Beneto, Charlotte Loh, Zhuo Chen, Rumen Dangovski, Marin Soljačić
- Abstract summary: Large Language Models (LLMs) still face challenges in accurately performing complex arithmetic operations.
We propose a framework that enables exact arithmetic in a single autoregressive step, providing faster, more secure, and more interpretable LLM systems.
- Score: 7.7168728919692855
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite significant advancements in text generation and reasoning, Large Language Models (LLMs) still face challenges in accurately performing complex arithmetic operations. To achieve accurate calculations, language model systems often enable LLMs to generate code for arithmetic operations. However, this approach compromises speed and security and, if finetuning is involved, risks the language model losing prior capabilities. We propose a framework that enables exact arithmetic in a single autoregressive step, providing faster, more secure, and more interpretable LLM systems with arithmetic capabilities. We use the hidden states of an LLM to control a symbolic architecture which performs arithmetic. Our implementation using Llama 3 8B Instruct with OccamNet as a symbolic model (OccamLlama) achieves 100% accuracy on single arithmetic operations (+, −, ×, ÷, sin, cos, log, exp, √), outperforming GPT 4o and on par with GPT 4o using a code interpreter. OccamLlama also outperforms GPT 4o both with and without a code interpreter on mathematical problem solving benchmarks involving challenging arithmetic, thus enabling small LLMs to match the arithmetic performance of even much larger models. We will make our code public shortly.
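The abstract's core idea, hidden states selecting an operation that a symbolic module then computes exactly, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `OPS` table, the `select_op` and `symbolic_step` helpers, the linear scoring with an argmax, and the hand-picked 3-dimensional hidden state and weights. It is not the authors' OccamNet implementation.

```python
import math

# Hypothetical operation table. The abstract lists these kinds of operations,
# but this lookup structure is only an illustration.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,
    "sqrt": lambda a, b: math.sqrt(a),  # unary: ignores the second operand
}
OP_NAMES = list(OPS)


def select_op(hidden, weights):
    """Score each operation with a linear map of the hidden state; return the best name."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    best = max(range(len(logits)), key=logits.__getitem__)
    return OP_NAMES[best]


def symbolic_step(hidden, weights, a, b):
    """One step: pick an operation from the hidden state, then compute it exactly."""
    return OPS[select_op(hidden, weights)](a, b)


# Toy values (assumed): a 3-dimensional "hidden state" and one weight row per operation.
hidden = [0.1, 0.9, -0.3]
weights = [
    [1.0, 0.0, 0.0],   # add
    [0.0, 0.0, 1.0],   # sub
    [0.0, 1.0, 0.0],   # mul
    [-1.0, 0.0, 0.0],  # div
    [0.0, -1.0, 0.0],  # sqrt
]
result = symbolic_step(hidden, weights, 1234, 5678)
```

In this sketch the arithmetic itself is done by ordinary Python operators, so the result is exact once the operation is chosen; the learned part is limited to the choice of operation.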
Related papers
- Arithmetic Reasoning with LLM: Prolog Generation & Permutation [2.1867261071129125]
We show that Prolog-based arithmetic problem-solving outperforms CoT generation in the GSM8K benchmark.
We propose to permute the ground truth predicates for more robust LLM training via data augmentation.
arXiv Detail & Related papers (2024-05-28T07:13:25Z)
- Reverse That Number! Decoding Order Matters in Arithmetic Learning [49.5504492920404]
Our work introduces a novel strategy that reevaluates the digit order by prioritizing output from the least significant digit.
Compared to the previous state-of-the-art (SOTA) method, our findings reveal an overall improvement in accuracy while requiring only a third of the tokens typically used during training.
arXiv Detail & Related papers (2024-03-09T09:04:53Z)
- GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers [68.77382332826167]
Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks.
One essential and frequently observed piece of evidence is that LLMs can behave incorrectly when math questions are slightly changed.
This motivates us to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations.
arXiv Detail & Related papers (2024-02-29T15:26:14Z)
- ArthModel: Enhance Arithmetic Skills to Large Language Model [0.0]
This work provides different ways of thinking, training and using a language model.
The code and models will be released at https://www.eteced.com/eteced/arithmetic_finetuning_v1.
arXiv Detail & Related papers (2023-11-30T15:06:50Z)
- MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning [52.97768001837269]
We present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations.
We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions.
This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems.
arXiv Detail & Related papers (2023-10-05T17:52:09Z)
- ALGO: Synthesizing Algorithmic Programs with LLM-Generated Oracle Verifiers [60.6418431624873]
Large language models (LLMs) excel at implementing code from functionality descriptions but struggle with algorithmic problems.
We propose ALGO, a framework that synthesizes Algorithmic programs with LLM-Generated Oracles to guide the generation and verify their correctness.
Experiments show that when equipped with ALGO, we achieve an 8x better one-submission pass rate over the Codex model and a 2.6x better one-submission pass rate over CodeT.
arXiv Detail & Related papers (2023-05-24T00:10:15Z)
- SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs).
We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer.
We evaluate SatLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
arXiv Detail & Related papers (2023-05-16T17:55:51Z)
- How well do Large Language Models perform in Arithmetic tasks? [25.638682874990206]
Large language models have shown emergent abilities, including chain-of-thought reasoning, that let them answer math word problems step by step.
To the best of our knowledge, no prior work focuses on evaluating the arithmetic ability of large language models.
In this work, we propose an arithmetic dataset MATH 401 to test the latest large language models.
arXiv Detail & Related papers (2023-03-16T09:28:15Z)
- MathPrompter: Mathematical Reasoning using Large Language Models [7.953723258038284]
Large Language Models (LLMs) have limited performance when solving arithmetic reasoning tasks.
MathPrompter uses zero-shot chain-of-thought prompting to generate multiple algebraic expressions or Python functions that solve the same math problem in different ways.
arXiv Detail & Related papers (2023-03-04T04:43:49Z)
- PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PaL) to understand natural language problems.
PaL offloads the solution step to a programmatic runtime such as a Python interpreter.
We set new state-of-the-art results in all 12 benchmarks.
arXiv Detail & Related papers (2022-11-18T18:56:13Z)
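The least-significant-digit-first ordering mentioned in the "Reverse That Number!" entry can be illustrated with schoolbook addition, where each output digit depends only on digits and carries already processed. This is a minimal sketch under that interpretation; the `add_lsd_first` helper is hypothetical and is not taken from that paper.

```python
def add_lsd_first(a: str, b: str) -> str:
    """Add two non-negative decimal strings, emitting digits least significant first."""
    digits_a = [int(ch) for ch in reversed(a)]
    digits_b = [int(ch) for ch in reversed(b)]
    out = []
    carry = 0
    for i in range(max(len(digits_a), len(digits_b))):
        da = digits_a[i] if i < len(digits_a) else 0
        db = digits_b[i] if i < len(digits_b) else 0
        # The carry flows in the same direction the digits are emitted.
        carry, digit = divmod(da + db + carry, 10)
        out.append(str(digit))
    if carry:
        out.append(str(carry))
    return "".join(out)


reversed_sum = add_lsd_first("478", "964")  # digits in least-significant-first order
normal_sum = reversed_sum[::-1]             # conventional most-significant-first order
```

Emitting the most significant digit first would instead require knowing every later carry before the first digit is written.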
This list is automatically generated from the titles and abstracts of the papers in this site.