Solving the multiplication problem of a large language model system
using a graph-based method
- URL: http://arxiv.org/abs/2310.13016v1
- Date: Wed, 18 Oct 2023 08:02:00 GMT
- Title: Solving the multiplication problem of a large language model system
using a graph-based method
- Authors: Turker Tuncer and Sengul Dogan and Mehmet Baygin and Prabal Datta
Barua and Abdul Hafeez-Baig and Ru-San Tan and Subrata Chakraborty and U.
Rajendra Acharya
- Abstract summary: ChatGPT possesses excellent natural language processing capabilities but is inadequate for solving arithmetic problems.
We developed a graph-based multiplication algorithm that emulated human-like numerical operations.
Our proposed algorithm attained 100% accuracy on 1,000,000 large-number multiplication tasks.
- Score: 20.43440908151311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The generative pre-trained transformer (GPT)-based chatbot software ChatGPT
possesses excellent natural language processing capabilities but is inadequate
for solving arithmetic problems, especially multiplication. Its GPT structure
uses a computational graph for multiplication, which has limited accuracy
beyond simple multiplication operations. We developed a graph-based
multiplication algorithm that emulated human-like numerical operations by
incorporating a 10^k operator, where k represents the maximum power to base 10
of the larger of the two input numbers. Our proposed algorithm attained 100%
accuracy on 1,000,000 large-number multiplication tasks, effectively solving
the multiplication challenge of GPT-based and other large language models. Our
work highlights the importance of blending simple human insights into the
design of artificial intelligence algorithms.
Keywords: Graph-based multiplication; ChatGPT; Multiplication problem
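The abstract's key idea, decomposing a large multiplication into primitive single-digit products scaled by powers of ten (the role played by the 10^k-style operator), can be illustrated with a short sketch. The Python code below is a minimal illustration of that long-multiplication decomposition, not the authors' graph construction; the function names digits_base10 and long_multiply and the nested-loop structure are assumptions made only for this example.

```python
# Minimal sketch (not the paper's algorithm): multiply two non-negative
# integers using only single-digit products and powers of ten, in the
# spirit of human-like long multiplication.

def digits_base10(n: int) -> list[int]:
    """Return the base-10 digits of a non-negative integer, least-significant first."""
    return [int(d) for d in str(n)[::-1]]

def long_multiply(a: int, b: int) -> int:
    """Multiply a and b via digit-wise partial products shifted by 10**(i + j)."""
    total = 0
    for i, da in enumerate(digits_base10(a)):
        for j, db in enumerate(digits_base10(b)):
            # Each partial product is at most 9 * 9; the power of ten plays
            # the role of the 10^k-style scaling described in the abstract.
            total += (da * db) * 10 ** (i + j)
    return total

if __name__ == "__main__":
    a, b = 987_654_321, 123_456_789
    assert long_multiply(a, b) == a * b  # sanity check against built-in multiplication
    print(long_multiply(a, b))
```

Because every step is either a single-digit product or a shift by a power of ten, the result stays exact regardless of operand size, which mirrors the paper's point that reducing multiplication to human-like primitive operations avoids the accuracy ceiling of direct LLM arithmetic.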
Related papers
- Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines [7.695524275630717]
Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing and reasoning tasks.
We propose a Composable Arithmetic Execution Framework (CAEF) that enables LLMs to learn to execute step-by-step computations by emulating Turing Machines.
In our evaluation, CAEF achieves nearly 100% accuracy across seven common mathematical operations on the LLaMA 3.1-8B model.
arXiv Detail & Related papers (2024-10-10T13:23:49Z)
- Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful machine learning algorithm for a robot to acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
- Dissecting Multiplication in Transformers: Insights into LLMs [23.109124772063574]
We focus on a typical arithmetic task, integer multiplication, to explore and explain the imperfection of transformers in this domain.
We provide comprehensive analysis of a vanilla transformer trained to perform n-digit integer multiplication.
We propose improvements to enhance transformers' performance on multiplication tasks.
arXiv Detail & Related papers (2024-07-22T04:07:26Z)
- MathScale: Scaling Instruction Tuning for Mathematical Reasoning [70.89605383298331]
Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving.
However, their proficiency in solving mathematical problems remains inadequate.
We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data.
arXiv Detail & Related papers (2024-03-05T11:42:59Z)
- Positional Description Matters for Transformers Arithmetic [58.4739272381373]
Transformers often falter on arithmetic tasks despite their vast capabilities.
We propose several ways to fix the issue, either by modifying the positional encoding directly, or by modifying the representation of the arithmetic task to leverage standard positional encoding differently.
arXiv Detail & Related papers (2023-11-22T00:31:01Z)
- GPT Can Solve Mathematical Problems Without a Calculator [24.114064917059565]
We show that a large language model can perform arithmetic operations with almost 100% accuracy without data leakage.
We also demonstrate that our MathGLM, fine-tuned from GLM-10B, achieves performance similar to GPT-4 on a 5,000-sample Chinese math problem test set.
arXiv Detail & Related papers (2023-09-06T06:18:16Z)
- WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct [128.89645483139236]
We present WizardMath, which enhances the mathematical reasoning abilities of Llama-2 by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math.
Our model even surpasses ChatGPT-3.5, Claude Instant-1, PaLM-2 and Minerva on GSM8k, and simultaneously surpasses Text-davinci, PaLM-1 and GPT-3 on MATH.
arXiv Detail & Related papers (2023-08-18T14:23:21Z)
- ChatGPT for Programming Numerical Methods [2.741266294612776]
ChatGPT is a large language model recently released by OpenAI.
We explore for the first time the capability of ChatGPT for programming numerical algorithms.
arXiv Detail & Related papers (2023-03-21T12:18:17Z)
- Transformers discover an elementary calculation system exploiting local attention and grid-like problem representation [0.424243593213882]
We show that universal transformers equipped with local attention and adaptive halting mechanisms can learn to exploit an external, grid-like memory to carry out multi-digit addition.
The proposed model achieves remarkable accuracy even when tested with problems requiring extrapolation outside the training distribution.
arXiv Detail & Related papers (2022-07-06T09:29:56Z)
- Recognizing and Verifying Mathematical Equations using Multiplicative Differential Neural Units [86.9207811656179]
We show that memory-augmented neural networks (NNs) can achieve higher-order, memory-augmented extrapolation, stable performance, and faster convergence.
Our models achieve a 1.53% average improvement over current state-of-the-art methods in equation verification and achieve a 2.22% Top-1 average accuracy and 2.96% Top-5 average accuracy for equation completion.
arXiv Detail & Related papers (2021-04-07T03:50:11Z)
- Strong Generalization and Efficiency in Neural Programs [69.18742158883869]
We study the problem of learning efficient algorithms that strongly generalize in the framework of neural program induction.
By carefully designing the input/output interfaces of the neural model and through imitation, we are able to learn models that produce correct results for arbitrary input sizes.
arXiv Detail & Related papers (2020-07-07T17:03:02Z)