Solving the multiplication problem of a large language model system
using a graph-based method
- URL: http://arxiv.org/abs/2310.13016v1
- Date: Wed, 18 Oct 2023 08:02:00 GMT
- Title: Solving the multiplication problem of a large language model system
using a graph-based method
- Authors: Turker Tuncer and Sengul Dogan and Mehmet Baygin and Prabal Datta
Barua and Abdul Hafeez-Baig and Ru-San Tan and Subrata Chakraborty and U.
Rajendra Acharya
- Abstract summary: ChatGPT possesses excellent natural language processing capabilities but is inadequate for solving arithmetic problems.
We developed a graph-based multiplication algorithm that emulated human-like numerical operations.
Our proposed algorithm attained 100% accuracy on 1,000,000 large-number multiplication tasks.
- Score: 20.43440908151311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The generative pre-trained transformer (GPT)-based chatbot software ChatGPT
possesses excellent natural language processing capabilities but is inadequate
for solving arithmetic problems, especially multiplication. Its GPT structure
uses a computational graph for multiplication, which has limited accuracy
beyond simple multiplication operations. We developed a graph-based
multiplication algorithm that emulated human-like numerical operations by
incorporating a 10^k operator, where k represents the maximum power to base 10
of the larger of the two input numbers. Our proposed algorithm attained 100%
accuracy on 1,000,000 large-number multiplication tasks, effectively solving
the multiplication challenge of GPT-based and other large language models. Our
work highlights the importance of blending simple human insights into the
design of artificial intelligence algorithms.
Keywords: Graph-based multiplication; ChatGPT; Multiplication problem
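The abstract's key idea, decomposing a large multiplication into primitive single-digit products scaled by powers of ten (the role played by the 10^k-style operator), can be illustrated with a short sketch. The Python code below is a minimal illustration of that long-multiplication decomposition, not the authors' graph construction; the function names digits_base10 and long_multiply and the nested-loop structure are assumptions made only for this example.

```python
# Minimal sketch (not the paper's algorithm): multiply two non-negative
# integers using only single-digit products and powers of ten, in the
# spirit of human-like long multiplication.

def digits_base10(n: int) -> list[int]:
    """Return the base-10 digits of a non-negative integer, least-significant first."""
    return [int(d) for d in str(n)[::-1]]

def long_multiply(a: int, b: int) -> int:
    """Multiply a and b via digit-wise partial products shifted by 10**(i + j)."""
    total = 0
    for i, da in enumerate(digits_base10(a)):
        for j, db in enumerate(digits_base10(b)):
            # Each partial product is at most 9 * 9; the power of ten plays
            # the role of the 10^k-style scaling described in the abstract.
            total += (da * db) * 10 ** (i + j)
    return total

if __name__ == "__main__":
    a, b = 987_654_321, 123_456_789
    assert long_multiply(a, b) == a * b  # sanity check against built-in multiplication
    print(long_multiply(a, b))
```

Because every step is either a single-digit product or a shift by a power of ten, the result stays exact regardless of operand size, which mirrors the paper's point that reducing multiplication to human-like primitive operations avoids the accuracy ceiling of direct LLM arithmetic.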
Related papers
- Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines [7.695524275630717]
Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing and reasoning tasks.
We propose a Composable Arithmetic Execution Framework (CAEF) that enables LLMs to learn to execute step-by-step computations by emulating Turing Machines.
In our evaluation, CAEF achieves nearly 100% accuracy across seven common mathematical operations on the LLaMA 3.1-8B model.
arXiv Detail & Related papers (2024-10-10T13:23:49Z)
- Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful machine learning algorithm for a robot to acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
- Dissecting Multiplication in Transformers: Insights into LLMs [23.109124772063574]
We focus on a typical arithmetic task, integer multiplication, to explore and explain the imperfection of transformers in this domain.
We provide comprehensive analysis of a vanilla transformer trained to perform n-digit integer multiplication.
We propose improvements to enhance transformers' performance on multiplication tasks.
arXiv Detail & Related papers (2024-07-22T04:07:26Z)
- MathScale: Scaling Instruction Tuning for Mathematical Reasoning [70.89605383298331]
Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving.
However, their proficiency in solving mathematical problems remains inadequate.
We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data.
arXiv Detail & Related papers (2024-03-05T11:42:59Z)
- Positional Description Matters for Transformers Arithmetic [58.4739272381373]
Transformers often falter on arithmetic tasks despite their vast capabilities.
We propose several ways to fix the issue, either by modifying the positional encoding directly, or by modifying the representation of the arithmetic task to leverage standard positional encoding differently.
arXiv Detail & Related papers (2023-11-22T00:31:01Z)
- GPT Can Solve Mathematical Problems Without a Calculator [24.114064917059565]
We show that a large language model can perform arithmetic operations with almost 100% accuracy without data leakage.
We also demonstrate that our MathGLM, fine-tuned from GLM-10B, achieves performance similar to GPT-4 on a 5,000-sample Chinese math problem test set.
arXiv Detail & Related papers (2023-09-06T06:18:16Z)
- WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct [128.89645483139236]
We present WizardMath, which enhances the mathematical reasoning abilities of Llama-2 by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math.
Our model even surpasses ChatGPT-3.5, Claude Instant-1, PaLM-2 and Minerva on GSM8k, and simultaneously surpasses Text-davinci, PaLM-1 and GPT-3 on MATH.
arXiv Detail & Related papers (2023-08-18T14:23:21Z)
- ChatGPT for Programming Numerical Methods [2.741266294612776]
ChatGPT is a large language model recently released by OpenAI.
We explore for the first time the capability of ChatGPT for programming numerical algorithms.
arXiv Detail & Related papers (2023-03-21T12:18:17Z)
- Transformers discover an elementary calculation system exploiting local attention and grid-like problem representation [0.424243593213882]
We show that universal transformers equipped with local attention and adaptive halting mechanisms can learn to exploit an external, grid-like memory to carry out multi-digit addition.
The proposed model achieves remarkable accuracy even when tested with problems requiring extrapolation outside the training distribution.
arXiv Detail & Related papers (2022-07-06T09:29:56Z)
- Recognizing and Verifying Mathematical Equations using Multiplicative Differential Neural Units [86.9207811656179]
We show that memory-augmented neural networks (NNs) can achieve higher-order, memory-augmented extrapolation, stable performance, and faster convergence.
Our models achieve a 1.53% average improvement over current state-of-the-art methods in equation verification and achieve a 2.22% Top-1 average accuracy and 2.96% Top-5 average accuracy for equation completion.
arXiv Detail & Related papers (2021-04-07T03:50:11Z)
- Strong Generalization and Efficiency in Neural Programs [69.18742158883869]
We study the problem of learning efficient algorithms that strongly generalize in the framework of neural program induction.
By carefully designing the input/output interfaces of the neural model and through imitation, we are able to learn models that produce correct results for arbitrary input sizes.
arXiv Detail & Related papers (2020-07-07T17:03:02Z)