Transformers discover an elementary calculation system exploiting local
attention and grid-like problem representation
- URL: http://arxiv.org/abs/2207.02536v1
- Date: Wed, 6 Jul 2022 09:29:56 GMT
- Title: Transformers discover an elementary calculation system exploiting local
attention and grid-like problem representation
- Authors: Samuel Cognolato and Alberto Testolin
- Abstract summary: We show that universal transformers equipped with local attention and adaptive halting mechanisms can learn to exploit an external, grid-like memory to carry out multi-digit addition.
The proposed model achieves remarkable accuracy even when tested with problems requiring extrapolation outside the training distribution.
- Score: 0.424243593213882
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Mathematical reasoning is one of the most impressive achievements of human
intellect but remains a formidable challenge for artificial intelligence
systems. In this work we explore whether modern deep learning architectures can
learn to solve a symbolic addition task by discovering effective arithmetic
procedures. Although the problem might seem trivial at first glance,
generalizing arithmetic knowledge to operations involving a higher number of
terms, possibly composed of longer sequences of digits, has proven extremely
challenging for neural networks. Here we show that universal transformers
equipped with local attention and adaptive halting mechanisms can learn to
exploit an external, grid-like memory to carry out multi-digit addition. The
proposed model achieves remarkable accuracy even when tested with problems
requiring extrapolation outside the training distribution; most notably, it
does so by discovering human-like calculation strategies such as place value
alignment.
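To make the recipe above concrete, here is a minimal sketch of the three ingredients named in the abstract: a weight-tied (universal) transformer block applied recurrently, attention restricted to a local window, and a per-position halting test standing in for the adaptive halting mechanism. This is a hypothetical reconstruction in PyTorch, not the authors' code: the window radius, the simplified thresholded halting rule (full ACT accumulates halting probabilities), and the row-major flattening of the digit grid are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalSelfAttention(nn.Module):
    """Self-attention restricted to a fixed neighborhood of each cell."""
    def __init__(self, dim, window=2):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.window = window  # assumed neighborhood radius

    def forward(self, x):                      # x: (batch, cells, dim)
        n, d = x.shape[1], x.shape[2]
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / d ** 0.5
        idx = torch.arange(n, device=x.device)
        # NOTE: a 1-D window over the flattened grid is a simplification;
        # a true grid neighborhood would be 2-D.
        far = (idx[None, :] - idx[:, None]).abs() > self.window
        scores = scores.masked_fill(far, float("-inf"))
        return F.softmax(scores, dim=-1) @ v

class UniversalTransformerSketch(nn.Module):
    """Weight-tied block applied recurrently, with a per-cell halting test
    (a simplified stand-in for ACT-style adaptive halting)."""
    def __init__(self, dim=64, window=2, max_steps=12, threshold=0.5):
        super().__init__()
        self.attn = LocalSelfAttention(dim, window)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(),
                                 nn.Linear(4 * dim, dim))
        self.halt = nn.Linear(dim, 1)
        self.max_steps, self.threshold = max_steps, threshold

    def forward(self, grid):                   # grid: (batch, rows, cols, dim)
        b, r, c, d = grid.shape
        x = grid.reshape(b, r * c, d)          # flatten the grid row-major
        halted = torch.zeros(b, r * c, dtype=torch.bool, device=x.device)
        for _ in range(self.max_steps):
            y = x + self.attn(x)               # local read from neighboring cells
            y = y + self.ffn(y)
            x = torch.where(halted.unsqueeze(-1), x, y)  # halted cells keep state
            p = torch.sigmoid(self.halt(x)).squeeze(-1)
            halted |= p > self.threshold       # cells stop refining once confident
            if halted.all():
                break
        return x.reshape(b, r, c, d)

# Toy usage: two 3-digit operands written on aligned rows of a grid, so that
# digits of equal place value share a column ("place value alignment").
emb = nn.Embedding(12, 64)                     # 10 digits + pad + separator (assumed)
grid = emb(torch.tensor([[[1, 2, 3],           # operand 1
                          [4, 5, 6]]]))        # operand 2, columns aligned by place
out = UniversalTransformerSketch()(grid)       # (1, 2, 3, 64)
print(out.shape)
```

Because the operands are written on separate rows with digits of equal place value sharing a column, a column-wise add-with-carry only ever needs information from a cell's immediate neighbors; this is the intuition for why a local attention window can keep working on operand lengths never seen during training.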
Related papers
- Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful machine learning approach for robots to acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
- Symbolic Equation Solving via Reinforcement Learning [9.361474110798143]
We propose a novel deep-learning interface involving a reinforcement-learning agent that operates a symbolic stack calculator.
By construction, this system is capable of exact transformations and immune to hallucination (see the toy sketch after this list).
arXiv Detail & Related papers (2024-01-24T13:42:24Z)
- Brain-Inspired Computational Intelligence via Predictive Coding [89.6335791546526]
Predictive coding (PC) has shown promising performance in machine intelligence tasks.
PC can model information processing in different brain areas and can be used in cognitive control and robotics.
arXiv Detail & Related papers (2023-08-15T16:37:16Z)
- The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks [59.26515696183751]
We show that algorithm discovery in neural networks is sometimes more complex than a single canonical solution would suggest.
We show that even simple learning problems can admit a surprising diversity of solutions.
arXiv Detail & Related papers (2023-06-30T17:59:13Z)
- Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models [0.424243593213882]
It is unclear whether deep learning models possess an elementary understanding of quantities and symbolic numbers.
We critically examine the recent literature, concluding that even state-of-the-art architectures often fall short when probed with relatively simple tasks designed to test basic numerical and arithmetic knowledge.
arXiv Detail & Related papers (2023-03-14T09:30:52Z)
- Learning to solve arithmetic problems with a virtual abacus [0.35911228556176483]
We introduce a deep reinforcement learning framework that makes it possible to simulate how cognitive agents could learn to solve arithmetic problems.
The proposed model successfully learns to perform multi-digit additions and subtractions, achieving an error rate below 1%.
We analyze the most common error patterns to better understand the limitations and biases resulting from our design choices.
arXiv Detail & Related papers (2023-01-17T13:25:52Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Recognizing and Verifying Mathematical Equations using Multiplicative Differential Neural Units [86.9207811656179]
We show that memory-augmented neural networks (NNs) can achieve higher-order extrapolation, stable performance, and faster convergence.
Our models achieve a 1.53% average improvement over current state-of-the-art methods in equation verification, and a 2.22% Top-1 and 2.96% Top-5 average accuracy improvement for equation completion.
arXiv Detail & Related papers (2021-04-07T03:50:11Z)
- Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning [95.18337034090648]
We propose a dataset, Machine Number Sense (MNS), consisting of visual arithmetic problems automatically generated using a grammar model, the And-Or Graph (AOG).
These visual arithmetic problems are in the form of geometric figures.
We benchmark the MNS dataset using four predominant neural network models as baselines in this visual reasoning task.
arXiv Detail & Related papers (2020-04-25T17:14:58Z)
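The "Symbolic Equation Solving via Reinforcement Learning" entry above claims exact transformations that are immune to hallucination; a toy sketch can show why a stack calculator has this property by construction. The following is a hypothetical illustration, not the paper's interface: the action names and the use of Python's exact `Fraction` arithmetic are assumptions made to demonstrate the idea that an agent choosing discrete calculator actions can only produce results that follow exactly from those actions.

```python
from fractions import Fraction

class StackCalculator:
    """Toy stack machine with exact rational arithmetic: every operation either
    succeeds exactly or raises, so no approximate result can ever appear."""
    def __init__(self):
        self.stack = []

    def push(self, value):
        self.stack.append(Fraction(value))

    def add(self):
        b, a = self.stack.pop(), self.stack.pop()
        self.stack.append(a + b)

    def mul(self):
        b, a = self.stack.pop(), self.stack.pop()
        self.stack.append(a * b)

    def apply(self, action):
        # An RL agent would emit a discrete action id; hypothetical mapping.
        {"add": self.add, "mul": self.mul}[action]()

calc = StackCalculator()
calc.push(1); calc.push(3)
calc.apply("add")            # stack: [4]
calc.push(Fraction(1, 3))
calc.apply("mul")            # stack: [4/3], exact -- no floating-point drift
print(calc.stack[0])         # 4/3
```

Because every intermediate value is an exact `Fraction`, the calculator can only produce results entailed by the chosen actions, which is the sense in which such a system cannot hallucinate.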
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.