Transformers discover an elementary calculation system exploiting local
attention and grid-like problem representation
- URL: http://arxiv.org/abs/2207.02536v1
- Date: Wed, 6 Jul 2022 09:29:56 GMT
- Title: Transformers discover an elementary calculation system exploiting local
attention and grid-like problem representation
- Authors: Samuel Cognolato and Alberto Testolin
- Abstract summary: We show that universal transformers equipped with local attention and adaptive halting mechanisms can learn to exploit an external, grid-like memory to carry out multi-digit addition.
The proposed model achieves remarkable accuracy even when tested with problems requiring extrapolation outside the training distribution.
- Score: 0.424243593213882
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Mathematical reasoning is one of the most impressive achievements of human
intellect but remains a formidable challenge for artificial intelligence
systems. In this work we explore whether modern deep learning architectures can
learn to solve a symbolic addition task by discovering effective arithmetic
procedures. Although the problem might seem trivial at first glance,
generalizing arithmetic knowledge to operations involving a higher number of
terms, possibly composed of longer sequences of digits, has proven extremely
challenging for neural networks. Here we show that universal transformers
equipped with local attention and adaptive halting mechanisms can learn to
exploit an external, grid-like memory to carry out multi-digit addition. The
proposed model achieves remarkable accuracy even when tested with problems
requiring extrapolation outside the training distribution; most notably, it
does so by discovering human-like calculation strategies such as place value
alignment.
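To make the recipe above concrete, here is a minimal sketch of the three ingredients named in the abstract: a weight-tied (universal) transformer block applied recurrently, attention restricted to a local window, and a per-position halting test standing in for the adaptive halting mechanism. This is a hypothetical reconstruction in PyTorch, not the authors' code: the window radius, the simplified thresholded halting rule (full ACT accumulates halting probabilities), and the row-major flattening of the digit grid are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalSelfAttention(nn.Module):
    """Self-attention restricted to a fixed neighborhood of each cell."""
    def __init__(self, dim, window=2):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.window = window  # assumed neighborhood radius

    def forward(self, x):                      # x: (batch, cells, dim)
        n, d = x.shape[1], x.shape[2]
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / d ** 0.5
        idx = torch.arange(n, device=x.device)
        # NOTE: a 1-D window over the flattened grid is a simplification;
        # a true grid neighborhood would be 2-D.
        far = (idx[None, :] - idx[:, None]).abs() > self.window
        scores = scores.masked_fill(far, float("-inf"))
        return F.softmax(scores, dim=-1) @ v

class UniversalTransformerSketch(nn.Module):
    """Weight-tied block applied recurrently, with a per-cell halting test
    (a simplified stand-in for ACT-style adaptive halting)."""
    def __init__(self, dim=64, window=2, max_steps=12, threshold=0.5):
        super().__init__()
        self.attn = LocalSelfAttention(dim, window)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(),
                                 nn.Linear(4 * dim, dim))
        self.halt = nn.Linear(dim, 1)
        self.max_steps, self.threshold = max_steps, threshold

    def forward(self, grid):                   # grid: (batch, rows, cols, dim)
        b, r, c, d = grid.shape
        x = grid.reshape(b, r * c, d)          # flatten the grid row-major
        halted = torch.zeros(b, r * c, dtype=torch.bool, device=x.device)
        for _ in range(self.max_steps):
            y = x + self.attn(x)               # local read from neighboring cells
            y = y + self.ffn(y)
            x = torch.where(halted.unsqueeze(-1), x, y)  # halted cells keep state
            p = torch.sigmoid(self.halt(x)).squeeze(-1)
            halted |= p > self.threshold       # cells stop refining once confident
            if halted.all():
                break
        return x.reshape(b, r, c, d)

# Toy usage: two 3-digit operands written on aligned rows of a grid, so that
# digits of equal place value share a column ("place value alignment").
emb = nn.Embedding(12, 64)                     # 10 digits + pad + separator (assumed)
grid = emb(torch.tensor([[[1, 2, 3],           # operand 1
                          [4, 5, 6]]]))        # operand 2, columns aligned by place
out = UniversalTransformerSketch()(grid)       # (1, 2, 3, 64)
print(out.shape)
```

Because the operands are written on separate rows with digits of equal place value sharing a column, a column-wise add-with-carry only ever needs information from a cell's immediate neighbors; this is the intuition for why a local attention window can keep working on operand lengths never seen during training.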
Related papers
- Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful machine learning approach for robots to acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
- Symbolic Equation Solving via Reinforcement Learning [9.361474110798143]
We propose a novel deep-learning interface involving a reinforcement-learning agent that operates a symbolic stack calculator.
By construction, this system is capable of exact transformations and immune to hallucination (see the toy sketch after this list).
arXiv Detail & Related papers (2024-01-24T13:42:24Z)
- Brain-Inspired Computational Intelligence via Predictive Coding [89.6335791546526]
Predictive coding (PC) has shown promising performance in machine intelligence tasks.
PC can model information processing in different brain areas and can be used in cognitive control and robotics.
arXiv Detail & Related papers (2023-08-15T16:37:16Z)
- The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks [59.26515696183751]
We show that algorithm discovery in neural networks is sometimes more complex than a single canonical solution would suggest.
We show that even simple learning problems can admit a surprising diversity of solutions.
arXiv Detail & Related papers (2023-06-30T17:59:13Z)
- Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models [0.424243593213882]
It is unclear whether deep learning models possess an elementary understanding of quantities and symbolic numbers.
We critically examine the recent literature, concluding that even state-of-the-art architectures often fall short when probed with relatively simple tasks designed to test basic numerical and arithmetic knowledge.
arXiv Detail & Related papers (2023-03-14T09:30:52Z)
- Learning to solve arithmetic problems with a virtual abacus [0.35911228556176483]
We introduce a deep reinforcement learning framework that makes it possible to simulate how cognitive agents could learn to solve arithmetic problems.
The proposed model successfully learns to perform multi-digit additions and subtractions, achieving an error rate below 1%.
We analyze the most common error patterns to better understand the limitations and biases resulting from our design choices.
arXiv Detail & Related papers (2023-01-17T13:25:52Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Recognizing and Verifying Mathematical Equations using Multiplicative Differential Neural Units [86.9207811656179]
We show that memory-augmented neural networks (NNs) can achieve higher-order extrapolation, stable performance, and faster convergence.
Our models achieve a 1.53% average improvement over current state-of-the-art methods in equation verification, and a 2.22% Top-1 and 2.96% Top-5 average accuracy improvement for equation completion.
arXiv Detail & Related papers (2021-04-07T03:50:11Z)
- Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning [95.18337034090648]
We propose a dataset, Machine Number Sense (MNS), consisting of visual arithmetic problems automatically generated using a grammar model, the And-Or Graph (AOG).
These visual arithmetic problems are in the form of geometric figures.
We benchmark the MNS dataset using four predominant neural network models as baselines in this visual reasoning task.
arXiv Detail & Related papers (2020-04-25T17:14:58Z)
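The "Symbolic Equation Solving via Reinforcement Learning" entry above claims exact transformations that are immune to hallucination; a toy sketch can show why a stack calculator has this property by construction. The following is a hypothetical illustration, not the paper's interface: the action names and the use of Python's exact `Fraction` arithmetic are assumptions made to demonstrate the idea that an agent choosing discrete calculator actions can only produce results that follow exactly from those actions.

```python
from fractions import Fraction

class StackCalculator:
    """Toy stack machine with exact rational arithmetic: every operation either
    succeeds exactly or raises, so no approximate result can ever appear."""
    def __init__(self):
        self.stack = []

    def push(self, value):
        self.stack.append(Fraction(value))

    def add(self):
        b, a = self.stack.pop(), self.stack.pop()
        self.stack.append(a + b)

    def mul(self):
        b, a = self.stack.pop(), self.stack.pop()
        self.stack.append(a * b)

    def apply(self, action):
        # An RL agent would emit a discrete action id; hypothetical mapping.
        {"add": self.add, "mul": self.mul}[action]()

calc = StackCalculator()
calc.push(1); calc.push(3)
calc.apply("add")            # stack: [4]
calc.push(Fraction(1, 3))
calc.apply("mul")            # stack: [4/3], exact -- no floating-point drift
print(calc.stack[0])         # 4/3
```

Because every intermediate value is an exact `Fraction`, the calculator can only produce results entailed by the chosen actions, which is the sense in which such a system cannot hallucinate.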
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.