Neural Execution Engines: Learning to Execute Subroutines
- URL: http://arxiv.org/abs/2006.08084v3
- Date: Thu, 22 Oct 2020 22:20:54 GMT
- Title: Neural Execution Engines: Learning to Execute Subroutines
- Authors: Yujun Yan, Kevin Swersky, Danai Koutra, Parthasarathy Ranganathan,
Milad Hashemi
- Abstract summary: We study the generalization issues at the level of numerical subroutines that comprise common algorithms like sorting, shortest paths, and minimum spanning trees.
To generalize to unseen data, we show that encoding numbers with a binary representation leads to embeddings with rich structure once trained on downstream tasks like addition or multiplication.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A significant effort has been made to train neural networks that replicate
algorithmic reasoning, but they often fail to learn the abstract concepts
underlying these algorithms. This is evidenced by their inability to generalize
to data distributions that are outside of their restricted training sets,
namely larger inputs and unseen data. We study these generalization issues at
the level of numerical subroutines that comprise common algorithms like
sorting, shortest paths, and minimum spanning trees. First, we observe that
transformer-based sequence-to-sequence models can learn subroutines like
sorting a list of numbers, but their performance rapidly degrades as the length
of lists grows beyond those found in the training set. We demonstrate that this
is due to attention weights that lose fidelity with longer sequences,
particularly when the input numbers are numerically similar. To address the
issue, we propose a learned conditional masking mechanism, which enables the
model to strongly generalize far outside of its training range with
near-perfect accuracy on a variety of algorithms. Second, to generalize to
unseen data, we show that encoding numbers with a binary representation leads
to embeddings with rich structure once trained on downstream tasks like
addition or multiplication. This allows the embedding to handle missing data by
faithfully interpolating numbers not seen during training.
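The binary number encoding mentioned in the abstract can be illustrated with a short sketch. The helper below is hypothetical (not taken from the paper or its code release); it simply shows how non-negative integers might be mapped to fixed-width binary vectors before being passed to a learned embedding layer.

```python
import numpy as np

def to_binary_vector(n, width=8):
    """Encode a non-negative integer as a fixed-width binary vector (MSB first)."""
    if n < 0 or n >= 2 ** width:
        raise ValueError(f"{n} does not fit in {width} bits")
    return np.array([(n >> i) & 1 for i in reversed(range(width))], dtype=np.float32)

# Example: encode a small list of numbers as an input matrix for a sequence model.
numbers = [3, 7, 12, 200]
inputs = np.stack([to_binary_vector(n) for n in numbers])
print(inputs)
# [[0. 0. 0. 0. 0. 0. 1. 1.]
#  [0. 0. 0. 0. 0. 1. 1. 1.]
#  [0. 0. 0. 0. 1. 1. 0. 0.]
#  [1. 1. 0. 0. 1. 0. 0. 0.]]
```

The conditional masking idea (restricting attention to a relevant subset of positions so that weights stay sharp on long sequences) could be sketched as follows. This is a generic masked-softmax over attention logits, assuming a precomputed binary mask; the paper's mechanism learns that mask, which is not reproduced here.

```python
def masked_attention(scores, mask):
    """Apply a binary mask to attention logits before the softmax.

    scores: (seq_len, seq_len) raw attention logits
    mask:   (seq_len, seq_len) 1 where attention is allowed, 0 where it is blocked
    """
    masked = np.where(mask.astype(bool), scores, -1e9)  # blocked positions get ~ -inf
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)
```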