Neural Execution Engines: Learning to Execute Subroutines
- URL: http://arxiv.org/abs/2006.08084v3
- Date: Thu, 22 Oct 2020 22:20:54 GMT
- Title: Neural Execution Engines: Learning to Execute Subroutines
- Authors: Yujun Yan, Kevin Swersky, Danai Koutra, Parthasarathy Ranganathan,
Milad Hashemi
- Abstract summary: We study the generalization issues at the level of numerical subroutines that comprise common algorithms like sorting, shortest paths, and minimum spanning trees.
To generalize to unseen data, we show that encoding numbers with a binary representation leads to embeddings with rich structure once trained on downstream tasks like addition or multiplication.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A significant effort has been made to train neural networks that replicate
algorithmic reasoning, but they often fail to learn the abstract concepts
underlying these algorithms. This is evidenced by their inability to generalize
to data distributions that are outside of their restricted training sets,
namely larger inputs and unseen data. We study these generalization issues at
the level of numerical subroutines that comprise common algorithms like
sorting, shortest paths, and minimum spanning trees. First, we observe that
transformer-based sequence-to-sequence models can learn subroutines like
sorting a list of numbers, but their performance rapidly degrades as the length
of lists grows beyond those found in the training set. We demonstrate that this
is due to attention weights that lose fidelity with longer sequences,
particularly when the input numbers are numerically similar. To address the
issue, we propose a learned conditional masking mechanism, which enables the
model to strongly generalize far outside of its training range with
near-perfect accuracy on a variety of algorithms. Second, to generalize to
unseen data, we show that encoding numbers with a binary representation leads
to embeddings with rich structure once trained on downstream tasks like
addition or multiplication. This allows the embedding to handle missing data by
faithfully interpolating numbers not seen during training.
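The binary number encoding mentioned in the abstract can be illustrated with a short sketch. The helper below is hypothetical (not taken from the paper or its code release); it simply shows how non-negative integers might be mapped to fixed-width binary vectors before being passed to a learned embedding layer.

```python
import numpy as np

def to_binary_vector(n, width=8):
    """Encode a non-negative integer as a fixed-width binary vector (MSB first)."""
    if n < 0 or n >= 2 ** width:
        raise ValueError(f"{n} does not fit in {width} bits")
    return np.array([(n >> i) & 1 for i in reversed(range(width))], dtype=np.float32)

# Example: encode a small list of numbers as an input matrix for a sequence model.
numbers = [3, 7, 12, 200]
inputs = np.stack([to_binary_vector(n) for n in numbers])
print(inputs)
# [[0. 0. 0. 0. 0. 0. 1. 1.]
#  [0. 0. 0. 0. 0. 1. 1. 1.]
#  [0. 0. 0. 0. 1. 1. 0. 0.]
#  [1. 1. 0. 0. 1. 0. 0. 0.]]
```

The conditional masking idea (restricting attention to a relevant subset of positions so that weights stay sharp on long sequences) could be sketched as follows. This is a generic masked-softmax over attention logits, assuming a precomputed binary mask; the paper's mechanism learns that mask, which is not reproduced here.

```python
def masked_attention(scores, mask):
    """Apply a binary mask to attention logits before the softmax.

    scores: (seq_len, seq_len) raw attention logits
    mask:   (seq_len, seq_len) 1 where attention is allowed, 0 where it is blocked
    """
    masked = np.where(mask.astype(bool), scores, -1e9)  # blocked positions get ~ -inf
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)
```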