Bearing Syntactic Fruit with Stack-Augmented Neural Networks
- URL: http://arxiv.org/abs/2511.03547v1
- Date: Wed, 05 Nov 2025 15:30:58 GMT
- Title: Bearing Syntactic Fruit with Stack-Augmented Neural Networks
- Authors: Brian DuSell, Ryan Cotterell
- Abstract summary: We find that stack-augmented neural networks generalize in human-like fashion, suggesting they may be more accurate models of human language acquisition than standard architectures. We also propose a modification to the stack RNN architecture that improves hierarchical generalization.
- Score: 59.49467149799849
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Any finite set of training data is consistent with an infinite number of hypothetical algorithms that could have generated it. Studies have shown that when human children learn language, they consistently favor hypotheses based on hierarchical syntactic rules without ever encountering disambiguating examples. A recent line of work has inquired as to whether common neural network architectures share this bias, finding that they do so only under special conditions: when syntactically supervised, when pre-trained on massive corpora, or when trained long past convergence. In this paper, we demonstrate, for the first time, neural network architectures that are able to generalize in human-like fashion without any of the aforementioned requirements: stack-augmented neural networks. We test three base architectures (transformer, simple RNN, LSTM) augmented with two styles of stack: the superposition stack of Joulin & Mikolov (2015) and a nondeterministic generalization of it proposed by DuSell & Chiang (2023). We find that transformers with nondeterministic stacks generalize best out of these architectures on a classical question formation task. We also propose a modification to the stack RNN architecture that improves hierarchical generalization. These results suggest that stack-augmented neural networks may be more accurate models of human language acquisition than standard architectures, serving as useful objects of psycholinguistic study. Our code is publicly available.
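To make the first of the two stack styles concrete: the superposition stack of Joulin & Mikolov (2015) stays differentiable by never committing to a single discrete operation. At each step the controller emits probabilities for push, pop, and no-op, and the new stack is the probability-weighted superposition of the three resulting stacks. The PyTorch sketch below illustrates this update under assumed shapes and names; it is a minimal illustration, not the authors' released code.

```python
# A minimal sketch of the superposition stack update of Joulin & Mikolov
# (2015). Shapes, names, and the fixed depth are illustrative assumptions.
import torch

def superposition_update(stack, actions, push_val):
    """stack: (depth, dim); actions: (3,) probabilities for (push, pop, no-op);
    push_val: (dim,) vector the controller would push onto the stack."""
    p_push, p_pop, p_noop = actions
    depth, dim = stack.shape
    pushed = torch.cat([push_val.unsqueeze(0), stack[:-1]])  # shift down, new top
    popped = torch.cat([stack[1:], torch.zeros(1, dim)])     # shift up, drop top
    return p_push * pushed + p_pop * popped + p_noop * stack # convex blend

# One step of use: a controller (RNN or transformer layer) would produce
# `actions` via a softmax and `push_val` from its hidden state, then read
# stack[0] back as extra input at the next step.
stack = torch.zeros(8, 16)                      # depth-8 stack of 16-dim vectors
actions = torch.softmax(torch.randn(3), dim=0)  # stand-in for controller output
stack = superposition_update(stack, actions, torch.randn(16))
```

Because every update is a convex combination, gradients flow through the stack, which is what lets these models train end-to-end. The nondeterministic stack of DuSell & Chiang (2023) instead maintains a distribution over runs of a pushdown automaton, which is substantially more involved than this sketch.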
Related papers
- ONNX-Net: Towards Universal Representations and Instant Performance Prediction for Neural Architectures [60.14199724905456]
ONNX-Bench is a benchmark consisting of a collection of neural networks in a unified format based on ONNX files. ONNX-Net represents any neural architecture using natural language descriptions that act as input to a performance predictor. Experiments show strong zero-shot performance across disparate search spaces using only a small number of pretraining samples.
arXiv Detail & Related papers (2025-10-06T15:43:36Z)
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
We train and evaluate neural networks directly as binary classifiers of strings. We provide results on a variety of languages across the Chomsky hierarchy for three neural architectures. Our contributions will facilitate theoretically sound empirical testing of language recognition claims in future work.
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
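As a toy illustration of the training setup described in the entry above (not that paper's code), the sketch below trains a small LSTM end-to-end as a binary recognizer of the context-free language a^n b^n; the language choice, sampling scheme, and hyperparameters are all assumptions made for the example.

```python
# Toy sketch: train an LSTM as a binary recognizer of a^n b^n.
# All names and hyperparameters here are illustrative assumptions.
import random
import torch
import torch.nn as nn

def sample(max_len=20):
    """Return (string, label) with label 1 iff the string is in a^n b^n."""
    if random.random() < 0.5:
        n = random.randint(1, max_len // 2)
        return "a" * n + "b" * n, 1
    while True:  # rejection-sample a string outside the language
        s = "".join(random.choice("ab") for _ in range(random.randint(1, max_len)))
        if not (len(s) % 2 == 0 and s == "a" * (len(s) // 2) + "b" * (len(s) // 2)):
            return s, 0

VOCAB = {"a": 0, "b": 1}

class Recognizer(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(len(VOCAB), hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)      # one logit: in the language or not

    def forward(self, ids):
        h, _ = self.rnn(self.emb(ids))
        return self.out(h[:, -1])            # classify from the final state

model = Recognizer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
for step in range(2000):
    s, y = sample()
    ids = torch.tensor([[VOCAB[c] for c in s]])
    loss = loss_fn(model(ids), torch.tensor([[float(y)]]))
    opt.zero_grad(); loss.backward(); opt.step()
```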
- On The Expressivity of Recurrent Neural Cascades [48.87943990557107]
Recurrent Neural Cascades (RNCs) are recurrent neural networks with no cyclic dependencies among recurrent neurons.
We show that RNCs can achieve the expressivity of all regular languages by introducing neurons that can implement groups.
arXiv Detail & Related papers (2023-12-14T15:47:26Z)
- Nondeterministic Stacks in Neural Networks [0.456877715768796]
We develop a differentiable data structure that efficiently simulates a nondeterministic pushdown automaton.
We show that augmenting RNNs with this data structure raises their formal recognition power to arbitrary context-free languages.
We also show that an RNN augmented with a nondeterministic stack is capable of surprisingly powerful behavior.
arXiv Detail & Related papers (2023-04-25T16:00:40Z)
- GENNAPE: Towards Generalized Neural Architecture Performance Estimators [25.877126553261434]
GENNAPE represents a given neural network as a Computation Graph (CG) of atomic operations.
It first learns a graph encoder via Contrastive Learning to encourage network separation by topological features.
Experiments show that GENNAPE pretrained on NAS-Bench-101 can achieve superior transferability to 5 different public neural network benchmarks.
arXiv Detail & Related papers (2022-11-30T18:27:41Z)
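The contrastive pretraining step mentioned in the GENNAPE entry above can be illustrated with a generic NT-Xent (InfoNCE-style) objective, in which two encoded views of the same item attract and all other pairs repel. The sketch below is a standard formulation of such a loss, not GENNAPE's implementation; all names are illustrative.

```python
# Generic NT-Xent contrastive loss over paired embeddings; a standard
# formulation, not GENNAPE's code. Temperature and shapes are assumptions.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """z1, z2: (batch, dim) embeddings of two views of the same items."""
    B = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # unit-norm, shape (2B, D)
    sim = z @ z.t() / tau                         # scaled cosine similarities
    sim = sim.masked_fill(torch.eye(2 * B, dtype=torch.bool), float("-inf"))
    # The positive for row i is its other view, at index i+B (mod 2B).
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(B)])
    return F.cross_entropy(sim, targets)

loss = nt_xent(torch.randn(8, 32), torch.randn(8, 32))
```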
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer while using fewer parameters, and transfer to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
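The routing idea summarized in the Neural Interpreters entry above, where tokens pass through an input-dependent, end-to-end learned mixture of function modules, can be reduced to a short soft-routing layer. The sketch below is a deliberate simplification under assumed shapes, not the paper's architecture.

```python
# A minimal soft-routing layer: each token's output is a learned, gated
# mixture of several function modules. A simplification, not the paper's code.
import torch
import torch.nn as nn

class SoftRouter(nn.Module):
    def __init__(self, dim=64, n_funcs=4):
        super().__init__()
        self.funcs = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(n_funcs)
        ])
        self.gate = nn.Linear(dim, n_funcs)   # scores each function per token

    def forward(self, x):                     # x: (batch, tokens, dim)
        weights = torch.softmax(self.gate(x), dim=-1)           # (B, T, F)
        outputs = torch.stack([f(x) for f in self.funcs], -1)   # (B, T, D, F)
        return x + (outputs * weights.unsqueeze(2)).sum(-1)     # routed residual

print(SoftRouter()(torch.randn(2, 5, 64)).shape)  # torch.Size([2, 5, 64])
```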
- Learning Hierarchical Structures with Differentiable Nondeterministic Stacks [25.064819128982556]
We present a stack RNN model based on the recently proposed Nondeterministic Stack RNN (NS-RNN).
We show that the NS-RNN achieves lower cross-entropy than all previous stack RNNs on five context-free language modeling tasks.
We also propose a restricted version of the NS-RNN that makes it practical to use for language modeling on natural language.
arXiv Detail & Related papers (2021-09-05T03:25:23Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
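To see the phenomenon claimed in the entry above on a concrete dataset, the NumPy sketch below maps two concentric circles (not linearly separable in the plane) through a random two-layer ReLU feature map with Gaussian weights and uniform biases, then fits only a linear readout on the frozen features. The data, width, and bias range are illustrative assumptions, not the paper's construction.

```python
# Random ReLU features (Gaussian weights, uniform biases) typically make
# well-separated classes linearly separable. A toy illustration only.
import numpy as np

rng = np.random.default_rng(0)
n = 200
theta = rng.uniform(0, 2 * np.pi, n)
X = np.vstack([
    np.c_[np.cos(theta[:100]), np.sin(theta[:100])],      # class 0: radius 1
    3 * np.c_[np.cos(theta[100:]), np.sin(theta[100:])],  # class 1: radius 3
])
y = np.r_[np.zeros(100), np.ones(100)]

width = 500
W = rng.standard_normal((2, width))           # Gaussian first-layer weights
b = rng.uniform(-3, 3, width)                 # uniformly distributed biases
features = np.maximum(X @ W + b, 0.0)         # frozen random ReLU features

s = np.where(y == 1, 1.0, -1.0)               # perceptron targets in {-1, +1}
w = np.zeros(width)
for _ in range(50):                           # train only the linear readout
    for i in range(n):
        if s[i] * (features[i] @ w) <= 0:
            w += s[i] * features[i]

acc = ((features @ w > 0).astype(float) == y).mean()
print(f"linear readout accuracy on random features: {acc:.2f}")
```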
- Can RNNs learn Recursive Nested Subject-Verb Agreements? [4.094098809740732]
Language processing requires the ability to extract nested tree structures.
Recent advances in Recurrent Neural Networks (RNNs) achieve near-human performance in some language tasks.
arXiv Detail & Related papers (2021-01-06T20:47:02Z)