Can RNNs learn Recursive Nested Subject-Verb Agreements?
- URL: http://arxiv.org/abs/2101.02258v1
- Date: Wed, 6 Jan 2021 20:47:02 GMT
- Title: Can RNNs learn Recursive Nested Subject-Verb Agreements?
- Authors: Yair Lakretz, Théo Desbordes, Jean-Rémi King, Benoît Crabbé, Maxime Oquab, Stanislas Dehaene
- Abstract summary: Language processing requires the ability to extract nested tree structures.
Recent advances in Recurrent Neural Networks (RNNs) achieve near-human performance in some language tasks.
- Score: 4.094098809740732
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the fundamental principles of contemporary linguistics states that
language processing requires the ability to extract recursively nested tree
structures. However, it remains unclear whether and how this code could be
implemented in neural circuits. Recent advances in Recurrent Neural Networks
(RNNs), which achieve near-human performance in some language tasks, provide a
compelling model to address such questions. Here, we present a new framework to
study recursive processing in RNNs, using subject-verb agreement as a probe
into the representations of the neural network. We trained six distinct types
of RNNs on a simplified probabilistic context-free grammar designed to
independently manipulate the length of a sentence and the depth of its
syntactic tree. All RNNs generalized to subject-verb dependencies longer than
those seen during training. However, none systematically generalized to deeper
tree structures, even those with a structural bias towards learning nested trees
(i.e., stack-RNNs). In addition, our analyses revealed primacy and recency
effects in the generalization patterns of LSTM-based models, showing that these
models tend to perform well on the outer- and innermost parts of a
center-embedded tree structure, but poorly on its middle levels. Finally,
probing the internal states of the model during the processing of sentences
with nested tree structures, we found a complex encoding of grammatical
agreement information (e.g. grammatical number), in which all the information
for multiple nouns was carried by a single unit. Taken together, these
results indicate how neural networks may extract bounded nested tree
structures, without learning a systematic recursive rule.
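As a concrete illustration of the framework the abstract describes, here is a minimal Python sketch (not the authors' released code) of a toy grammar that generates center-embedded sentences in which tree depth (the number of nested subject-verb dependencies) and dependency length (how much material separates each subject from its verb) can be varied independently, with every verb agreeing in number with its own subject. The lexicon, the prepositional-phrase padding, and the `generate` helper are illustrative assumptions.

```python
import random

# Toy lexicon; the word lists and the PP padding scheme are illustrative
# assumptions, not the grammar used in the paper.
NOUNS = {"sg": ["boy", "cat", "farmer"], "pl": ["boys", "cats", "farmers"]}
V_INTR = {"sg": ["smiles", "sleeps"], "pl": ["smile", "sleep"]}
V_TRAN = {"sg": ["chases", "admires"], "pl": ["chase", "admire"]}
PPS = ["near the tree", "behind the fence", "by the river"]


def generate(depth, pps_per_noun=0):
    """Generate one center-embedded sentence.

    depth        -- number of nested subject-verb dependencies (tree depth)
    pps_per_noun -- prepositional phrases appended to each subject, which
                    lengthens dependencies without deepening the tree
    """
    numbers = [random.choice(["sg", "pl"]) for _ in range(depth)]
    subjects, verbs = [], []
    for level, num in enumerate(numbers):
        np_ = "the " + random.choice(NOUNS[num])
        np_ += "".join(" " + random.choice(PPS) for _ in range(pps_per_noun))
        subjects.append(np_ if level == 0 else "that " + np_)
        # Embedded verbs are transitive (object relative clauses); the
        # outermost verb is intransitive. Each agrees with its own subject.
        verbs.append(random.choice((V_INTR if level == 0 else V_TRAN)[num]))
    # Center embedding: all subjects first, then the verbs in reverse order,
    # e.g. depth 2 -> "the boy that the cats chase smiles".
    return " ".join(subjects + verbs[::-1])


print(generate(depth=2))
print(generate(depth=3, pps_per_noun=1))
```

In the paper's agreement probe, a trained network is then asked, just before each verb, whether it assigns higher probability to the correctly inflected form (e.g. "smiles" vs. "smile"); a systematic recursive rule would require this to keep working as depth grows beyond the training distribution.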
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings (a minimal sketch of this setup appears after this list).
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
- On The Expressivity of Recurrent Neural Cascades [48.87943990557107]
Recurrent Neural Cascades (RNCs) are recurrent neural networks with no cyclic dependencies among recurrent neurons.
We show that RNCs can achieve the expressivity of all regular languages by introducing neurons that can implement groups.
arXiv Detail & Related papers (2023-12-14T15:47:26Z)
- Neural-Symbolic Recursive Machine for Systematic Generalization [113.22455566135757]
We introduce the Neural-Symbolic Recursive Machine (NSR), whose core is a Grounded Symbol System (GSS).
NSR integrates neural perception, syntactic parsing, and semantic reasoning.
We evaluate NSR's efficacy across four challenging benchmarks designed to probe systematic generalization capabilities.
arXiv Detail & Related papers (2022-10-04T13:27:38Z)
- Active Predictive Coding Networks: A Neural Solution to the Problem of Learning Reference Frames and Part-Whole Hierarchies [1.5990720051907859]
We introduce Active Predictive Coding Networks (APCNs), a new class of neural networks that solve a major problem posed by Hinton and others in the fields of artificial intelligence and brain modeling.
We demonstrate that APCNs can (a) learn to parse images into part-whole hierarchies, (b) learn compositional representations, and (c) transfer their knowledge to unseen classes of objects.
arXiv Detail & Related papers (2022-01-14T21:22:48Z)
- Learning Hierarchical Structures with Differentiable Nondeterministic Stacks [25.064819128982556]
We present a stack RNN model based on the recently proposed Nondeterministic Stack RNN (NS-RNN).
We show that the NS-RNN achieves lower cross-entropy than all previous stack RNNs on five context-free language modeling tasks.
We also propose a restricted version of the NS-RNN that makes it practical to use for language modeling on natural language.
arXiv Detail & Related papers (2021-09-05T03:25:23Z)
- Sequence-to-Sequence Learning with Latent Neural Grammars [12.624691611049341]
Sequence-to-sequence learning with neural networks has become the de facto standard for sequence prediction tasks.
While flexible and performant, these models often require large datasets for training and can fail spectacularly on benchmarks designed to test for compositional generalization.
This work explores an alternative, hierarchical approach to sequence-to-sequence learning with quasi-synchronous grammars.
arXiv Detail & Related papers (2021-09-02T17:58:08Z)
- Unsupervised Learning of Explainable Parse Trees for Improved Generalisation [15.576061447736057]
We propose an attention mechanism over Tree-LSTMs to learn more meaningful and explainable parse tree structures.
We also demonstrate the superior performance of our proposed model on natural language inference, semantic relatedness, and sentiment analysis tasks.
arXiv Detail & Related papers (2021-04-11T12:10:03Z)
- Recursive Top-Down Production for Sentence Generation with Latent Trees [77.56794870399288]
We model the production property of context-free grammars for natural and synthetic languages.
We present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves.
We also present experimental results on German-English translation on the Multi30k dataset.
arXiv Detail & Related papers (2020-10-09T17:47:16Z)
- Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack [73.48927855855219]
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack (a minimal sketch of such a soft stack update appears after this list).
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
arXiv Detail & Related papers (2020-04-04T14:19:15Z)
- Linguistically Driven Graph Capsule Network for Visual Question Reasoning [153.76012414126643]
We propose a hierarchical compositional reasoning model called the "Linguistically driven Graph Capsule Network".
The compositional process is guided by the linguistic parse tree. Specifically, we bind each capsule in the lowest layer to bridge the linguistic embedding of a single word in the original question with visual evidence.
Experiments on the CLEVR dataset, CLEVR compositional generation test, and FigureQA dataset demonstrate the effectiveness and composition generalization ability of our end-to-end model.
arXiv Detail & Related papers (2020-03-23T03:34:25Z)
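The first related paper above ("Training Neural Networks as Recognizers of Formal Languages") advocates evaluating networks directly as binary accept/reject classifiers over strings. The sketch below is an illustration in that spirit, not that paper's code or benchmark: it trains a small PyTorch LSTM to recognize a toy regular language (binary strings with an even number of 1s) and then tests it on strings longer than any seen during training. The language, model sizes, and training loop are all assumptions.

```python
import random
import torch
import torch.nn as nn

# Toy target language (an illustrative assumption): binary strings with an
# even number of 1s. The network is trained directly as a recognizer, i.e. a
# binary accept/reject classifier over strings, rather than on a proxy task.
def sample(length):
    s = "".join(random.choice("01") for _ in range(length))
    return s, float(s.count("1") % 2 == 0)

class Recognizer(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(2, 8)                    # one vector per symbol
        self.rnn = nn.LSTM(8, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)                    # accept/reject logit

    def forward(self, s):
        ids = torch.tensor([[int(c) for c in s]])          # shape (1, T)
        _, (h, _) = self.rnn(self.embed(ids))              # final hidden state
        return self.out(h[-1]).squeeze()                   # scalar logit

model = Recognizer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(3000):                                   # tiny training loop
    s, label = sample(random.randint(1, 20))
    loss = loss_fn(model(s), torch.tensor(label))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Test on strings longer than any seen during training.
with torch.no_grad():
    tests = [sample(60) for _ in range(200)]
    correct = sum(int((model(s) > 0).item() == bool(y)) for s, y in tests)
    print(f"accuracy on length-60 strings: {correct / len(tests):.2f}")
```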
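Two of the papers above ("Learning Hierarchical Structures with Differentiable Nondeterministic Stacks" and "Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack") couple an RNN controller with an external differentiable stack. As a point of reference, the sketch below shows the simplest "soft" stack update, in the style of earlier continuous stack-RNNs (e.g. Joulin & Mikolov, 2015), where push/pop/no-op weights blend shifted copies of the stack so the whole step stays differentiable. It is not the nondeterministic NS-RNN mechanism, and the names and sizes are illustrative.

```python
import numpy as np

def soft_stack_update(stack, actions, new_top):
    """One differentiable stack step: `actions` = (push, pop, no-op) weights
    (non-negative, summing to 1, typically a softmax emitted by the controller
    RNN), `stack` is (k, d) with stack[0] as the top, and `new_top` is the
    (d,) vector to push."""
    push, pop, noop = actions
    pushed = np.vstack([new_top, stack[:-1]])                  # shift down, new top
    popped = np.vstack([stack[1:], np.zeros_like(stack[:1])])  # shift up, pad bottom
    return push * pushed + pop * popped + noop * stack

# Usage: at every time step the controller RNN emits `actions` and `new_top`,
# updates the stack, and reads the top cell(s) back as extra input.
k, d = 5, 4
stack = np.zeros((k, d))
stack = soft_stack_update(stack, np.array([0.9, 0.05, 0.05]), np.ones(d))
print(stack[0])   # mostly the pushed vector, scaled by the soft push weight
```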
This list is automatically generated from the titles and abstracts of the papers indexed on this site.