How poor is the stimulus? Evaluating hierarchical generalization in neural networks trained on child-directed speech
- URL: http://arxiv.org/abs/2301.11462v2
- Date: Tue, 6 Jun 2023 13:40:22 GMT
- Title: How poor is the stimulus? Evaluating hierarchical generalization in neural networks trained on child-directed speech
- Authors: Aditya Yedetore, Tal Linzen, Robert Frank, R. Thomas McCoy
- Abstract summary: We train LSTMs and Transformers on data similar in quantity and content to children's linguistic input: text from the CHILDES corpus.
We find that both model types generalize in a way more consistent with an incorrect linear rule than the correct hierarchical rule.
These results suggest that human-like generalization from text alone requires stronger biases than the general sequence-processing biases of standard neural network architectures.
- Score: 25.02822854434971
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When acquiring syntax, children consistently choose hierarchical rules over
competing non-hierarchical possibilities. Is this preference due to a learning
bias for hierarchical structure, or due to more general biases that interact
with hierarchical cues in children's linguistic input? We explore these
possibilities by training LSTMs and Transformers - two types of neural networks
without a hierarchical bias - on data similar in quantity and content to
children's linguistic input: text from the CHILDES corpus. We then evaluate
what these models have learned about English yes/no questions, a phenomenon for
which hierarchical structure is crucial. We find that, though they perform well
at capturing the surface statistics of child-directed speech (as measured by
perplexity), both model types generalize in a way more consistent with an
incorrect linear rule than the correct hierarchical rule. These results suggest
that human-like generalization from text alone requires stronger biases than
the general sequence-processing biases of standard neural network
architectures.
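To make the evaluation concrete: the hierarchical rule fronts the main-clause auxiliary, while the linear rule fronts the first auxiliary in the string, and a model's preference can be read off by comparing the probability it assigns to each candidate question. A minimal sketch of this probe (not the paper's released code; the scoring interface is a placeholder):

```python
# Sketch of the core evaluation: build a minimal pair of candidate
# yes/no questions from one declarative and ask which one a trained
# model assigns higher probability. The scorer is a placeholder for
# an LSTM/Transformer trained on CHILDES-scale text.

declarative = "the boy who is happy is smiling"

# Hierarchical rule: front the auxiliary of the MAIN clause.
hierarchical_q = "is the boy who is happy smiling"

# Linear rule: front the FIRST auxiliary in the string.
linear_q = "is the boy who happy is smiling"

def preference(log_prob) -> str:
    """`log_prob(sentence)` should return the model's total
    log-probability of `sentence`; any trained LM works here."""
    h, l = log_prob(hierarchical_q), log_prob(linear_q)
    return "hierarchical" if h > l else "linear"

# Usage: pass a function that sums token log-probs under the model,
# e.g. preference(my_model_log_prob).
```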
Related papers
- Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically [74.96551626420188]
Transformers trained on natural language data have been shown to learn the hierarchical structure of language and to generalize to sentences with unseen syntactic structures.
We investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge.
arXiv Detail & Related papers (2024-04-25T07:10:29Z)
- How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases [28.58785395946639]
We show that pre-training can teach language models to rely on hierarchical syntactic features when performing tasks after fine-tuning.
We focus on architectural features (depth, width, and number of parameters), as well as the genre and size of the pre-training corpus.
arXiv Detail & Related papers (2023-05-31T14:38:14Z)
- A Multi-Grained Self-Interpretable Symbolic-Neural Model For Single/Multi-Labeled Text Classification [29.075766631810595]
We propose a Symbolic-Neural model that can learn to explicitly predict class labels of text spans from a constituency tree.
As the structured language model learns to predict constituency trees in a self-supervised manner, only raw texts and sentence-level labels are required as training data.
Our experiments demonstrate that our approach achieves good prediction accuracy in downstream tasks.
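A rough illustration of span labeling driven by a constituency tree (a generic sketch using nltk only for span extraction; the classifier is a stub, not the paper's structured language model):

```python
from nltk import Tree

# Generic sketch: enumerate the spans defined by a constituency tree;
# each (start, end, label) triple is a candidate for label prediction.

def spans(tree: Tree, start: int = 0):
    """Yield (start, end, constituent_label) for every subtree."""
    end = start
    for child in tree:
        if isinstance(child, Tree):
            child_end = end + len(child.leaves())
            yield from spans(child, end)
            end = child_end
        else:
            end += 1
    yield (start, end, tree.label())

t = Tree.fromstring("(S (NP (DT the) (NN cat)) (VP (VBD sat)))")
for s, e, label in spans(t):
    print(s, e, label, t.leaves()[s:e])
```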
arXiv Detail & Related papers (2023-03-06T03:25:43Z)
- Encoding Hierarchical Information in Neural Networks helps in Subpopulation Shift [8.01009207457926]
Deep neural networks have proven to be adept at image classification tasks, often surpassing humans in terms of accuracy.
In this work, we study robustness under subpopulation shift through the lens of a novel conditional supervised training framework.
We show that learning in this structured hierarchical manner results in networks that are more robust against subpopulation shifts.
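One plausible reading of conditional supervised training over a class hierarchy, sketched below under that assumption (not the paper's exact framework): a coarse head is supervised directly, and a fine head is conditioned on the coarse prediction.

```python
import torch
import torch.nn as nn

# Sketch: a two-level hierarchical classifier. A shared encoder feeds
# a coarse head (e.g. "vehicle" vs "animal") and a fine head whose
# input is conditioned on the coarse prediction. Illustrative only.

class HierarchicalClassifier(nn.Module):
    def __init__(self, dim=128, n_coarse=4, n_fine=20):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(64, dim), nn.ReLU())
        self.coarse_head = nn.Linear(dim, n_coarse)
        # The fine head sees features plus coarse probabilities.
        self.fine_head = nn.Linear(dim + n_coarse, n_fine)

    def forward(self, x):
        h = self.encoder(x)
        coarse = self.coarse_head(h)
        fine = self.fine_head(torch.cat([h, coarse.softmax(-1)], -1))
        return coarse, fine

model = HierarchicalClassifier()
x = torch.randn(8, 64)
coarse_y = torch.randint(0, 4, (8,))
fine_y = torch.randint(0, 20, (8,))
coarse_logits, fine_logits = model(x)
ce = nn.CrossEntropyLoss()
loss = ce(coarse_logits, coarse_y) + ce(fine_logits, fine_y)
```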
arXiv Detail & Related papers (2021-12-20T20:26:26Z)
- Computing Class Hierarchies from Classifiers [12.631679928202516]
We propose a novel algorithm for automatically acquiring a class hierarchy from a neural network.
Our algorithm produces surprisingly good hierarchies for some well-known deep neural network models.
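A simple baseline in the same spirit (an assumption-laden sketch, not the paper's algorithm): agglomeratively cluster classes whose confusion patterns are similar, so that frequently confused classes become siblings.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Sketch: derive a class hierarchy by clustering the rows of a
# (symmetrized) confusion matrix. A random matrix stands in for a
# real network's confusion counts.

rng = np.random.default_rng(0)
n = 6
confusion = rng.random((n, n))        # stand-in for a real confusion matrix
sim = (confusion + confusion.T) / 2   # symmetrize off-diagonal confusion
dist = sim.max() - sim                # high confusion -> small distance
condensed = dist[np.triu_indices(n, k=1)]
hierarchy = linkage(condensed, method="average")
print(hierarchy)                      # each row merges two clusters into a parent
```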
arXiv Detail & Related papers (2021-12-02T13:01:04Z)
- TopicNet: Semantic Graph-Guided Topic Discovery [51.71374479354178]
Existing deep hierarchical topic models are able to extract semantically meaningful topics from a text corpus in an unsupervised manner.
We introduce TopicNet as a deep hierarchical topic model that can inject prior structural knowledge as an inductive bias to influence learning.
arXiv Detail & Related papers (2021-10-27T09:07:14Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
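The routing idea in miniature (a toy sketch; the actual architecture is considerably more structured, with typed function signatures and iterated computation):

```python
import torch
import torch.nn as nn

# Toy sketch of learned routing: a router assigns each input a soft
# mixture over a small set of "function" modules, and the output is
# the weighted combination; everything is end-to-end differentiable.

class SoftRouting(nn.Module):
    def __init__(self, dim=32, n_modules=4):
        super().__init__()
        self.router = nn.Linear(dim, n_modules)
        self.functions = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU())
            for _ in range(n_modules)
        )

    def forward(self, x):                     # x: (batch, dim)
        weights = self.router(x).softmax(-1)  # (batch, n_modules)
        outs = torch.stack([f(x) for f in self.functions], dim=1)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)

layer = SoftRouting()
y = layer(torch.randn(8, 32))
```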
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Learning from Failure: Training Debiased Classifier from Biased Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
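The pairing scheme, roughly (a simplified sketch; the exact losses in the paper differ): the debiased model up-weights examples the biased model fails on.

```python
import torch
import torch.nn as nn

# Simplified sketch of failure-based debiasing: train a "biased"
# model that latches onto easy (spurious) patterns, and up-weight,
# for the debiased model, the examples the biased model gets wrong.

ce_per_example = nn.CrossEntropyLoss(reduction="none")

def debias_step(biased, debiased, opt_b, opt_d, x, y):
    loss_b = ce_per_example(biased(x), y)
    loss_d = ce_per_example(debiased(x), y)

    # Relative difficulty: large when the biased model fails on (x, y).
    weight = loss_b.detach() / (loss_b.detach() + loss_d.detach() + 1e-8)

    opt_b.zero_grad()
    loss_b.mean().backward()
    opt_b.step()

    opt_d.zero_grad()
    (weight * loss_d).mean().backward()
    opt_d.step()

biased, debiased = nn.Linear(16, 3), nn.Linear(16, 3)
opt_b = torch.optim.SGD(biased.parameters(), lr=0.1)
opt_d = torch.optim.SGD(debiased.parameters(), lr=0.1)
debias_step(biased, debiased, opt_b, opt_d,
            torch.randn(8, 16), torch.randint(0, 3, (8,)))
```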
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
- Text Classification with Few Examples using Controlled Generalization [58.971750512415134]
Current practice relies on pre-trained word embeddings to map words unseen in training to similar seen ones.
Our alternative begins with sparse pre-trained representations derived from unlabeled parsed corpora.
We show that a feed-forward network over these vectors is especially effective in low-data scenarios.
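The recipe in miniature (a sketch; random sparse vectors stand in for the paper's corpus-derived representations):

```python
import torch
import torch.nn as nn

# Sketch: map each word to a fixed sparse vector, average them into a
# document vector, and classify with a small feed-forward network.

vocab, feat_dim, n_classes = 1000, 512, 5
sparse_emb = (torch.rand(vocab, feat_dim) < 0.02).float()  # ~2% non-zero

clf = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                    nn.Linear(64, n_classes))

def classify(token_ids: torch.Tensor) -> torch.Tensor:
    doc = sparse_emb[token_ids].mean(dim=0)  # bag of sparse features
    return clf(doc)

logits = classify(torch.randint(0, vocab, (12,)))
```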
arXiv Detail & Related papers (2020-05-18T06:04:58Z)
- A Benchmark for Systematic Generalization in Grounded Language Understanding [61.432407738682635]
Humans easily interpret expressions that describe unfamiliar situations composed from familiar parts.
Modern neural networks, by contrast, struggle to interpret novel compositions.
We introduce a new benchmark, gSCAN, for evaluating compositional generalization in situated language understanding.
arXiv Detail & Related papers (2020-03-11T08:40:15Z)
- Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks [28.129220683169052]
In neural network models, inductive biases could in theory arise from any aspect of the model architecture.
We investigate which architectural factors affect the generalization behavior of neural sequence-to-sequence models trained on two syntactic tasks.
arXiv Detail & Related papers (2020-01-10T19:02:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.