Modeling rapid language learning by distilling Bayesian priors into artificial neural networks
- URL: http://arxiv.org/abs/2305.14701v1
- Date: Wed, 24 May 2023 04:11:59 GMT
- Title: Modeling rapid language learning by distilling Bayesian priors into artificial neural networks
- Authors: R. Thomas McCoy and Thomas L. Griffiths
- Abstract summary: We show that learning from limited naturalistic data is possible with an approach that combines the strong inductive biases of a Bayesian model with the flexible representations of a neural network.
The resulting system can learn formal linguistic patterns from a small number of examples.
It can also learn aspects of English syntax from a corpus of natural language.
- Score: 18.752638142258668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans can learn languages from remarkably little experience. Developing
computational models that explain this ability has been a major challenge in
cognitive science. Bayesian models that build in strong inductive biases -
factors that guide generalization - have been successful at explaining how
humans might generalize from few examples in controlled settings but are
usually too restrictive to be tractably applied to more naturalistic data. By
contrast, neural networks have flexible representations that allow them to
learn well from naturalistic data but require many more examples than humans
receive. We show that learning from limited naturalistic data is possible with
an approach that combines the strong inductive biases of a Bayesian model with
the flexible representations of a neural network. This approach works by
distilling a Bayesian model's biases into a neural network. Like a Bayesian
model, the resulting system can learn formal linguistic patterns from a small
number of examples. Like a neural network, it can also learn aspects of English
syntax from a corpus of natural language - and it outperforms a standard neural
network at acquiring the linguistic phenomena of recursion and priming.
Bridging the divide between Bayesian models and neural networks makes it
possible to handle a broader range of learning scenarios than either approach
can handle on its own.
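The related papers below describe how this kind of distillation can be implemented via meta-learning on data sampled from the prior. As a reading aid, here is a minimal, hypothetical sketch of that general recipe in PyTorch; the Reptile-style first-order meta-update, the toy bigram "prior" over languages, and the tiny next-symbol model are illustrative assumptions, not the authors' actual setup.

```python
# Sketch: meta-learn an initialization over languages sampled from a prior,
# so that a few gradient steps suffice for any new language from that prior.
import copy
import random
import torch
import torch.nn as nn

VOCAB, SEQ_LEN = 4, 8

def sample_language_from_prior():
    """Hypothetical prior: each language is a random table of legal successors."""
    table = [random.sample(range(VOCAB), k=2) for _ in range(VOCAB)]
    def sample_sequence():
        seq = [random.randrange(VOCAB)]
        for _ in range(SEQ_LEN - 1):
            seq.append(random.choice(table[seq[-1]]))
        return seq
    return sample_sequence

class NextToken(nn.Module):
    """Tiny next-symbol predictor conditioned on the previous symbol."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 32)
        self.out = nn.Linear(32, VOCAB)
    def forward(self, prev):
        return self.out(self.emb(prev))

meta_model = NextToken()
loss_fn = nn.CrossEntropyLoss()
META_LR, INNER_LR, INNER_STEPS = 0.1, 0.05, 5

for _ in range(200):                              # outer loop over sampled languages
    sample_sequence = sample_language_from_prior()
    learner = copy.deepcopy(meta_model)           # fast weights for this language
    opt = torch.optim.SGD(learner.parameters(), lr=INNER_LR)
    for _ in range(INNER_STEPS):                  # inner loop: few-shot adaptation
        seqs = [sample_sequence() for _ in range(16)]
        prev = torch.tensor([s[:-1] for s in seqs]).reshape(-1)
        nxt = torch.tensor([s[1:] for s in seqs]).reshape(-1)
        opt.zero_grad()
        loss_fn(learner(prev), nxt).backward()
        opt.step()
    with torch.no_grad():                         # Reptile update: pull the shared
        for p_meta, p_fast in zip(meta_model.parameters(), learner.parameters()):
            p_meta += META_LR * (p_fast - p_meta) # initialization toward the adapted weights
```

The intended effect is that, after meta-training, the shared initialization adapts to a new language drawn from the same prior within a few inner-loop steps; in that sense the prior lives in the network's weights rather than in an explicit Bayesian computation.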
Related papers
- Distilling Symbolic Priors for Concept Learning into Neural Networks [9.915299875869046]
We show that inductive biases can be instantiated in artificial neural networks by distilling a prior distribution from a symbolic Bayesian model via meta-learning.
We use this approach to create a neural network with an inductive bias towards concepts expressed as short logical formulas.
arXiv Detail & Related papers (2024-02-10T20:06:26Z)
- Why can neural language models solve next-word prediction? A mathematical perspective [53.807657273043446]
We study a class of formal languages that can be used to model real-world examples of English sentences.
Our proof highlights the different roles of the embedding layer and the fully connected component within the neural language model.
arXiv Detail & Related papers (2023-06-20T10:41:23Z)
- What makes a language easy to deep-learn? Deep neural networks and humans similarly benefit from compositional structure [5.871583927216651]
A fundamental property of language is its compositional structure, allowing humans to produce forms for new meanings.
For humans, languages with more compositional and transparent structures are typically easier to learn than those with opaque and irregular structures.
This learnability advantage has not yet been shown for deep neural networks, limiting their use as models for human language learning.
arXiv Detail & Related papers (2023-02-23T18:57:34Z)
- Benchmarking Compositionality with Formal Languages [64.09083307778951]
We investigate whether large neural models in NLP can acquire the ability to combine primitive concepts into larger novel combinations while learning from data.
By randomly sampling over many transducers, we explore which of their properties contribute to learnability of a compositional relation by a neural network.
We find that the models either learn the relations completely or not at all. The key factor is transition coverage, which sets a soft learnability limit at 400 examples per transition.
arXiv Detail & Related papers (2022-08-17T10:03:18Z)
- What Artificial Neural Networks Can Tell Us About Human Language Acquisition [47.761188531404066]
Rapid progress in machine learning for natural language processing has the potential to transform debates about how humans learn language.
To increase the relevance of learnability results from computational models, we need to train model learners without significant advantages over humans.
arXiv Detail & Related papers (2022-08-17T00:12:37Z)
- Is neural language acquisition similar to natural? A chronological probing study [0.0515648410037406]
We present the chronological probing study of transformer English models such as MultiBERT and T5.
We compare the information about the language learned by the models in the process of training on corpora.
The results show that 1) linguistic information is acquired in the early stages of training, and 2) both language models demonstrate the capability to capture features from various levels of language.
arXiv Detail & Related papers (2022-07-01T17:24:11Z)
- Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the dependency modeling probability distributions from previous positions with self-attention (a toy sketch of this mixture follows this entry).
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
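A minimal, hypothetical sketch of what mixing per-position next-token distributions with self-attention weights can look like; the dimensions, the single linear dependency head, and the single query vector are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

vocab, d, t = 10, 16, 5                      # vocab size, hidden size, context length
hidden = torch.randn(t, d)                   # hidden states for the t context tokens
dep_head = torch.nn.Linear(d, vocab)         # per-position "dependency" next-token predictor
query = torch.randn(d)                       # query for the position being predicted

# each previous position proposes a distribution over the next token
per_position = F.softmax(dep_head(hidden), dim=-1)       # (t, vocab)

# self-attention weights decide how much each position's proposal matters
attn = F.softmax(hidden @ query / d ** 0.5, dim=0)       # (t,)

# next-token probability = attention-weighted mixture of the proposals
next_token_probs = attn @ per_position                   # (vocab,)
print(next_token_probs.sum())                            # ~1.0: still a distribution
```

Because the attention weights form a convex combination, the mixture is itself a valid probability distribution, which is the point of mixing distributions rather than raw logits.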
- The Neural Coding Framework for Learning Generative Models [91.0357317238509]
We propose a novel neural generative model inspired by the theory of predictive processing in the brain.
In a similar way, artificial neurons in our generative model predict what neighboring neurons will do, and adjust their parameters based on how well the predictions matched reality (a generic predictive-coding update is sketched just after this entry).
arXiv Detail & Related papers (2020-12-07T01:20:38Z)
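A generic predictive-coding update, sketched to make the entry above concrete: one layer of "neurons" predicts the activity of another, the mismatch is an error signal, and both the latent activities and the weights are adjusted locally to shrink that error. This is a textbook-style toy assuming a single linear generative layer; it is not the paper's actual framework.

```python
import numpy as np

rng = np.random.default_rng(0)
d_obs, d_latent, lr = 8, 4, 0.05

W = rng.normal(size=(d_obs, d_latent)) * 0.1   # generative weights: latent -> observed
x = rng.normal(size=d_obs)                     # one observed activity pattern
z = np.zeros(d_latent)                         # latent "neighboring" neurons

for _ in range(50):
    pred = W @ z                   # latent neurons predict the observed layer
    err = x - pred                 # how well the prediction matched reality
    z += lr * (W.T @ err - z)      # latent activity moves to reduce the error
    W += lr * np.outer(err, z)     # local, Hebbian-like weight adjustment

print(np.linalg.norm(x - W @ z))   # prediction error after the local updates
```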
- Reservoir Memory Machines as Neural Computers [70.5993855765376]
Differentiable neural computers extend artificial neural networks with an explicit memory without interference.
We achieve some of the computational capabilities of differentiable neural computers with a model that can be trained very efficiently.
arXiv Detail & Related papers (2020-09-14T12:01:30Z)
- Universal linguistic inductive biases via meta-learning [36.43388942327124]
It is unclear which inductive biases can explain observed patterns in language acquisition.
We introduce a framework for giving linguistic inductive biases to a neural network model.
We demonstrate this framework with a case study based on syllable structure.
arXiv Detail & Related papers (2020-06-29T19:15:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.