Word class representations spontaneously emerge in a deep neural network
trained on next word prediction
- URL: http://arxiv.org/abs/2302.07588v1
- Date: Wed, 15 Feb 2023 11:02:50 GMT
- Title: Word class representations spontaneously emerge in a deep neural network
trained on next word prediction
- Authors: Kishore Surendra, Achim Schilling, Paul Stoewer, Andreas Maier and
Patrick Krauss
- Abstract summary: How do humans learn language, and can the first language be learned at all?
These fundamental questions are still hotly debated.
Here, we train an artificial deep neural network on predicting the next word.
We find that the internal representations of nine-word input sequences cluster according to the word class of the tenth word to be predicted as output.
- Score: 7.240611820374677
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: How do humans learn language, and can the first language be learned at all?
These fundamental questions are still hotly debated. In contemporary
linguistics, there are two major schools of thought that give completely
opposite answers. According to Chomsky's theory of universal grammar, language
cannot be learned because children are not exposed to sufficient data in their
linguistic environment. In contrast, usage-based models of language assume a
profound relationship between language structure and language use. In
particular, contextual mental processing and mental representations are assumed
to have the cognitive capacity to capture the complexity of actual language use
at all levels. The prime example is syntax, i.e., the rules by which words are
assembled into larger units such as sentences. Typically, syntactic rules are
expressed as sequences of word classes. However, it remains unclear whether
word classes are innate, as implied by universal grammar, or whether they
emerge during language acquisition, as suggested by usage-based approaches.
Here, we address this issue from a machine learning and natural language
processing perspective. In particular, we trained an artificial deep neural
network to predict the next word, given sequences of consecutive words as
input. Subsequently, we analyzed the emerging activation patterns in the hidden
layers of the neural network. Strikingly, we find that the internal
representations of nine-word input sequences cluster according to the word
class of the tenth word to be predicted as output, even though the neural
network did not receive any explicit information about syntactic rules or word
classes during training. This surprising result suggests that, in the human
brain too, abstract representational categories such as word classes may
naturally emerge as a consequence of predictive coding and processing during
language acquisition.
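To make the described pipeline concrete, the following sketch trains a small LSTM purely on next-word prediction over nine-word windows and then checks whether its hidden representations cluster by the word class of the tenth (target) word. The corpus (Brown via NLTK), the architecture, the clustering method, and all hyperparameters are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch of the pipeline described above: train a network only on
# next-word prediction over nine-word windows, then test whether its hidden
# representations cluster by the part of speech (word class) of the tenth,
# to-be-predicted word. Corpus, architecture, clustering method, and
# hyperparameters are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn
import nltk
from collections import Counter
from sklearn.cluster import KMeans

nltk.download("brown", quiet=True)
nltk.download("universal_tagset", quiet=True)
from nltk.corpus import brown

SEQ_LEN = 9  # nine input words; the tenth word is the prediction target

# Build a toy vocabulary and (input window, target word, target POS) triples.
tagged = [(w.lower(), t) for w, t in brown.tagged_words(tagset="universal")[:50000]]
vocab = {w: i + 1 for i, (w, _) in enumerate(Counter(w for w, _ in tagged).most_common(5000))}
encode = lambda w: vocab.get(w, 0)  # index 0 = out-of-vocabulary

windows, targets, target_pos = [], [], []
for i in range(len(tagged) - SEQ_LEN):
    context = tagged[i:i + SEQ_LEN]
    next_word, next_tag = tagged[i + SEQ_LEN]
    windows.append([encode(w) for w, _ in context])
    targets.append(encode(next_word))
    target_pos.append(next_tag)

X, y = torch.tensor(windows), torch.tensor(targets)

class NextWordLSTM(nn.Module):
    """Embedding -> LSTM -> linear readout over the vocabulary."""
    def __init__(self, vocab_size, emb=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x, return_hidden=False):
        h_seq, _ = self.lstm(self.emb(x))
        h_last = h_seq[:, -1, :]  # internal representation of the 9-word context
        logits = self.out(h_last)
        return (logits, h_last) if return_hidden else logits

model = NextWordLSTM(len(vocab) + 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Train on the next-word objective only; no POS labels are ever shown to the net.
for step in range(200):
    idx = torch.randint(0, len(X), (128,))
    optimizer.zero_grad()
    loss = loss_fn(model(X[idx]), y[idx])
    loss.backward()
    optimizer.step()

# Cluster the hidden representations and see how the clusters line up with the
# word class of the target word that each window predicts.
with torch.no_grad():
    _, H = model(X[:5000], return_hidden=True)
clusters = KMeans(n_clusters=10, n_init=10).fit_predict(H.numpy())
for c in range(10):
    tags = Counter(t for t, k in zip(target_pos[:5000], clusters) if k == c)
    print(f"cluster {c}: dominant word class = {tags.most_common(1)}")
```

If the abstract's finding carries over to this toy setting, most clusters should be dominated by a single part of speech even though no POS labels were used during training.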
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings (see the sketch after this list).
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods like typos and word order shuffling, resonating with human cognitive patterns, and enabling perturbation to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- Why can neural language models solve next-word prediction? A mathematical perspective [53.807657273043446]
We study a class of formal languages that can be used to model real-world examples of English sentences.
Our proof highlights the different roles of the embedding layer and the fully connected component within the neural language model.
arXiv Detail & Related papers (2023-06-20T10:41:23Z)
- Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks [8.683116789109462]
We focus on one of the most ubiquitous and elementary suboperations of syntax -- concatenation.
We introduce spontaneous concatenation: a phenomenon where convolutional neural networks (CNNs) trained on acoustic recordings of individual words start generating outputs in which the trained words are concatenated.
We also propose a potential neural mechanism called disinhibition that outlines a possible neural pathway towards concatenation and compositionality.
arXiv Detail & Related papers (2023-05-02T17:38:21Z)
- Deep Learning Models to Study Sentence Comprehension in the Human Brain [0.1503974529275767]
Recent artificial neural networks that process natural language achieve unprecedented performance in tasks requiring sentence-level understanding.
We review works that compare these artificial language models with human brain activity and we assess the extent to which this approach has improved our understanding of the neural processes involved in natural language comprehension.
arXiv Detail & Related papers (2023-01-16T10:31:25Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not well-represent natural language semantics.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Same Neurons, Different Languages: Probing Morphosyntax in Multilingual Pre-trained Models [84.86942006830772]
We conjecture that multilingual pre-trained models can derive language-universal abstractions about grammar.
We conduct the first large-scale empirical study over 43 languages and 14 morphosyntactic categories with a state-of-the-art neuron-level probe.
arXiv Detail & Related papers (2022-05-04T12:22:31Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
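The first entry in the list above ("Training Neural Networks as Recognizers of Formal Languages") evaluates networks directly as binary classifiers of strings. The sketch below illustrates that framing on an assumed example language, Dyck-1 (balanced brackets), with an LSTM recognizer; the language, samplers, and model are placeholders for illustration and are not taken from that paper.

```python
# Illustrative sketch (not that paper's setup): train an LSTM directly as a
# binary recognizer of strings from an assumed formal language, Dyck-1
# (balanced brackets). Index 0 is padding, 1 is "(", 2 is ")".
import random
import torch
import torch.nn as nn

CHAR = {"(": 1, ")": 2}

def is_balanced(s):
    depth = 0
    for c in s:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0

def gen_balanced(max_pairs=10):
    """Random Dyck-1 word generated by a random walk that never goes negative."""
    k = random.randint(1, max_pairs)
    opens, closes, depth, out = k, k, 0, []
    while opens or closes:
        if opens and (depth == 0 or random.random() < 0.5):
            out.append("("); opens -= 1; depth += 1
        else:
            out.append(")"); closes -= 1; depth -= 1
    return "".join(out)

def batch(n=64):
    """Half in-language strings, half rejection-sampled out-of-language strings."""
    pos = [gen_balanced() for _ in range(n // 2)]
    neg = []
    while len(neg) < n - len(pos):
        s = "".join(random.choice("()") for _ in range(random.randint(2, 20)))
        if not is_balanced(s):
            neg.append(s)
    strings, labels = pos + neg, [1.0] * len(pos) + [0.0] * len(neg)
    lengths = torch.tensor([len(s) for s in strings])
    x = torch.zeros(n, int(lengths.max()), dtype=torch.long)
    for i, s in enumerate(strings):
        x[i, :len(s)] = torch.tensor([CHAR[c] for c in s])
    return x, lengths, torch.tensor(labels)

class StringRecognizer(nn.Module):
    """Embedding -> LSTM -> one membership logit per string."""
    def __init__(self, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(3, 16, padding_idx=0)
        self.lstm = nn.LSTM(16, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x, lengths):
        h, _ = self.lstm(self.emb(x))
        last = h[torch.arange(x.size(0)), lengths - 1]  # state at each string's true end
        return self.out(last).squeeze(-1)

model = StringRecognizer()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(1000):
    x, lengths, y = batch()
    optimizer.zero_grad()
    loss = loss_fn(model(x, lengths), y)
    loss.backward()
    optimizer.step()

# Evaluate membership accuracy on freshly sampled strings.
x, lengths, y = batch(512)
with torch.no_grad():
    accuracy = ((model(x, lengths) > 0).float() == y).float().mean().item()
print(f"held-out recognition accuracy: {accuracy:.2f}")
```

Positive strings come from a balanced random walk and negatives are rejection-sampled random strings over the same alphabet, which keeps the sketch self-contained while posing a genuine string-membership decision to the network.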