How do we get there? Evaluating transformer neural networks as cognitive
models for English past tense inflection
- URL: http://arxiv.org/abs/2210.09167v2
- Date: Sat, 13 May 2023 21:01:37 GMT
- Title: How do we get there? Evaluating transformer neural networks as cognitive
models for English past tense inflection
- Authors: Xiaomeng Ma and Lingyu Gao
- Abstract summary: We train a set of transformer models with different settings to examine their behavior on this task.
The models' performance on the regulars is heavily affected by type frequency and ratio but not token frequency and ratio, and vice versa for the irregulars.
Although the transformer model exhibits some level of learning on the abstract category of verb regularity, its performance does not fit human data well.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is an ongoing debate on whether neural networks can grasp the
quasi-regularities in languages as humans do. In a typical quasi-regularity
task, English past tense inflection, neural network models have long been
criticized for learning only to generalize the most frequent pattern, not the
regular pattern, and thus for failing to learn the abstract categories of
regular and irregular verbs and for being dissimilar to human performance. In
this work, we train a set of transformer models with different settings to
examine their behavior on this task. The models achieved high accuracy on
unseen regular verbs and some accuracy on unseen irregular verbs. The models'
performance on the regulars is heavily affected by type frequency and ratio but
not token frequency and ratio, and vice versa for the irregulars. These
different behaviors on the regulars and irregulars suggest that the models have
some degree of symbolic learning about the regularity of verbs. In addition,
the models are only weakly correlated with human behavior on nonce verbs.
Although the transformer models exhibit some level of learning of the abstract
category of verb regularity, their performance does not fit human data well,
suggesting that they might not be good cognitive models.
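The abstract's key distinction is between type frequency (how many distinct verbs follow a pattern) and token frequency (how often those verbs occur in the corpus), and the corresponding ratios. The sketch below makes that distinction concrete; the verb counts are a hypothetical toy corpus for illustration only, not the paper's data.

```python
from collections import Counter

# Toy corpus of verb tokens; the counts are hypothetical, for illustration only.
tokens = (
    ["walk"] * 5 + ["play"] * 3 + ["jump"] * 2 +  # regular verbs
    ["go"] * 8 + ["take"] * 4                     # irregular verbs
)
regular = {"walk", "play", "jump"}

counts = Counter(tokens)
reg_types = [v for v in counts if v in regular]
irr_types = [v for v in counts if v not in regular]

# Type frequency: number of distinct verbs in each class.
type_freq = {"regular": len(reg_types), "irregular": len(irr_types)}

# Token frequency: total occurrences of the verbs in each class.
token_freq = {"regular": sum(counts[v] for v in reg_types),
              "irregular": sum(counts[v] for v in irr_types)}

# Ratios of regulars to all verbs, by types and by tokens. A class can
# dominate by types while being outnumbered by tokens, as here: regulars
# make up 3 of 5 types but only 10 of 22 tokens.
type_ratio = type_freq["regular"] / len(counts)
token_ratio = token_freq["regular"] / len(tokens)

print(type_freq, token_freq)
print(type_ratio, token_ratio)
```

Manipulating these two quantities independently (e.g. many rare regulars vs. a few frequent irregulars) is what lets the authors separate their effects on model behavior.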
Related papers
- Analyzing Finnish Inflectional Classes through Discriminative Lexicon and Deep Learning Models [42.045109659898465]
  Inflectional classes bring together nouns which have similar stem changes and use similar exponents in their paradigms. It is unclear whether inflectional classes are cognitively real. This study uses a dataset of 55,271 inflected forms of 2,000 high-frequency Finnish nouns from 49 inflectional classes.
  arXiv Detail & Related papers (2025-09-05T05:24:56Z)
- Evaluating the cognitive reality of Spanish irregular morphomic patterns: Humans vs. Transformers [0.8602553195689513]
  This study investigates the cognitive plausibility of the Spanish irregular morphomic pattern. Using the same analytical framework as the original human study, we evaluate whether transformer models can replicate human-like sensitivity to the morphome.
  arXiv Detail & Related papers (2025-07-29T07:40:32Z)
- Frequency matters: Modeling irregular morphological patterns in Spanish with Transformers [0.8602553195689513]
  The present paper evaluates the learning behaviour of a transformer-based neural network with regard to an irregular inflectional paradigm. We train the model on a corpus of Spanish verbs to compare it with models trained on input with augmented distributions of (ir)regular words. Our experiments show that, across frequency conditions, the models are surprisingly capable of learning the irregular pattern.
  arXiv Detail & Related papers (2024-10-28T13:36:46Z)
- Testing learning hypotheses using neural networks by manipulating learning data [20.525923251193472]
  We show that a neural network language model can learn restrictions to the passive that are similar to those displayed by humans. We find that while the frequency with which a verb appears in the passive significantly affects its passivizability, the semantics of the verb does not.
  arXiv Detail & Related papers (2024-07-05T15:41:30Z)
- Longer Fixations, More Computation: Gaze-Guided Recurrent Neural Networks [12.57650361978445]
  Humans read texts at a varying pace, while machine learning models treat each token in the same way. In this paper, we convert this intuition into a set of novel models with fixation-guided parallel RNNs or layers. We find that, interestingly, the fixation durations predicted by neural networks bear some resemblance to humans' fixations.
  arXiv Detail & Related papers (2023-10-31T21:32:11Z)
- Training Trajectories of Language Models Across Scales [99.38721327771208]
  Scaling up language models has led to unprecedented performance gains. How do language models of different sizes learn during pre-training? Why do larger language models demonstrate more desirable behaviors?
  arXiv Detail & Related papers (2022-12-19T19:16:29Z)
- Rarely a problem? Language models exhibit inverse scaling in their predictions following few-type quantifiers [0.6091702876917281]
  We focus on 'few'-type quantifiers, as in 'few children like toys', which might pose a particular challenge for language models. We present 960 English sentence stimuli from two human neurolinguistic experiments to 22 autoregressive transformer models of differing sizes.
  arXiv Detail & Related papers (2022-12-16T20:01:22Z)
- Discovering Latent Knowledge in Language Models Without Supervision [72.95136739040676]
  Existing techniques for training language models can be misaligned with the truth. We propose directly finding latent knowledge inside the internal activations of a language model in a purely unsupervised way. We show that despite using no supervision and no model outputs, our method can recover diverse knowledge represented in large language models.
  arXiv Detail & Related papers (2022-12-07T18:17:56Z)
- Falling Through the Gaps: Neural Architectures as Models of Morphological Rule Learning [0.0]
  We evaluate the Transformer as a model of morphological rule learning. We compare it with Recurrent Neural Networks (RNNs) on English, German, and Russian.
  arXiv Detail & Related papers (2021-05-08T14:48:29Z)
- Unnatural Language Inference [48.45003475966808]
  We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words. Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
  arXiv Detail & Related papers (2020-12-30T20:40:48Z)
- Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
  Language models must capture statistical dependencies between words at timescales ranging from very short to very long. We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power-law decay. Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
  arXiv Detail & Related papers (2020-09-27T02:13:38Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
  We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing. Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement. We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
  arXiv Detail & Related papers (2020-06-19T12:00:05Z)
- Exact Hard Monotonic Attention for Character-Level Transduction [76.66797368985453]
  We show that neural sequence-to-sequence models that use non-monotonic soft attention often outperform popular monotonic models. We develop a hard attention sequence-to-sequence model that enforces strict monotonicity and learns a latent alignment jointly while learning to transduce.
  arXiv Detail & Related papers (2019-05-15T17:51:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.