Frequency matters: Modeling irregular morphological patterns in Spanish with Transformers
- URL: http://arxiv.org/abs/2410.21013v4
- Date: Tue, 27 May 2025 15:48:35 GMT
- Title: Frequency matters: Modeling irregular morphological patterns in Spanish with Transformers
- Authors: Akhilesh Kakolu Ramarao, Kevin Tang, Dinah Baer-Henney
- Abstract summary: We focus on Spanish verbal paradigms, where certain verbs follow an irregular L-shaped pattern. We investigate the role of input frequency in the acquisition of regular versus irregular L-shaped patterns in transformer models.
- Score: 0.8602553195689513
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Over the past decade, various studies have addressed how speakers solve the so-called 'Paradigm Cell Filling Problem' (PCFP) (Ackerman et al., 2009) across different languages. The PCFP addresses a fundamental question in morphological processing: how do speakers accurately generate inflected forms of words when presented with incomplete paradigms? This problem is particularly salient when modeling complex inflectional systems. We focus on Spanish verbal paradigms, where certain verbs follow an irregular L-shaped pattern in which the first-person singular present indicative stem matches the stem used throughout the present subjunctive mood. We formulate the problem as a morphological reinflection task. Specifically, we investigate the role of input frequency in the acquisition of regular versus irregular L-shaped patterns in transformer models. By systematically manipulating the input distributions and analyzing model behavior, we reveal four key findings: 1) Models perform better on L-shaped verbs than on regular verbs, especially under uneven frequency conditions; 2) Robust primacy effects are observed, but no consistent recency effects; 3) Memorization becomes more prominent as the proportion of L-shaped verbs increases; 4) Models tend to regularize L-shaped verbs when their consonant alternation pairs are rare or absent in the training data.
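To make the reinflection formulation concrete, the sketch below shows one plausible character-level data format together with a routine for skewing the regular/L-shaped mix. The tag scheme, the two example verbs, and the mixing procedure are invented for illustration; they are not the authors' actual pipeline.
```python
import random

# Toy character-level reinflection format: the source is the known form plus
# morphosyntactic tags for the source and target cells; the target is the
# requested inflected form. Tags and verbs are invented for illustration.
REGULAR = {"hablar": ("hablo", "hable")}    # 1SG PRS.IND -> 1SG PRS.SBJV
L_SHAPED = {"tener": ("tengo", "tenga")}    # the 'teng-' stem spreads to SBJV

def make_example(ind_form, sbjv_form):
    src = list(ind_form) + ["IND", "1", "SG", ">", "SBJV", "1", "SG"]
    tgt = list(sbjv_form)
    return src, tgt

def build_training_set(p_l_shaped, n=1000, seed=0):
    """Skew the regular / L-shaped mix to manipulate input frequency."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        pool = L_SHAPED if rng.random() < p_l_shaped else REGULAR
        lemma = rng.choice(sorted(pool))
        data.append(make_example(*pool[lemma]))
    return data

train = build_training_set(p_l_shaped=0.3)  # e.g. a 30% L-shaped condition
```
Sweeping `p_l_shaped` from low to high values yields the kind of uneven frequency conditions the abstract manipulates.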
Related papers
- Character-aware Transformers Learn an Irregular Morphological Pattern Yet None Generalize Like Humans [8.033684021402165]
We show that encoder-decoder models can acquire irregular patterns, but evidence that they generalize these patterns like humans is mixed. We investigate this using the Spanish L-shaped morphome, where only the first-person singular indicative shares its stem with all subjunctive forms. None of the models reproduce the human pattern, highlighting the gap between statistical pattern reproduction and morphological abstraction.
arXiv Detail & Related papers (2026-02-15T11:22:12Z)
- Emergent morpho-phonological representations in self-supervised speech models [3.9374885962486172]
We study how S3M variants optimized for word recognition represent phonological and morphological phenomena. We find that their representations exhibit a global linear geometry which can be used to link English nouns and verbs to their regular inflected forms; a vector-offset sketch of this idea follows below.
arXiv Detail & Related papers (2025-09-26T22:16:35Z)
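A minimal numpy illustration of what such a linear geometry would license: a single offset vector, estimated from known base/inflected pairs, maps a held-out base form to its inflected form. The embeddings are random stand-ins constructed to be linear, not actual S3M representations.
```python
import numpy as np

# Hypothetical illustration of the "global linear geometry" idea: if
# representations are linearly structured, one offset vector (averaged over
# known base/inflected pairs) maps a base form to its regular inflected form.
rng = np.random.default_rng(0)
dim = 16
base = {w: rng.normal(size=dim) for w in ["walk", "jump", "talk"]}
offset_true = rng.normal(size=dim)            # the shared "inflection direction"
inflected = {w + "ed": v + offset_true for w, v in base.items()}

# Estimate the offset from training pairs, then apply it to a held-out word.
pairs = [("walk", "walked"), ("jump", "jumped")]
offset_est = np.mean([inflected[i] - base[b] for b, i in pairs], axis=0)
pred = base["talk"] + offset_est
print(np.allclose(pred, inflected["talked"]))  # True in this toy setup
```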
- Evaluating the cognitive reality of Spanish irregular morphomic patterns: Humans vs. Transformers [0.8602553195689513]
This study investigates the cognitive plausibility of the Spanish irregular morphomic pattern. Using the same analytical framework as the original human study, we evaluate whether transformer models can replicate human-like sensitivity to the morphome.
arXiv Detail & Related papers (2025-07-29T07:40:32Z)
- Beyond Early-Token Bias: Model-Specific and Language-Specific Position Effects in Multilingual LLMs [50.07451351559251]
We present a study across five typologically distinct languages (English, Russian, German, Hindi, and Vietnamese). We examine how position bias interacts with prompt strategies and affects output entropy.
arXiv Detail & Related papers (2025-05-22T02:23:00Z)
- Semantics drives analogical change in Germanic strong verb paradigms: a phylogenetic study [45.11082946405984]
In some Germanic languages, there is a greater affinity for stem allomorphy shared by preterite forms and past participles to the exclusion of present forms.
We show that there is a greater long-term preference for this alternation pattern in situations where narrative past tense has been extended to the past participle.
arXiv Detail & Related papers (2025-02-24T21:36:15Z)
- Developmental Predictive Coding Model for Early Infancy Mono and Bilingual Vocal Continual Learning [69.8008228833895]
We propose a small-sized generative neural network equipped with a continual learning mechanism.
Our model prioritizes interpretability and demonstrates the advantages of online learning.
arXiv Detail & Related papers (2024-12-23T10:23:47Z)
- Demystifying Verbatim Memorization in Large Language Models [67.49068128909349]
Large Language Models (LLMs) frequently memorize long sequences verbatim, often with serious legal and privacy implications.
We develop a framework to study verbatim memorization in a controlled setting by continuing pre-training from Pythia checkpoints with injected sequences.
We find that (1) non-trivial amounts of repetition are necessary for verbatim memorization to happen; and (2) later (and presumably better) checkpoints are more likely to memorize verbatim sequences, even out-of-distribution ones. The injection setup is sketched below.
arXiv Detail & Related papers (2024-07-25T07:10:31Z)
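A minimal sketch of the injection setup as described (copies of a target sequence scattered through a continued pre-training stream); the function and its parameters are hypothetical, not the paper's code.
```python
import random

def inject_sequence(corpus_batches, injected, n_copies, seed=0):
    """Scatter `n_copies` of an injected sequence among pre-training batches.

    Hypothetical sketch of the controlled-injection setup the abstract
    describes (continued pre-training with injected sequences); the actual
    study operates on Pythia checkpoints and token streams.
    """
    rng = random.Random(seed)
    batches = list(corpus_batches)
    for _ in range(n_copies):
        batches.insert(rng.randrange(len(batches) + 1), injected)
    return batches

# Vary n_copies to test how much repetition verbatim memorization needs.
stream = inject_sequence([f"doc{i}" for i in range(100)], "SECRET-SEQUENCE", n_copies=8)
```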
- Testing learning hypotheses using neural networks by manipulating learning data [20.525923251193472]
We show that a neural network language model can learn restrictions to the passive that are similar to those displayed by humans.
We find that while the frequency with which a verb appears in the passive significantly affects its passivizability, the semantics of the verb does not.
arXiv Detail & Related papers (2024-07-05T15:41:30Z)
- MiLe Loss: a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models [40.992566245706996]
We propose the MiLe Loss function for mitigating the bias introduced by the differing learning difficulties of tokens.
We train generative language models at different scales of 468M, 1.2B, and 6.7B parameters.
Experiments reveal that models incorporating the proposed MiLe Loss gain consistent performance improvements on downstream benchmarks; a loosely related loss sketch follows below.
arXiv Detail & Related papers (2023-10-30T13:33:21Z)
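The summary does not give the loss itself, so the sketch below is only a loss of the same flavor: per-token cross-entropy reweighted by the predicted distribution's entropy as a difficulty proxy. The entropy weighting is an assumption made for illustration, not the paper's actual MiLe formula.
```python
import numpy as np

def entropy_weighted_ce(logits, targets, gamma=1.0):
    """Generic difficulty-weighted cross-entropy (illustrative only).

    NOTE: scaling by predictive entropy is an assumption of this sketch;
    it is NOT the exact MiLe Loss, only a loss in the same spirit
    (per-token reweighting to counter learning-difficulty bias).
    logits: (T, V) array; targets: (T,) int array.
    """
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    ce = -np.log(probs[np.arange(len(targets)), targets])
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)  # difficulty proxy
    weights = entropy ** gamma
    return (weights * ce).mean()

loss = entropy_weighted_ce(np.random.randn(5, 10), np.array([1, 2, 3, 4, 5]))
```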
- Morphological Inflection with Phonological Features [7.245355976804435]
This work explores the effects on performance of the various ways in which morphological models can be given access to subcharacter phonological features.
We elicit phonemic data from standard graphemic data using language-specific grammars for languages with shallow grapheme-to-phoneme mappings; a toy grammar of this kind is sketched below.
arXiv Detail & Related papers (2023-06-21T21:34:39Z)
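A toy version of such a language-specific grammar, here three invented rewrite rules for a hypothetical Spanish-like shallow orthography; real grammars used for this purpose are far more complete.
```python
import re

# Toy grapheme-to-phoneme rules for a hypothetical Spanish-like shallow
# orthography. These three invented rules only illustrate the idea of a
# rule-based, language-specific G2P grammar.
RULES = [
    (re.compile(r"qu(?=[ei])"), "k"),   # 'que', 'qui' -> /k/
    (re.compile(r"c(?=[ei])"), "s"),    # 'ce', 'ci' -> /s/ (seseo variety)
    (re.compile(r"ch"), "tʃ"),          # 'ch' -> /tʃ/
]

def g2p(word: str) -> str:
    for pattern, phoneme in RULES:
        word = pattern.sub(phoneme, word)
    return word

print(g2p("queso"))  # -> 'keso'
print(g2p("cine"))   # -> 'sine'
```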
- Hierarchical Phrase-based Sequence-to-Sequence Learning [94.10257313923478]
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
Our approach trains two models: a discriminative parser based on a bracketing grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one-by-one.
arXiv Detail & Related papers (2022-11-15T05:22:40Z)
- How do we get there? Evaluating transformer neural networks as cognitive models for English past tense inflection [0.0]
We train a set of transformer models with different settings to examine their behavior on this task.
The models' performance on the regulars is heavily affected by type frequency and ratio but not token frequency and ratio, and vice versa for the irregulars.
Although the transformer model exhibits some learning of the abstract category of verb regularity, its performance does not fit the human data well. The type/token distinction central to this finding is worked through below.
arXiv Detail & Related papers (2022-10-17T15:13:35Z)
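Because the finding hinges on the type/token distinction, a small worked example may help; the verbs and counts are invented.
```python
from collections import Counter

# Invented toy corpus: each entry is (lemma, is_regular, corpus_count).
corpus = [("walk", True, 50), ("jump", True, 5), ("play", True, 3),
          ("go", False, 400), ("sing", False, 20)]

# Type frequency: how many distinct verbs follow a pattern.
# Token frequency: how many corpus occurrences those verbs account for.
types = Counter()
tokens = Counter()
for lemma, regular, count in corpus:
    key = "regular" if regular else "irregular"
    types[key] += 1
    tokens[key] += count

print(types)   # regular: 3 types vs. irregular: 2 types
print(tokens)  # regular: 58 tokens vs. irregular: 420 tokens
```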
- Modeling Target-Side Morphology in Neural Machine Translation: A Comparison of Strategies [72.56158036639707]
Morphologically rich languages pose difficulties to machine translation.
A large number of differently inflected surface forms entails a larger vocabulary.
Some inflected forms of infrequent terms typically do not appear in the training corpus.
Linguistic agreement requires the system to correctly match the grammatical categories between inflected word forms in the output sentence.
arXiv Detail & Related papers (2022-03-25T10:13:20Z)
- Falling Through the Gaps: Neural Architectures as Models of Morphological Rule Learning [0.0]
We evaluate the Transformer as a model of morphological rule learning.
We compare it with Recurrent Neural Networks (RNN) on English, German, and Russian.
arXiv Detail & Related papers (2021-05-08T14:48:29Z)
- Can a Transformer Pass the Wug Test? Tuning Copying Bias in Neural Morphological Inflection Models [9.95909045828344]
We show that, to be more effective, the hallucination process needs to pay attention to syllable-like length rather than individual characters or stems.
We report a significant performance improvement with our hallucination model over previous data hallucination methods when training and test data do not overlap in their lemmata; a syllable-based hallucination routine is sketched below.
arXiv Detail & Related papers (2021-04-13T19:51:21Z)
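A rough sketch of syllable-aware hallucination: nonce stems are built from whole CV syllables so hallucinated items match real stems in syllable-like length rather than raw character count. The CV inventory and stem-swap scheme are assumptions, not the paper's exact procedure.
```python
import random

CONSONANTS = "ptkbdgmnsl"
VOWELS = "aeiou"

def hallucinate_stem(n_syllables, rng):
    """Build a phonotactically plausible nonce stem from CV syllables, so
    hallucinated items match real stems in syllable-like length."""
    return "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS)
                   for _ in range(n_syllables))

def hallucinate(example, rng):
    """Swap the real stem for a nonce stem with a comparable syllable count,
    keeping the inflectional ending intact."""
    stem, ending = example
    return hallucinate_stem(max(1, len(stem) // 2), rng) + ending

rng = random.Random(0)
print(hallucinate(("cant", "aron"), rng))  # e.g. a 'bodaaron'-style nonce form
```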
- Do RNN States Encode Abstract Phonological Processes? [9.148410930089502]
We show that sequence-to-sequence models often encode 17 different consonant gradation processes in a handful of dimensions of the RNN.
We also show that by scaling the activations in these dimensions we can control whether consonant gradation occurs and the direction of the gradation; this kind of intervention is sketched below.
arXiv Detail & Related papers (2021-04-01T22:24:39Z)
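A minimal numpy sketch of this kind of activation intervention, assuming the relevant hidden dimensions are already known; the dimension indices and scaling factors below are invented, not the ones the authors identified.
```python
import numpy as np

def scale_dimensions(hidden, dims, factor):
    """Scale selected dimensions of an RNN hidden state.

    Illustrative intervention in the spirit of the paper: amplifying or
    suppressing a handful of dimensions to turn a phonological process
    on or off. The indices here are invented for demonstration.
    """
    h = hidden.copy()
    h[dims] *= factor
    return h

hidden = np.random.default_rng(0).normal(size=64)    # stand-in decoder state
suppressed = scale_dimensions(hidden, dims=[3, 17], factor=0.0)  # gradation off
amplified = scale_dimensions(hidden, dims=[3, 17], factor=2.0)   # gradation strengthened
```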
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax. The reordering perturbation is sketched below.
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
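The perturbation itself is easy to reproduce; a minimal word-level version (whitespace tokenization is a simplification) might look like this:
```python
import random

def shuffle_words(sentence, rng):
    """Randomly reorder the words of a sentence, as in permuted-input probes
    of NLI models; whitespace tokenization is a simplification."""
    words = sentence.split()
    rng.shuffle(words)
    return " ".join(words)

rng = random.Random(42)
premise = "A man is playing a guitar on stage"
print(shuffle_words(premise, rng))
# An NLI model invariant to this permutation is not using word order.
```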
- Investigating Cross-Linguistic Adjective Ordering Tendencies with a Latent-Variable Model [66.84264870118723]
We present the first purely corpus-driven model of multi-lingual adjective ordering in the form of a latent-variable model.
We provide strong converging evidence for the existence of universal, cross-linguistic, hierarchical adjective ordering tendencies.
arXiv Detail & Related papers (2020-10-09T18:27:55Z)
- Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution. The gate-to-timescale relation is worked through below.
arXiv Detail & Related papers (2020-09-27T02:13:38Z)
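The gating-timescale link can be made explicit: a constant forget-gate value f decays a memory cell as f^t, giving a characteristic timescale T = -1/ln(f). The sketch below assigns heavy-tailed timescales to units and derives the gate biases that would realize them; the particular distribution is a stand-in, not the one the paper derives.
```python
import numpy as np

# With a constant forget gate f, an LSTM memory cell decays as f**t, so its
# characteristic timescale is T = -1 / ln(f), equivalently f = exp(-1/T).
def gate_for_timescale(T):
    return np.exp(-1.0 / T)

def bias_for_gate(f):
    # Inverse sigmoid: the forget-gate bias yielding value f at zero input.
    return np.log(f / (1.0 - f))

# Draw unit timescales from a heavy-tailed distribution (a stand-in for the
# power-law-matched assignment the paper derives), then compute gate biases.
rng = np.random.default_rng(0)
timescales = 1.0 + rng.pareto(1.5, size=64)
biases = bias_for_gate(gate_for_timescale(timescales))
print(timescales.max(), biases.max())  # a few units get very long memories
```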
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
- Consistency of a Recurrent Language Model With Respect to Incomplete Decoding [67.54760086239514]
We study the issue of a recurrent language model producing infinite-length sequences under incomplete decoding.
We propose two remedies that address this inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model. Consistent top-k sampling is sketched below.
arXiv Detail & Related papers (2020-02-06T19:56:15Z)
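On one reading of "consistent top-k sampling", the fix is to keep the end-of-sequence token in the candidate set at every step so termination always has nonzero probability; the renormalization details below are a simplified sketch, not the paper's exact algorithm.
```python
import numpy as np

def consistent_top_k_sample(probs, k, eos_id, rng):
    """Top-k sampling that always keeps <eos> in the candidate set, so the
    decoder assigns nonzero probability to terminating at every step.
    Simplified sketch of the 'consistent sampling' idea."""
    top = set(np.argsort(probs)[-k:].tolist())
    top.add(eos_id)                          # the consistency fix
    idx = np.array(sorted(top))
    p = probs[idx] / probs[idx].sum()        # renormalize over kept tokens
    return int(rng.choice(idx, p=p))

rng = np.random.default_rng(0)
vocab_probs = np.array([0.4, 0.3, 0.2, 0.05, 0.05])  # eos_id = 4, tiny prob
print(consistent_top_k_sample(vocab_probs, k=2, eos_id=4, rng=rng))
```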
- A Simple Joint Model for Improved Contextual Neural Lemmatization [60.802451210656805]
We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages.
Our paper describes the model in addition to training and decoding procedures.
arXiv Detail & Related papers (2019-04-04T02:03:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.