Character-aware Transformers Learn an Irregular Morphological Pattern Yet None Generalize Like Humans
- URL: http://arxiv.org/abs/2602.14100v1
- Date: Sun, 15 Feb 2026 11:22:12 GMT
- Authors: Akhilesh Kakolu Ramarao, Kevin Tang, Dinah Baer-Henney
- Abstract summary: Recent work has shown that encoder-decoder models can acquire irregular patterns, but evidence that they generalize these patterns like humans is mixed. We investigate this using the Spanish L-shaped morphome, where only the first-person singular indicative shares its stem with all subjunctive forms. None of the models reproduce the human pattern, highlighting the gap between statistical pattern reproduction and morphological abstraction.
- Score: 8.033684021402165
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Whether neural networks can serve as cognitive models of morphological learning remains an open question. Recent work has shown that encoder-decoder models can acquire irregular patterns, but evidence that they generalize these patterns like humans is mixed. We investigate this using the Spanish L-shaped morphome, where only the first-person singular indicative (e.g., pongo 'I put') shares its stem with all subjunctive forms (e.g., ponga, pongas) despite lacking apparent phonological, semantic, or syntactic motivation. We compare five encoder-decoder transformers varying along two dimensions: sequential vs. position-invariant positional encoding, and atomic vs. decomposed tag representations. Positional encoding proves decisive: position-invariant models recover the correct L-shaped paradigm clustering even when L-shaped verbs are scarce in training, whereas sequential positional encoding models only partially capture the pattern. Yet none of the models productively generalize this pattern to novel forms. Position-invariant models generalize the L-shaped stem across subjunctive cells but fail to extend it to the first-person singular indicative, producing a mood-based generalization rather than the L-shaped morphomic pattern. Humans do the opposite, generalizing preferentially to the first-person singular indicative over subjunctive forms. None of the models reproduce the human pattern, highlighting the gap between statistical pattern reproduction and morphological abstraction.
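To make the L-shaped pattern in the abstract concrete, here is a minimal Python sketch using the present-tense forms of the verb poner. It is only an illustration and not the authors' code or evaluation method: the tag names (`IND`, `SBJV`, `1SG`, ...), the helper functions, and the string-prefix stem check are all assumptions introduced here.

```python
# Present-tense paradigm of Spanish "poner" (standard forms).
# Tag names such as "IND"/"SBJV" are illustrative, not the paper's tag set.
PERSONS = ["1SG", "2SG", "3SG", "1PL", "2PL", "3PL"]

PONER = {
    ("IND", "1SG"): "pongo",
    ("IND", "2SG"): "pones",
    ("IND", "3SG"): "pone",
    ("IND", "1PL"): "ponemos",
    ("IND", "2PL"): "ponéis",
    ("IND", "3PL"): "ponen",
    ("SBJV", "1SG"): "ponga",
    ("SBJV", "2SG"): "pongas",
    ("SBJV", "3SG"): "ponga",
    ("SBJV", "1PL"): "pongamos",
    ("SBJV", "2PL"): "pongáis",
    ("SBJV", "3PL"): "pongan",
}

# Cells of the L-shaped pattern: 1SG indicative plus every subjunctive cell.
SUBJUNCTIVE_CELLS = {("SBJV", p) for p in PERSONS}
L_SHAPED_CELLS = {("IND", "1SG")} | SUBJUNCTIVE_CELLS


def cells_with_stem(paradigm, stem):
    """Return the paradigm cells whose form starts with the given stem."""
    return {cell for cell, form in paradigm.items() if form.startswith(stem)}


def describe_generalization(cells):
    """Hypothetical labelling of where an irregular stem was used."""
    if cells == L_SHAPED_CELLS:
        return "L-shaped"
    if cells == SUBJUNCTIVE_CELLS:
        return "mood-based"
    return "other"


irregular_cells = cells_with_stem(PONER, "pong")
is_l_shaped = irregular_cells == L_SHAPED_CELLS
print(is_l_shaped)
print(describe_generalization(SUBJUNCTIVE_CELLS))
```

Under this labelling, the subjunctive-only spread that the abstract reports for position-invariant models would count as "mood-based", while the attested paradigm of poner counts as "L-shaped".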
Related papers
- A theoretical model of dynamical grammatical gender shifting based on set-valued set function [0.0]
This study investigates the diverse characteristics of nouns, focusing on both semantic (e.g., countable/uncountable) and morphosyntactic (e.g., masculine/feminine) distinctions.
arXiv Detail & Related papers (2026-03-03T20:32:13Z)
- Frequency matters: Modeling irregular morphological patterns in Spanish with Transformers [0.8602553195689513]
We focus on Spanish verbal paradigms, where certain verbs follow an irregular L-shaped pattern.
We investigate the role of input frequency in the acquisition of regular versus irregular L-shaped patterns in transformer models.
arXiv Detail & Related papers (2024-10-28T13:36:46Z)
- On the Origins of Linear Representations in Large Language Models [51.88404605700344]
We introduce a simple latent variable model to formalize the concept dynamics of the next token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
arXiv Detail & Related papers (2024-03-06T17:17:36Z)
- Morphological Inflection with Phonological Features [7.245355976804435]
This work explores effects on performance obtained through various ways in which morphological models get access to subcharacter phonological features.
We elicit phonemic data from standard graphemic data using language-specific grammars for languages with shallow grapheme-to-phoneme mapping.
arXiv Detail & Related papers (2023-06-21T21:34:39Z)
- How do we get there? Evaluating transformer neural networks as cognitive models for English past tense inflection [0.0]
We train a set of transformer models with different settings to examine their behavior on this task.
The models' performance on the regulars is heavily affected by type frequency and ratio but not token frequency and ratio, and vice versa for the irregulars.
Although the transformer model exhibits some level of learning on the abstract category of verb regularity, its performance does not fit human data well.
arXiv Detail & Related papers (2022-10-17T15:13:35Z)
- Augmenting Implicit Neural Shape Representations with Explicit Deformation Fields [95.39603371087921]
Implicit neural representation is a recent approach to learn shape collections as zero level-sets of neural networks.
We advocate deformation-aware regularization for implicit neural representations, aiming at producing plausible deformations as latent code changes.
arXiv Detail & Related papers (2021-08-19T22:07:08Z)
- SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural Implicit Shapes [117.76767853430243]
We introduce SNARF, which combines the advantages of linear blend skinning for polygonal meshes with neural implicit surfaces.
We propose a forward skinning model that finds all canonical correspondences of any deformed point using iterative root finding.
Compared to state-of-the-art neural implicit representations, our approach generalizes better to unseen poses while preserving accuracy.
arXiv Detail & Related papers (2021-04-08T17:54:59Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
- Compositional Generalization via Semantic Tagging [81.24269148865555]
We propose a new decoding framework that preserves the expressivity and generality of sequence-to-sequence models.
We show that the proposed approach consistently improves compositional generalization across model architectures, domains, and semantic formalisms.
arXiv Detail & Related papers (2020-10-22T15:55:15Z)
- A Simple Joint Model for Improved Contextual Neural Lemmatization [60.802451210656805]
We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages.
Our paper describes the model in addition to training and decoding procedures.
arXiv Detail & Related papers (2019-04-04T02:03:19Z)