Inflecting when there's no majority: Limitations of encoder-decoder neural networks as cognitive models for German plurals
- URL: http://arxiv.org/abs/2005.08826v1
- Date: Mon, 18 May 2020 15:58:28 GMT
- Title: Inflecting when there's no majority: Limitations of encoder-decoder neural networks as cognitive models for German plurals
- Authors: Kate McCurdy, Sharon Goldwater, Adam Lopez
- Abstract summary: Can artificial neural networks learn to represent inflectional morphology and generalize to new words as human speakers do?
We collect a new dataset from German speakers (production and ratings of plural forms for novel nouns) that is designed to avoid sources of information unavailable to the ED model.
We conclude that modern neural models may still struggle with minority-class generalization.
- Score: 27.002788405625484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Can artificial neural networks learn to represent inflectional morphology and
generalize to new words as human speakers do? Kirov and Cotterell (2018) argue
that the answer is yes: modern Encoder-Decoder (ED) architectures learn
human-like behavior when inflecting English verbs, such as extending the
regular past tense form -(e)d to novel words. However, their work does not
address the criticism raised by Marcus et al. (1995): that neural models may
learn to extend not the regular, but the most frequent class -- and thus fail
on tasks like German number inflection, where infrequent suffixes like -s can
still be productively generalized.
To investigate this question, we first collect a new dataset from German
speakers (production and ratings of plural forms for novel nouns) that is
designed to avoid sources of information unavailable to the ED model. The
speaker data show high variability, and two suffixes evince 'regular' behavior,
appearing more often with phonologically atypical inputs. Encoder-decoder
models do generalize the most frequently produced plural class, but do not show
human-like variability or 'regular' extension of these other plural markers. We
conclude that modern neural models may still struggle with minority-class
generalization.
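As an illustrative aside (not the authors' code), the comparison at the heart of this abstract can be sketched as follows: label which plural marker each produced form carries, then compare the marker distribution in human wug-test productions against model predictions. The nonce noun "Bral", the productions, and the suffix heuristics below are hypothetical stand-ins for the paper's dataset and models.
```python
# Minimal sketch: compare German plural-marker distributions between
# human productions and model outputs. All data here is made up.
from collections import Counter

# The five major German plural markers discussed in this line of work.
SUFFIXES = ("-(e)n", "-e", "-er", "-s", "zero")

def classify_plural(singular: str, plural: str) -> str:
    """Heuristically label which plural marker a produced form uses."""
    if plural.endswith(("en", "n")) and not singular.endswith(("en", "n")):
        return "-(e)n"
    if plural.endswith("er") and not singular.endswith("er"):
        return "-er"
    if plural.endswith("s") and not singular.endswith("s"):
        return "-s"
    if plural.endswith("e") and not singular.endswith("e"):
        return "-e"
    return "zero"  # no overt suffix (possibly umlaut only)

def suffix_distribution(pairs):
    """Relative frequency of each plural marker over (singular, plural) pairs."""
    counts = Counter(classify_plural(sg, pl) for sg, pl in pairs)
    total = sum(counts.values())
    return {s: counts[s] / total for s in SUFFIXES}

# Hypothetical productions for one nonce noun:
human = [("Bral", "Brale"), ("Bral", "Bralen"), ("Bral", "Brals"), ("Bral", "Brale")]
model = [("Bral", "Bralen")] * 4  # an ED model concentrating on one class

print("human:", suffix_distribution(human))
print("model:", suffix_distribution(model))
```
Under this sketch, a learner that always extends the majority class yields a degenerate, single-marker distribution, whereas the paper's human data show high variability across markers, which is exactly the mismatch the abstract reports.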
Related papers
- Analyzing Finnish Inflectional Classes through Discriminative Lexicon and Deep Learning Models [42.045109659898465]
Inflectional classes bring together nouns which have similar stem changes and use similar exponents in their paradigms. It is unclear whether inflectional classes are cognitively real. This study uses a dataset of 55,271 inflected forms of 2,000 high-frequency Finnish nouns from 49 inflectional classes.
arXiv Detail & Related papers (2025-09-05T05:24:56Z)
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
- SpeechAlign: Aligning Speech Generation to Human Preferences [51.684183257809075]
We introduce SpeechAlign, an iterative self-improvement strategy that aligns speech language models to human preferences.
We show that SpeechAlign can bridge the distribution gap and facilitate continuous self-improvement of the speech language model.
arXiv Detail & Related papers (2024-04-08T15:21:17Z)
- How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases [28.58785395946639]
We show that pre-training can teach language models to rely on hierarchical syntactic features when performing tasks after fine-tuning.
We focus on architectural features (depth, width, and number of parameters), as well as the genre and size of the pre-training corpus.
arXiv Detail & Related papers (2023-05-31T14:38:14Z)
- How do we get there? Evaluating transformer neural networks as cognitive models for English past tense inflection [0.0]
We train a set of transformer models with different settings to examine their behavior on this task.
The models' performance on the regulars is heavily affected by type frequency and ratio but not token frequency and ratio, and vice versa for the irregulars.
Although the transformer model exhibits some level of learning on the abstract category of verb regularity, its performance does not fit human data well.
arXiv Detail & Related papers (2022-10-17T15:13:35Z)
- Twist Decoding: Diverse Generators Guide Each Other [116.20780037268801]
We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models.
Our method does not assume the vocabulary, tokenization or even generation order is shared.
arXiv Detail & Related papers (2022-05-19T01:27:53Z)
- Same Neurons, Different Languages: Probing Morphosyntax in Multilingual Pre-trained Models [84.86942006830772]
We conjecture that multilingual pre-trained models can derive language-universal abstractions about grammar.
We conduct the first large-scale empirical study over 43 languages and 14 morphosyntactic categories with a state-of-the-art neuron-level probe.
arXiv Detail & Related papers (2022-05-04T12:22:31Z)
- Computing Class Hierarchies from Classifiers [12.631679928202516]
We propose a novel algorithm for automatically acquiring a class hierarchy from a neural network.
Our algorithm produces surprisingly good hierarchies for some well-known deep neural network models.
arXiv Detail & Related papers (2021-12-02T13:01:04Z)
- Falling Through the Gaps: Neural Architectures as Models of Morphological Rule Learning [0.0]
We evaluate the Transformer as a model of morphological rule learning.
We compare it with Recurrent Neural Networks (RNN) on English, German, and Russian.
arXiv Detail & Related papers (2021-05-08T14:48:29Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
- A Simple Joint Model for Improved Contextual Neural Lemmatization [60.802451210656805]
We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages.
Our paper describes the model in addition to training and decoding procedures.
arXiv Detail & Related papers (2019-04-04T02:03:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.