Do RNN States Encode Abstract Phonological Processes?
- URL: http://arxiv.org/abs/2104.00789v1
- Date: Thu, 1 Apr 2021 22:24:39 GMT
- Title: Do RNN States Encode Abstract Phonological Processes?
- Authors: Miikka Silfverberg, Francis Tyers, Garrett Nicolai, Mans Hulden
- Abstract summary: We show that sequence-to-sequence models often encode 17 different consonant gradation processes in a handful of dimensions in the RNN.
We also show that by scaling the activations in these dimensions we can control whether consonant gradation occurs and the direction of the gradation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequence-to-sequence models have delivered impressive results in word
formation tasks such as morphological inflection, often learning to model
subtle morphophonological details with limited training data. Despite the
performance, the opacity of neural models makes it difficult to determine
whether complex generalizations are learned, or whether a kind of separate rote
memorization of each morphophonological process takes place. To investigate
whether complex alternations are simply memorized or whether there is some
level of generalization across related sound changes in a sequence-to-sequence
model, we perform several experiments on Finnish consonant gradation -- a
complex set of sound changes triggered in some words by certain suffixes. We
find that our models often -- though not always -- encode 17 different
consonant gradation processes in a handful of dimensions in the RNN. We also
show that by scaling the activations in these dimensions we can control whether
consonant gradation occurs and the direction of the gradation.
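The intervention described in the abstract can be illustrated with a minimal NumPy sketch: run a recurrent network and, after each step, multiply a handful of hidden-state dimensions by a constant factor. This is only a toy vanilla (Elman) RNN with random weights, not the paper's trained sequence-to-sequence models; the dimension indices and scaling factor below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
H, X = 8, 4                          # toy hidden and input sizes
W_xh = rng.normal(0, 0.5, (H, X))    # random, untrained weights
W_hh = rng.normal(0, 0.5, (H, H))
b_h = np.zeros(H)

def rnn_step(h, x):
    """One vanilla Elman RNN step: h' = tanh(W_xh x + W_hh h + b)."""
    return np.tanh(W_xh @ x + W_hh @ h + b_h)

def run(inputs, scale_dims=(), factor=1.0):
    """Run the RNN over a sequence, optionally scaling the chosen
    hidden dimensions after every step (the activation-scaling
    intervention)."""
    h = np.zeros(H)
    for x in inputs:
        h = rnn_step(h, x)
        for d in scale_dims:         # intervene on selected dimensions
            h[d] *= factor
    return h

inputs = rng.normal(0, 1.0, (5, X))
h_plain = run(inputs)
h_scaled = run(inputs, scale_dims=(2, 5), factor=3.0)
```

In the paper's setting the scaled dimensions are the ones found to encode gradation, and changing the scaling factor flips whether gradation applies and in which direction; here the two runs simply yield different final states.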
Related papers
- Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling [60.63703438729223]
We show how different architectures and training methods affect model multi-step reasoning capabilities. We confirm that increasing model depth plays a crucial role for sequential computations.
arXiv Detail & Related papers (2025-08-22T18:57:08Z)
- Sometimes I am a Tree: Data Drives Unstable Hierarchical Generalization [15.028208772567487]
We use case studies of English grammar to explore how complex, diverse training data drives models to generalize OOD.
We show that these factors are nuanced and that intermediate levels of diversity and complexity lead to inconsistent behavior across random seeds.
Our findings emphasize the critical role of training data in shaping generalization patterns and illuminate how competing model strategies lead to inconsistent generalization outcomes across random seeds.
arXiv Detail & Related papers (2024-12-05T21:12:37Z)
- Frequency matters: Modeling irregular morphological patterns in Spanish with Transformers [0.8602553195689513]
We focus on Spanish verbal paradigms, where certain verbs follow an irregular L-shaped pattern.
We investigate the role of input frequency in the acquisition of regular versus irregular L-shaped patterns in transformer models.
arXiv Detail & Related papers (2024-10-28T13:36:46Z)
- Demystifying Verbatim Memorization in Large Language Models [67.49068128909349]
Large Language Models (LLMs) frequently memorize long sequences verbatim, often with serious legal and privacy implications.
We develop a framework to study verbatim memorization in a controlled setting by continuing pre-training from Pythia checkpoints with injected sequences.
We find that (1) non-trivial amounts of repetition are necessary for verbatim memorization to happen; (2) later (and presumably better) checkpoints are more likely to memorize verbatim sequences, even for out-of-distribution sequences.
arXiv Detail & Related papers (2024-07-25T07:10:31Z)
- From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty [67.81977289444677]
Large language models (LLMs) often exhibit undesirable behaviors, such as hallucinations and sequence repetitions.
We categorize fallback behaviors - sequence repetitions, degenerate text, and hallucinations - and extensively analyze them.
Our experiments reveal a clear and consistent ordering of fallback behaviors, across all these axes.
arXiv Detail & Related papers (2024-07-08T16:13:42Z)
- Morphological Inflection with Phonological Features [7.245355976804435]
This work explores effects on performance obtained through various ways in which morphological models get access to subcharacter phonological features.
We elicit phonemic data from standard graphemic data using language-specific grammars for languages with shallow grapheme-to-phoneme mapping.
arXiv Detail & Related papers (2023-06-21T21:34:39Z)
- Memorization-Dilation: Modeling Neural Collapse Under Label Noise [10.134749691813344]
During the terminal phase of training a deep neural network, the feature embedding of all examples of the same class tend to collapse to a single representation.
Empirical evidence suggests that the memorization of noisy data points leads to a degradation (dilation) of the neural collapse.
Our proofs reveal why label smoothing, a modification of cross-entropy empirically observed to produce a regularization effect, leads to improved generalization in classification tasks.
arXiv Detail & Related papers (2022-06-11T13:40:37Z)
- Can a Transformer Pass the Wug Test? Tuning Copying Bias in Neural Morphological Inflection Models [9.95909045828344]
We show that, to be more effective, the hallucination process needs to pay attention to syllable-like length rather than individual characters or stems.
We report a significant performance improvement with our hallucination model over previous data hallucination methods when training and test data do not overlap in their lemmata.
arXiv Detail & Related papers (2021-04-13T19:51:21Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
- Modelling Verbal Morphology in Nen [4.6877729174041605]
We use state-of-the-art machine learning models for morphological reinflection to model Nen verbal morphology.
Our results show sensitivity to training data composition; different distributions of verb type yield different accuracies.
We also demonstrate the types of patterns that can be inferred from the training data through the case study of syncretism.
arXiv Detail & Related papers (2020-11-30T01:22:05Z)
- Neural Complexity Measures [96.06344259626127]
We propose Neural Complexity (NC), a meta-learning framework for predicting generalization.
Our model learns a scalar complexity measure through interactions with many heterogeneous tasks in a data-driven way.
arXiv Detail & Related papers (2020-08-07T02:12:10Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
- Do Neural Models Learn Systematicity of Monotonicity Inference in Natural Language? [41.649440404203595]
We introduce a method for evaluating whether neural models can learn systematicity of monotonicity inference in natural language.
We consider four aspects of monotonicity inferences and test whether the models can systematically interpret lexical and logical phenomena on different training/test splits.
arXiv Detail & Related papers (2020-04-30T14:48:39Z)
- Consistency of a Recurrent Language Model With Respect to Incomplete Decoding [67.54760086239514]
We study the issue of receiving infinite-length sequences from a recurrent language model.
We propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model.
arXiv Detail & Related papers (2020-02-06T19:56:15Z)
- A Simple Joint Model for Improved Contextual Neural Lemmatization [60.802451210656805]
We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages.
Our paper describes the model in addition to training and decoding procedures.
arXiv Detail & Related papers (2019-04-04T02:03:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.