Morphological Inflection with Phonological Features
- URL: http://arxiv.org/abs/2306.12581v1
- Date: Wed, 21 Jun 2023 21:34:39 GMT
- Title: Morphological Inflection with Phonological Features
- Authors: David Guriel, Omer Goldman, Reut Tsarfaty
- Abstract summary: This work explores effects on performance obtained through various ways in which morphological models get access to subcharacter phonological features.
We elicit phonemic data from standard graphemic data using language-specific grammars for languages with shallow grapheme-to-phoneme mapping.
- Score: 7.245355976804435
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent years have brought great advances into solving morphological tasks,
mostly due to powerful neural models applied to various tasks as (re)inflection
and analysis. Yet, such morphological tasks cannot be considered solved,
especially when little training data is available or when generalizing to
previously unseen lemmas. This work explores effects on performance obtained
through various ways in which morphological models get access to subcharacter
phonological features that are the targets of morphological processes. We
design two methods to achieve this goal: one that leaves models as is but
manipulates the data to include features instead of characters, and another
that manipulates models to take phonological features into account when
building representations for phonemes. We elicit phonemic data from standard
graphemic data using language-specific grammars for languages with shallow
grapheme-to-phoneme mapping, and we experiment with two reinflection models
over eight languages. Our results show that our methods yield comparable
results to the grapheme-based baseline overall, with minor improvements in some
of the languages. All in all, we conclude that patterns in character
distributions are likely to allow models to infer the underlying phonological
characteristics, even when phonemes are not explicitly represented.
Related papers
- Small Language Models Like Small Vocabularies: Probing the Linguistic Abilities of Grapheme- and Phoneme-Based Baby Llamas [7.585433383340306]
We show that small models based on the Llama architecture can achieve strong linguistic performance on standard syntactic and novel lexical/phonetic benchmarks.
Our findings suggest a promising direction for creating more linguistically plausible language models that are better suited for computational studies of language acquisition and processing.
arXiv Detail & Related papers (2024-10-02T12:36:08Z) - UzMorphAnalyser: A Morphological Analysis Model for the Uzbek Language Using Inflectional Endings [0.0]
Affixes play an important role in the morphological analysis of words, by adding additional meanings and grammatical functions to words.
This paper present modeling of the morphological analysis of Uzbek words, including stemming, lemmatizing, and the extraction of morphological information.
The developed tool based on the proposed model is available as a web-based application and an open-source Python library.
arXiv Detail & Related papers (2024-05-23T05:06:55Z) - Learning Phonotactics from Linguistic Informants [54.086544221761486]
Our model iteratively selects or synthesizes a data-point according to one of a range of information-theoretic policies.
We find that the information-theoretic policies that our model uses to select items to query the informant achieve sample efficiency comparable to, or greater than, fully supervised approaches.
arXiv Detail & Related papers (2024-05-08T00:18:56Z) - Modeling Target-Side Morphology in Neural Machine Translation: A
Comparison of Strategies [72.56158036639707]
Morphologically rich languages pose difficulties to machine translation.
A large amount of differently inflected word surface forms entails a larger vocabulary.
Some inflected forms of infrequent terms typically do not appear in the training corpus.
Linguistic agreement requires the system to correctly match the grammatical categories between inflected word forms in the output sentence.
arXiv Detail & Related papers (2022-03-25T10:13:20Z) - On The Ingredients of an Effective Zero-shot Semantic Parser [95.01623036661468]
We analyze zero-shot learning by paraphrasing training examples of canonical utterances and programs from a grammar.
We propose bridging these gaps using improved grammars, stronger paraphrasers, and efficient learning methods.
Our model achieves strong performance on two semantic parsing benchmarks (Scholar, Geo) with zero labeled data.
arXiv Detail & Related papers (2021-10-15T21:41:16Z) - Minimal Supervision for Morphological Inflection [8.532288965425805]
We bootstrapping labeled data from a seed as little as em five labeled paradigms, accompanied by a large bulk of unlabeled text.
Our approach exploits different kinds of regularities in morphological systems in a two-phased setup.
We experiment with the Paradigm Cell Filling Problem over eight typologically different languages, and find that, in languages with relatively simple morphology, orthographic regularities on their own allow inflection models to achieve respectable accuracy.
arXiv Detail & Related papers (2021-04-17T11:07:36Z) - Morphological Disambiguation from Stemming Data [1.2183405753834562]
Kinyarwanda, a morphologically rich language, currently lacks tools for automated morphological analysis.
We learn to morphologically disambiguate Kinyarwanda verbal forms from a new stemming dataset collected through crowd-sourcing.
Our experiments reveal that inflectional properties of stems and morpheme association rules are the most discriminative features for disambiguation.
arXiv Detail & Related papers (2020-11-11T01:44:09Z) - Comparison of Interactive Knowledge Base Spelling Correction Models for
Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict.
This work shows a comparison of a neural model and character language models with varying amounts on target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z) - Data Augmentation for Spoken Language Understanding via Pretrained
Language Models [113.56329266325902]
Training of spoken language understanding (SLU) models often faces the problem of data scarcity.
We put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances.
arXiv Detail & Related papers (2020-04-29T04:07:12Z) - A Simple Joint Model for Improved Contextual Neural Lemmatization [60.802451210656805]
We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages.
Our paper describes the model in addition to training and decoding procedures.
arXiv Detail & Related papers (2019-04-04T02:03:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.