Minimal Supervision for Morphological Inflection
- URL: http://arxiv.org/abs/2104.08512v1
- Date: Sat, 17 Apr 2021 11:07:36 GMT
- Title: Minimal Supervision for Morphological Inflection
- Authors: Omer Goldman and Reut Tsarfaty
- Abstract summary: We bootstrap labeled data from a seed of as little as five labeled paradigms, accompanied by a large bulk of unlabeled text.
Our approach exploits different kinds of regularities in morphological systems in a two-phased setup.
We experiment with the Paradigm Cell Filling Problem over eight typologically different languages, and find that, in languages with relatively simple morphology, orthographic regularities on their own allow inflection models to achieve respectable accuracy.
- Score: 8.532288965425805
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Neural models for the various flavours of morphological inflection tasks have
proven to be extremely accurate given ample labeled data -- data that may be
slow and costly to obtain. In this work we aim to overcome this annotation
bottleneck by bootstrapping labeled data from a seed of as little as five
labeled paradigms, accompanied by a large bulk of unlabeled text. Our approach
exploits different kinds of regularities in morphological systems in a
two-phased setup, where word tagging based on analogies is followed by
word pairing based on distances. We experiment with the Paradigm Cell
Filling Problem over eight typologically different languages, and find that, in
languages with relatively simple morphology, orthographic regularities on their
own allow inflection models to achieve respectable accuracy. Combined
orthographic and semantic regularities alleviate difficulties with particularly
complex morpho-phonological systems. Our results suggest that hand-crafting
many tagged examples might be an unnecessary effort. However, more work is
needed in order to address rarely used forms.
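The two-phase setup described in the abstract (analogy-based tagging followed by distance-based pairing) can be sketched roughly as follows. This is a minimal illustrative simplification, not the authors' method: suffix-replacement rules stand in for the paper's analogies, and plain edit distance stands in for its distance measure; the seed data and vocabulary are invented toy examples.

```python
# Toy seed: a few labeled (lemma, paradigm cell) -> inflected form pairs,
# standing in for the paper's five seed paradigms.
seed = {
    ("walk",  "V;PST"): "walked",
    ("talk",  "V;PST"): "talked",
    ("carry", "V;PST"): "carried",
}
unlabeled = ["jump", "jumped", "marry", "married", "table"]

def suffix_rule(lemma, form):
    # Strip the longest common prefix; keep the differing suffixes,
    # e.g. ("carry", "carried") -> ("y", "ied").
    i = 0
    while i < min(len(lemma), len(form)) and lemma[i] == form[i]:
        i += 1
    return lemma[i:], form[i:]

# Phase 1: tag unlabeled words by analogy with the seed rules.
rules = {}
for (lemma, cell), form in seed.items():
    rules.setdefault(cell, set()).add(suffix_rule(lemma, form))

tagged = []  # (inflected word, cell, reconstructed candidate lemma)
for word in unlabeled:
    for cell, cell_rules in rules.items():
        for lem_suf, form_suf in cell_rules:
            if form_suf and word.endswith(form_suf):
                tagged.append((word, cell,
                               word[: len(word) - len(form_suf)] + lem_suf))

# Phase 2: pair each tagged form with the attested unlabeled word whose
# spelling is closest to the reconstructed lemma (Levenshtein distance).
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

pairs = {
    form: min(unlabeled, key=lambda w: edit_distance(w, cand))
    for form, cell, cand in tagged
}
print(pairs)  # -> {'jumped': 'jump', 'married': 'marry'}
```

The pairing step recovers "marry" for "married" even though the naive suffix strip also produces the non-word "marri"; grounding candidates in attested words is what makes the bootstrap usable as training data.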
Related papers
- Explaining Datasets in Words: Statistical Models with Natural Language Parameters [66.69456696878842]
We introduce a family of statistical models -- including clustering, time series, and classification models -- parameterized by natural language predicates.
We apply our framework to a wide range of problems: taxonomizing user chat dialogues, characterizing how they evolve across time, and finding categories where one language model is better than another.
arXiv Detail & Related papers (2024-09-13T01:40:20Z)
- Morphological Inflection with Phonological Features [7.245355976804435]
This work explores effects on performance obtained through various ways in which morphological models get access to subcharacter phonological features.
We elicit phonemic data from standard graphemic data using language-specific grammars for languages with shallow grapheme-to-phoneme mapping.
arXiv Detail & Related papers (2023-06-21T21:34:39Z)
- Modeling Target-Side Morphology in Neural Machine Translation: A Comparison of Strategies [72.56158036639707]
Morphologically rich languages pose difficulties to machine translation.
A large amount of differently inflected word surface forms entails a larger vocabulary.
Some inflected forms of infrequent terms typically do not appear in the training corpus.
Linguistic agreement requires the system to correctly match the grammatical categories between inflected word forms in the output sentence.
arXiv Detail & Related papers (2022-03-25T10:13:20Z)
- Morphology Without Borders: Clause-Level Morphological Annotation [8.559428282730021]
We propose to view morphology as a clause-level phenomenon, rather than word-level.
We deliver a novel dataset for clause-level morphology covering 4 typologically-different languages: English, German, Turkish and Hebrew.
Our experiments show that the clause-level tasks are substantially harder than the respective word-level tasks, while having comparable complexity across languages.
arXiv Detail & Related papers (2022-02-25T17:20:28Z)
- On The Ingredients of an Effective Zero-shot Semantic Parser [95.01623036661468]
We analyze zero-shot learning by paraphrasing training examples of canonical utterances and programs from a grammar.
We propose bridging these gaps using improved grammars, stronger paraphrasers, and efficient learning methods.
Our model achieves strong performance on two semantic parsing benchmarks (Scholar, Geo) with zero labeled data.
arXiv Detail & Related papers (2021-10-15T21:41:16Z)
- Learning from Partially Overlapping Labels: Image Segmentation under Annotation Shift [68.6874404805223]
We propose several strategies for learning from partially overlapping labels in the context of abdominal organ segmentation.
We find that combining a semi-supervised approach with an adaptive cross entropy loss can successfully exploit heterogeneously annotated data.
arXiv Detail & Related papers (2021-07-13T09:22:24Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
- Modelling Verbal Morphology in Nen [4.6877729174041605]
We use state-of-the-art machine learning models for morphological reinflection to model Nen verbal morphology.
Our results show sensitivity to training data composition; different distributions of verb types yield different accuracies.
We also demonstrate the types of patterns that can be inferred from the training data through the case study of syncretism.
arXiv Detail & Related papers (2020-11-30T01:22:05Z)
- A Simple Joint Model for Improved Contextual Neural Lemmatization [60.802451210656805]
We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages.
Our paper describes the model in addition to training and decoding procedures.
arXiv Detail & Related papers (2019-04-04T02:03:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences arising from its use.