Morphological Processing of Low-Resource Languages: Where We Are and
What's Next
- URL: http://arxiv.org/abs/2203.08909v1
- Date: Wed, 16 Mar 2022 19:47:04 GMT
- Title: Morphological Processing of Low-Resource Languages: Where We Are and
What's Next
- Authors: Adam Wiemerslage and Miikka Silfverberg and Changbing Yang and Arya D.
McCarthy and Garrett Nicolai and Eliana Colunga and Katharina Kann
- Abstract summary: We focus on approaches suitable for languages with minimal or no annotated resources.
We argue that the field is ready to tackle the logical next challenge: understanding a language's morphology from raw text alone.
- Score: 23.7371787793763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic morphological processing can aid downstream natural language
processing applications, especially for low-resource languages, and assist
language documentation efforts for endangered languages. Having long been
multilingual, the field of computational morphology is increasingly moving
towards approaches suitable for languages with minimal or no annotated
resources. First, we survey recent developments in computational morphology
with a focus on low-resource languages. Second, we argue that the field is
ready to tackle the logical next challenge: understanding a language's
morphology from raw text alone. We perform an empirical study on a truly
unsupervised version of the paradigm completion task and show that, while
existing state-of-the-art models, complemented by two newly proposed models we
devise, perform reasonably well, there is still much room for improvement. The
stakes are high: solving this task will increase the language coverage of
morphological resources by orders of magnitude.
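To make the task concrete: in paradigm completion, a system is given a lemma (and possibly a few observed inflected forms) and must generate the remaining forms of its inflectional paradigm; the unsupervised variant studied here must additionally induce the paradigms and cells from raw text alone. Below is a minimal, illustrative Python sketch of the supervised interface using majority suffix-replacement rules; it is not one of the paper's models, and the toy data and UniMorph-style cell labels are assumptions for illustration.

```python
# Toy paradigm completion via majority suffix-replacement rules.
# Illustrative sketch only; real systems use neural sequence models.
from collections import Counter, defaultdict

def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def learn_rules(paradigms):
    """For each cell, keep the most frequent (lemma-suffix -> form-suffix) rule."""
    rules = defaultdict(Counter)
    for lemma, cells in paradigms:
        for cell, form in cells.items():
            k = common_prefix_len(lemma, form)
            rules[cell][(lemma[k:], form[k:])] += 1
    return {cell: c.most_common(1)[0][0] for cell, c in rules.items()}

def complete(lemma, observed, rules):
    """Fill the missing cells of a lemma's paradigm using learned rules."""
    out = dict(observed)
    for cell, (drop, add) in rules.items():
        if cell not in out and lemma.endswith(drop):
            stem = lemma[: len(lemma) - len(drop)] if drop else lemma
            out[cell] = stem + add
    return out

# Toy English training paradigms with UniMorph-style cell labels.
train = [
    ("walk", {"V;PST": "walked", "V;V.PTCP;PRS": "walking"}),
    ("jump", {"V;PST": "jumped", "V;V.PTCP;PRS": "jumping"}),
]
rules = learn_rules(train)
print(complete("talk", {}, rules))
# {'V;PST': 'talked', 'V;V.PTCP;PRS': 'talking'}
```

In the truly unsupervised setting, neither the training paradigms nor the cell inventory is given; the system must discover both from raw text, which is what makes the task studied here substantially harder.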
Related papers
- Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers [81.47046536073682]
We present a review and a unified perspective summarizing recent progress and emerging trends in the multilingual large language model (MLLM) literature.
We hope our work can provide the community with quick access and spur breakthrough research in MLLMs.
arXiv Detail & Related papers (2024-04-07T11:52:44Z) - Extending Multilingual Machine Translation through Imitation Learning [60.15671816513614]
Imit-MNMT treats the task as an imitation learning process, which mimics the behavior of an expert.
We show that our approach significantly improves the translation performance between the new and the original languages.
We also demonstrate that our approach is capable of solving copy and off-target problems.
arXiv Detail & Related papers (2023-11-14T21:04:03Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for conditionally encoding instances.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - Neural Machine Translation For Low Resource Languages [0.0]
This paper investigates low-resource languages and builds a Neural Machine Translation model that achieves state-of-the-art results.
It builds upon the mBART language model and explores strategies for augmenting it with various NLP and deep learning techniques.
arXiv Detail & Related papers (2023-04-16T19:27:48Z) - Overcoming Language Disparity in Online Content Classification with
Multimodal Learning [22.73281502531998]
Large language models are now the standard for developing state-of-the-art solutions for text detection and classification tasks.
The development of advanced computational techniques and resources is disproportionately focused on the English language.
We explore the promise of incorporating the information contained in images via multimodal machine learning.
arXiv Detail & Related papers (2022-05-19T17:56:02Z) - Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this inductive bias from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z) - How Low is Too Low? A Computational Perspective on Extremely
Low-Resource Languages [1.7625363344837164]
We introduce the first cross-lingual information extraction pipeline for Sumerian.
We also curate InterpretLR, an interpretability toolkit for low-resource NLP.
Most components of our pipeline can be generalised to other languages to obtain interpretable executions.
arXiv Detail & Related papers (2021-05-30T12:09:59Z) - Combining Pretrained High-Resource Embeddings and Subword
Representations for Low-Resource Languages [24.775371434410328]
We explore techniques exploiting the qualities of morphologically rich languages (MRLs).
We show that a meta-embedding approach combining both pretrained and morphologically-informed word embeddings performs best in the downstream task of Xhosa-English translation.
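To illustrate the meta-embedding idea in the entry above (a hypothetical sketch, not the paper's setup: the lookup table, dimensions, and hashed n-gram scheme are all assumptions), a pretrained word vector is concatenated with a subword-based vector so that rare, morphologically complex words still receive informative representations:

```python
# Hypothetical meta-embedding by concatenation; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a pretrained lookup (e.g. fastText-style vectors).
pretrained = {"ndiyahamba": rng.normal(size=100)}

def subword_vector(word, dim=50, n=3):
    """Average of character n-gram vectors (a FastText-like stand-in)."""
    grams = [word[i:i + n] for i in range(len(word) - n + 1)] or [word]
    # Seed a generator per n-gram so each n-gram maps to a stable vector.
    vecs = [np.random.default_rng(abs(hash(g)) % 2**32).normal(size=dim)
            for g in grams]
    return np.mean(vecs, axis=0)

def meta_embedding(word):
    # Pretrained vector (zeros if out-of-vocabulary) joined with the
    # morphologically informed subword vector.
    w = pretrained.get(word, np.zeros(100))
    return np.concatenate([w, subword_vector(word)])

print(meta_embedding("ndiyahamba").shape)  # (150,)
```

Concatenation keeps both signals intact and lets a downstream model weight them; averaging into a shared space is a common alternative when the dimensions must match.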
arXiv Detail & Related papers (2020-03-09T21:30:55Z) - A Simple Joint Model for Improved Contextual Neural Lemmatization [60.802451210656805]
We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages.
Our paper describes the model in addition to training and decoding procedures.
arXiv Detail & Related papers (2019-04-04T02:03:19Z) - Cross-lingual, Character-Level Neural Morphological Tagging [57.0020906265213]
We train character-level recurrent neural taggers to predict morphological tags for high-resource and low-resource languages jointly (a minimal sketch follows this entry).
Learning joint character representations among multiple related languages successfully enables knowledge transfer from the high-resource languages to the low-resource ones, improving accuracy by up to 30% over a monolingual model.
arXiv Detail & Related papers (2017-08-30T08:14:34Z)
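The shared character representation described above can be sketched compactly. Below is a minimal PyTorch illustration, not the paper's exact architecture: the character embedding table and encoder are shared across languages, so related languages can transfer knowledge through common subword patterns. All hyperparameters and the tag inventory here are illustrative assumptions.

```python
# Minimal character-level tagger with a character vocabulary shared
# across languages; illustrative sketch, not the paper's model.
import torch
import torch.nn as nn

class CharTagger(nn.Module):
    def __init__(self, n_chars, n_tags, emb=32, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb, padding_idx=0)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, char_ids):          # (batch, max_word_len)
        h, _ = self.lstm(self.emb(char_ids))
        word_vec = h.mean(dim=1)          # pool over characters
        return self.out(word_vec)         # per-word tag logits

# One character inventory shared by the high- and low-resource languages.
chars = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyzäöü")}
tags = {"N;SG": 0, "N;PL": 1}
model = CharTagger(len(chars) + 1, len(tags))
x = torch.tensor([[chars[c] for c in "häuser"]])  # German "houses"
print(model(x).shape)                             # torch.Size([1, 2])
```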
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.