Towards Finite-State Morphology of Kurdish
- URL: http://arxiv.org/abs/2005.10652v1
- Date: Thu, 21 May 2020 13:55:07 GMT
- Title: Towards Finite-State Morphology of Kurdish
- Authors: Sina Ahmadi, Hossein Hassani
- Abstract summary: The morphology of the Kurdish language (Sorani dialect) is described from a computational point of view.
We extract morphological rules which are transformed into finite-state transducers for generating and analyzing words.
- Score: 0.76146285961466
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Morphological analysis is the study of the formation and structure of words.
It plays a crucial role in various tasks in Natural Language Processing (NLP)
and Computational Linguistics (CL) such as machine translation and text and
speech generation. Kurdish is a less-resourced multi-dialect Indo-European
language with highly inflectional morphology. In this paper, as the first
attempt of its kind, the morphology of the Kurdish language (Sorani dialect) is
described from a computational point of view. We extract morphological rules
which are transformed into finite-state transducers for generating and
analyzing words. The result of this research assists in conducting studies on
language generation for Kurdish and enhances the Information Retrieval (IR)
capacity for the language while leveraging the Kurdish NLP and CL into a more
advanced computational level.
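As a rough illustration of how morphological rules can be encoded as a finite-state transducer that both generates and analyzes words, here is a minimal Python sketch. It is not the authors' implementation. The romanized stem `kurr`, the tags `+DEF`, `+SG`, `+PL`, the suffix strings, and the state names are all assumptions chosen for the example and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# One arc: (lexical symbol, surface string, next state).
Arc = Tuple[str, str, str]

# Hypothetical one-word lexicon (romanized, illustrative only).
STEMS = ["kurr"]

# Toy noun transducer. The suffixes are assumptions for the example,
# not rules extracted from the paper.
FST: Dict[str, List[Arc]] = {
    "START": [(stem, stem, "NOUN") for stem in STEMS],
    "NOUN": [
        ("+DEF", "ek", "DEF"),  # definite marker, completed in state DEF
        ("+PL", "an", "END"),   # plural without definiteness
        ("", "", "END"),        # bare stem (epsilon arc)
    ],
    "DEF": [
        ("+SG", "e", "END"),    # definite singular: ek + e
        ("+PL", "an", "END"),   # definite plural: ek + an
    ],
    "END": [],
}


def _walk(text: str, state: str, src: int, dst: int) -> List[str]:
    """Consume `text` on arc side `src`, collecting output from side `dst`."""
    if state == "END":
        return [""] if text == "" else []
    results: List[str] = []
    for arc in FST[state]:
        if text.startswith(arc[src]):
            tails = _walk(text[len(arc[src]):], arc[2], src, dst)
            results += [arc[dst] + tail for tail in tails]
    return results


def generate(lexical: str) -> List[str]:
    """Lexical form (stem plus tags) to surface forms."""
    return _walk(lexical, "START", 0, 1)


def analyze(surface: str) -> List[str]:
    """Surface form to lexical analyses."""
    return _walk(surface, "START", 1, 0)


if __name__ == "__main__":
    print(generate("kurr+DEF+PL"))
    print(analyze("kurrekan"))
```

The same arc table drives both directions, which is the property that makes transducers convenient for morphology: generation reads the lexical side and writes the surface side, and analysis does the reverse.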
Related papers
- Recent advancements in computational morphology: A comprehensive survey [0.11606731918609076]
Computational morphology handles the language processing at the word level.
Morpheme boundary detection, lemmatization, morphological feature tagging, morphological reinflection etc.
arXiv Detail & Related papers (2024-06-08T10:07:33Z)
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- Language Evolution with Deep Learning [49.879239655532324]
Computational modeling plays an essential role in the study of language emergence.
It aims to simulate the conditions and learning processes that could trigger the emergence of a structured language.
This chapter explores another class of computational models that have recently revolutionized the field of machine learning: deep learning models.
arXiv Detail & Related papers (2024-03-18T16:52:54Z)
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- Hunspell for Sorani Kurdish Spell Checking and Morphological Analysis [0.0]
We present our efforts in annotating a lexicon with morphosyntactic tags and also, extracting morphological rules of Sorani Kurdish to build a morphological analyzer, a stemmer and a spell-checking system using Hunspell.
This implementation can be used for further developments in the field by researchers and also, be integrated into text editors under a publicly available license.
arXiv Detail & Related papers (2021-09-14T00:24:20Z)
- A Formal Description of Sorani Kurdish Morphology [0.0]
Sorani Kurdish, also known as Central Kurdish, has a complex morphology.
We provide a detailed description of Sorani Kurdish morphological and morphophonological constructions in a formal way.
arXiv Detail & Related papers (2021-09-08T21:34:26Z)
- Central Kurdish machine translation: First large scale parallel corpus and experiments [2.099922236065961]
We present the first large scale parallel corpus of Central Kurdish-English, Awta, containing 229,222 pairs of manually aligned translations.
Our best performing systems achieve 22.72 and 16.81 in BLEU score for Ku→EN and En→Ku, respectively.
arXiv Detail & Related papers (2021-06-17T08:41:53Z)
- Towards Machine Translation for the Kurdish Language [0.0]
Machine translation is the task of translating texts from one language to another using computers.
Kurdish, an Indo-European language, has received little attention in this realm due to the language being less-resourced.
We describe the available scarce parallel data suitable for training a neural machine translation model for Sorani Kurdish-English translation.
arXiv Detail & Related papers (2020-10-12T21:28:57Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
- Evaluating Transformer-Based Multilingual Text Classification [55.53547556060537]
We argue that NLP tools perform unequally across languages with different syntactic and morphological structures.
We calculate word order and morphological similarity indices to aid our empirical study.
arXiv Detail & Related papers (2020-04-29T03:34:53Z)