A Finite State Transducer Based Morphological Analyzer of Maithili Language
- URL: http://arxiv.org/abs/2003.00234v1
- Date: Sat, 29 Feb 2020 11:00:15 GMT
- Title: A Finite State Transducer Based Morphological Analyzer of Maithili Language
- Authors: Raza Rahi, Sumant Pushp, Arif Khan, Smriti Kumar Sinha
- Abstract summary: We present a finite state transducer based inflectional morphological analyzer for a resource-poor language of India, known as Maithili.
Maithili is an eastern Indo-Aryan language spoken in the eastern and northern regions of Bihar in India and in the southeastern plains of Nepal known as the Tarai.
- Score: 2.752817022620644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Morphological analyzers are essential components of many linguistic
applications, such as machine translation, word sense disambiguation, spell
checkers, and search engines. Developing an effective morphological analyzer
therefore has a substantial impact on the computational processing of a
language. In this paper, we present a finite state transducer based
inflectional morphological analyzer for a resource-poor language of India,
known as Maithili. Maithili is an eastern Indo-Aryan language spoken in the
eastern and northern regions of Bihar in India and in the southeastern plains
of Nepal known as the Tarai. This work can be regarded as the first work towards
the computational development of Maithili, and it may encourage researchers
across the country to help establish the language in the computational world.
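The abstract names the technique but not how the transducer is built. The following is a minimal sketch, assuming a simple lexicon-plus-suffix design, of how a finite state transducer can map a surface word form to a stem plus inflection tags. The `FST` class, `build_toy_analyzer`, and the data in `STEMS` and `SUFFIXES` are hypothetical: English verbs and made-up tags stand in for the authors' Maithili lexicon and tag set, which are not described here.

```python
from collections import defaultdict

# HYPOTHETICAL toy data: English verb stems and suffixes stand in for the
# paper's Maithili lexicon; the tags (+V, +Past, ...) are illustrative only.
STEMS = ["walk", "talk"]
SUFFIXES = {"": "+Base", "s": "+3Sg", "ed": "+Past", "ing": "+Prog"}


class FST:
    """A tiny nondeterministic finite state transducer; "" marks epsilon."""

    def __init__(self):
        self.arcs = defaultdict(list)  # state -> [(input, output, next_state)]
        self.finals = set()
        self.num_states = 1            # state 0 is the start state

    def new_state(self):
        self.num_states += 1
        return self.num_states - 1

    def add_arc(self, src, inp, out, dst):
        self.arcs[src].append((inp, out, dst))

    def add_path(self, src, pairs, dst):
        """Chain one arc per (input, output) pair from src to dst."""
        state = src
        for i, (inp, out) in enumerate(pairs):
            nxt = dst if i == len(pairs) - 1 else self.new_state()
            self.add_arc(state, inp, out, nxt)
            state = nxt

    def inverted(self):
        """Swap input and output labels, turning an analyzer into a generator."""
        inv = FST()
        inv.num_states = self.num_states
        inv.finals = set(self.finals)
        for src, arcs in list(self.arcs.items()):
            for inp, out, dst in arcs:
                inv.add_arc(src, out, inp, dst)
        return inv

    def transduce(self, text):
        """Return every output produced along an accepting path for `text`."""
        results = []
        stack = [(0, 0, "")]  # (state, input position, output so far)
        while stack:
            state, pos, out = stack.pop()
            if pos == len(text) and state in self.finals:
                results.append(out)
            for inp, o, nxt in self.arcs.get(state, []):
                if inp == "":                    # epsilon input: consume nothing
                    stack.append((nxt, pos, out + o))
                elif text.startswith(inp, pos):  # label matches the input here
                    stack.append((nxt, pos + len(inp), out + o))
        return sorted(results)


def build_toy_analyzer():
    """Stem (copied to output) -> +V tag -> suffix (rewritten as a tag)."""
    fst = FST()
    stem_end, suffix_start, final = fst.new_state(), fst.new_state(), fst.new_state()
    fst.finals.add(final)
    for stem in STEMS:
        fst.add_path(0, [(c, c) for c in stem], stem_end)
    fst.add_arc(stem_end, "", "+V", suffix_start)
    for suffix, tag in SUFFIXES.items():
        if suffix:  # suffix letters map to epsilon; the last one emits the tag
            pairs = [(c, "") for c in suffix[:-1]] + [(suffix[-1], tag)]
        else:       # the bare stem gets its tag through an epsilon arc
            pairs = [("", tag)]
        fst.add_path(suffix_start, pairs, final)
    return fst


analyzer = build_toy_analyzer()
print(analyzer.transduce("walked"))                  # ['walk+V+Past']
print(analyzer.inverted().transduce("talk+V+Prog"))  # ['talking']
```

Because `transduce` explores every path, an ambiguous surface form would simply yield several analyses, and unknown words yield none. Swapping input and output labels lets the same machine generate inflected forms; a production analyzer could instead be compiled with an existing finite-state toolkit rather than hand-built like this.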
Related papers
- Morphology and Syntax of the Tamil Language [0.0]
The paper highlights the complexity and richness of Tamil in terms of its morphological and syntactic features.
Its usefulness is demonstrated by the rule-based morphological analyser cum generator and the computational grammar for Tamil that have already been developed based on it.
(2024-01-16T13:52:25Z)
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
(2024-01-11T03:04:38Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by documenting evident disparities and identifying possible pathways for addressing them through mindful data collection.
(2023-10-23T17:42:01Z)
- Linguistic Analysis using Paninian System of Sounds and Finite State Machines [0.0]
The study of spoken languages comprises phonology, morphology, and grammar.
The languages can be classified as root languages, inflectional languages, and stem languages.
All these factors shape vocabularies that share commonalities across languages while also showing distinct and subtle differences.
(2023-01-29T15:22:10Z)
- Urdu Morphology, Orthography and Lexicon Extraction [0.0]
This paper describes an implementation of the Urdu language as a software API.
We deal with orthography, morphology and the extraction of the lexicon.
(2022-04-06T20:14:01Z)
- Utilizing Wordnets for Cognate Detection among Indian Languages [50.83320088758705]
We detect cognate word pairs among ten Indian languages with Hindi.
We use deep learning methodologies to predict whether a word pair is cognate or not.
We report improved performance of up to 26%.
(2021-12-30T16:46:28Z)
- Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages [50.82410844837726]
We demonstrate the use of cross-lingual word embeddings for detecting cognates among fourteen Indian languages.
We evaluate our methods to detect cognates on a challenging dataset of twelve Indian languages.
We observe an improvement of up to 18 percentage points in F-score for cognate detection.
(2021-12-16T11:17:58Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
(2021-08-06T23:49:18Z)
- Towards Finite-State Morphology of Kurdish [0.76146285961466]
The morphology of the Kurdish language (Sorani dialect) is described from a computational point of view.
We extract morphological rules which are transformed into finite-state transducers for generating and analyzing words.
(2020-05-21T13:55:07Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
(2020-04-30T16:25:39Z)
- Unsupervised Separation of Native and Loanwords for Malayalam and Telugu [3.4925763160992402]
Words from one language are adopted within a different language without translation; these words appear in transliterated form in text written in the latter language.
This phenomenon is particularly widespread within Indian languages where many words are loaned from English.
We address the task of identifying loanwords automatically and in an unsupervised manner, from large datasets of words from agglutinative Dravidian languages.
(2020-02-12T04:01:57Z)