A Finite State Transducer Based Morphological Analyzer of Maithili Language
- URL: http://arxiv.org/abs/2003.00234v1
- Date: Sat, 29 Feb 2020 11:00:15 GMT
- Title: A Finite State Transducer Based Morphological Analyzer of Maithili Language
- Authors: Raza Rahi, Sumant Pushp, Arif Khan, Smriti Kumar Sinha
- Abstract summary: We present a finite state transducer based inflectional morphological analyzer for a resource-poor language of India known as Maithili.
Maithili is an eastern Indo-Aryan language spoken in the eastern and northern regions of Bihar in India and in the southeastern plains of Nepal, known as the Tarai.
- Score: 2.752817022620644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Morphological analyzers are essential components of many linguistic
applications, such as machine translation, word sense disambiguation, spell
checkers, and search engines. Therefore, developing an effective
morphological analyzer has a major impact on the computational processing of
a language. In this paper, we present a finite state transducer based
inflectional morphological analyzer for a resource-poor language of India
known as Maithili. Maithili is an eastern Indo-Aryan language spoken in the
eastern and northern regions of Bihar in India and in the southeastern plains
of Nepal, known as the Tarai. This work can be regarded as the first step
towards the computational development of Maithili, and it may encourage
researchers across the country to help establish the language in the
computational world.
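The abstract does not describe the transducer's internals. As a minimal sketch, assuming a lexc-style design (a stem lexicon followed by inflectional suffix classes) with hand-built character-level arcs, the Python below shows how a single finite state transducer maps a surface word to a stem plus tags and, when inverted, generates the surface form from that analysis. The stems, suffixes, and tag names (`STEMS`, `SUFFIXES`, `+N`, `+GEN`, `+LOC`) are hypothetical placeholders, not the paper's lexicon or a verified description of Maithili inflection.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

EPS = ""  # epsilon label: consumes or emits nothing


class FST:
    """A small nondeterministic finite state transducer over characters.

    Arcs are labeled (surface_symbol, analysis_symbol); state 0 is the start.
    """

    def __init__(self) -> None:
        self.arcs: Dict[int, List[Tuple[str, str, int]]] = defaultdict(list)
        self.finals: Set[int] = set()
        self.n_states = 1

    def new_state(self) -> int:
        self.n_states += 1
        return self.n_states - 1

    def add_path(self, src: int, surface: str, analysis: str, dst: int) -> None:
        """Add a chain of arcs from src to dst mapping surface to analysis.

        The shorter side is padded with epsilons so arcs pair symbol by symbol.
        """
        n = max(len(surface), len(analysis), 1)
        cur = src
        for i in range(n):
            a = surface[i] if i < len(surface) else EPS
            b = analysis[i] if i < len(analysis) else EPS
            nxt = dst if i == n - 1 else self.new_state()
            self.arcs[cur].append((a, b, nxt))
            cur = nxt

    def transduce(self, word: str, invert: bool = False) -> List[str]:
        """Return all outputs for word: analysis by default, generation if inverted.

        Depth-first search over (state, position, output) configurations; this
        construction has no epsilon cycles, so the search always terminates.
        """
        results: Set[str] = set()
        stack = [(0, 0, "")]
        seen = set()
        while stack:
            state, pos, out = stack.pop()
            if (state, pos, out) in seen:
                continue
            seen.add((state, pos, out))
            if pos == len(word) and state in self.finals:
                results.add(out)
            for a, b, dst in self.arcs[state]:
                if invert:  # swap tapes: read the analysis, write the surface
                    a, b = b, a
                if a == EPS:
                    stack.append((dst, pos, out + b))
                elif pos < len(word) and word[pos] == a:
                    stack.append((dst, pos + 1, out + b))
        return sorted(results)


def build_analyzer(stems: List[str], suffixes: List[Tuple[str, str]]) -> FST:
    """Stem lexicon -> shared continuation state -> suffix class -> final."""
    fst = FST()
    stem_end = fst.new_state()
    final = fst.new_state()
    fst.finals.add(final)
    for stem in stems:
        fst.add_path(0, stem, stem, stem_end)  # stems map to themselves
    for surface, tags in suffixes:
        fst.add_path(stem_end, surface, tags, final)  # suffix -> tag string
    return fst


# Hypothetical toy data for illustration only.
STEMS = ["ghar", "gaam"]
SUFFIXES = [("", "+N"), ("ak", "+N+GEN"), ("me", "+N+LOC")]

if __name__ == "__main__":
    analyzer = build_analyzer(STEMS, SUFFIXES)
    print(analyzer.transduce("gharak"))                   # analysis
    print(analyzer.transduce("ghar+N+GEN", invert=True))  # generation
```

With the toy data, analyzing `gharak` yields `ghar+N+GEN`, and inverting the same machine turns `ghar+N+GEN` back into `gharak`. Sharing one continuation state after all stems is what lets every suffix class combine with every stem without listing each inflected form; a production analyzer would typically also add phonological rules and minimize the machine, which this sketch omits.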
Related papers
- Survey of Pseudonymization, Abstractive Summarization & Spell Checker for Hindi and Marathi [0.0]
The paper aims to build a platform which enables the user to use various features like text anonymization, abstractive text summarization and spell checking in English, Hindi and Marathi language.
The aim of these tools is to serve enterprise and consumer clients who predominantly use Indian Regional languages.
(arXiv, 2024-12-24T04:51:32Z)
- Morphology and Syntax of the Tamil Language [0.0]
The paper highlights the complexity and richness of Tamil in terms of its morphological and syntactic features.
This is borne out by the fact that a rule-based morphological analyser cum generator and a computational grammar for Tamil have already been developed based on this paper.
(arXiv, 2024-01-16T13:52:25Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
(arXiv, 2023-10-23T17:42:01Z)
- Linguistic Analysis using Paninian System of Sounds and Finite State Machines [0.0]
The study of spoken languages comprises phonology, morphology, and grammar.
The languages can be classified as root languages, inflectional languages, and stem languages.
All these factors lead to the formation of vocabulary which has commonality/similarity as well as distinct and subtle differences across languages.
(arXiv, 2023-01-29T15:22:10Z)
- Utilizing Wordnets for Cognate Detection among Indian Languages [50.83320088758705]
We detect cognate word pairs between Hindi and ten other Indian languages.
We use deep learning methodologies to predict whether a word pair is cognate or not.
We report improved performance of up to 26%.
(arXiv, 2021-12-30T16:46:28Z)
- Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages [50.82410844837726]
We demonstrate the use of cross-lingual word embeddings for detecting cognates among fourteen Indian languages.
We evaluate our methods to detect cognates on a challenging dataset of twelve Indian languages.
We observe an improvement of up to 18 percentage points in F-score for cognate detection.
(arXiv, 2021-12-16T11:17:58Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
(arXiv, 2021-08-06T23:49:18Z)
- Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties [77.2347265289855]
We focus on phoneme recognition using Allosaurus, a method for multilingual recognition based on phonetic annotation.
To evaluate in a challenging real-world scenario, we curate phone recognition datasets for Bukusu and Saamia, two varieties of the Luhya language cluster of western Kenya and eastern Uganda.
We find that fine-tuning of Allosaurus, even with just 100 utterances, leads to significant improvements in phone error rates.
(arXiv, 2021-04-04T15:07:55Z)
- Towards Finite-State Morphology of Kurdish [0.76146285961466]
The morphology of the Kurdish language (Sorani dialect) is described from a computational point of view.
We extract morphological rules which are transformed into finite-state transducers for generating and analyzing words.
(arXiv, 2020-05-21T13:55:07Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
(arXiv, 2020-04-30T16:25:39Z)
- Unsupervised Separation of Native and Loanwords for Malayalam and Telugu [3.4925763160992402]
Words from one language are adopted within a different language without translation; these words appear in transliterated form in text written in the latter language.
This phenomenon is particularly widespread within Indian languages where many words are loaned from English.
We address the task of identifying loanwords automatically and in an unsupervised manner, from large datasets of words from agglutinative Dravidian languages.
(arXiv, 2020-02-12T04:01:57Z)