Related papers: A Finite State Transducer Based Morphological Analyzer of Maithili Language

URL: http://arxiv.org/abs/2003.00234v1
Date: Sat, 29 Feb 2020 11:00:15 GMT
Title: A Finite State Transducer Based Morphological Analyzer of Maithili Language
Authors: Raza Rahi, Sumant Pushp, Arif Khan, Smriti Kumar Sinha
Abstract summary: We present a finite state transducer based inflectional morphological analyzer for Maithili, a resource-poor language of India. Maithili is an eastern Indo-Aryan language spoken in the eastern and northern regions of Bihar in India and in the southeastern plains of Nepal, known as the tarai.
Score: 2.752817022620644
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Morphological analyzers are essential components of many linguistic applications, such as machine translation, word sense disambiguation, spell checkers, and search engines. The development of an effective morphological analyzer therefore has a significant impact on the computational recognition of a language. In this paper, we present a finite state transducer based inflectional morphological analyzer for Maithili, a resource-poor language of India. Maithili is an eastern Indo-Aryan language spoken in the eastern and northern regions of Bihar in India and in the southeastern plains of Nepal, known as the tarai. This work can be regarded as the first step towards the computational development of Maithili, and it may encourage researchers around the country to help establish the language in the computational world.

Related papers

RegSpeech12: A Regional Corpus of Bengali Spontaneous Speech Across Dialects [5.805745873296805]
The Bengali language is spoken extensively across South Asia and among diasporic communities. Five principal dialect groups are identified: Eastern Bengali, Manbhumi, Rangpuri, Varendri, and Rarhi. Research on the computational processing of Bengali dialects remains limited.
arXiv Detail & Related papers (2025-10-28T06:08:42Z)
SentiMaithili: A Benchmark Dataset for Sentiment and Reason Generation for the Low-Resource Maithili Language [0.9743193980153243]
Maithili is an Indo-Aryan language spoken by more than 13 million people in the Purvanchal region of India. This work establishes the first benchmark for explainable affective computing in Maithili.
arXiv Detail & Related papers (2025-10-25T04:58:18Z)
Survey of Pseudonymization, Abstractive Summarization & Spell Checker for Hindi and Marathi [0.0]
The paper aims to build a platform that offers text anonymization, abstractive text summarization, and spell checking in English, Hindi, and Marathi. These tools are intended to serve enterprise and consumer clients who predominantly use Indian regional languages.
arXiv Detail & Related papers (2024-12-24T04:51:32Z)
Morphology and Syntax of the Tamil Language [0.0]
The paper highlights the complexity and richness of Tamil in terms of its morphological and syntactic features. This is demonstrated by the rule-based morphological analyser-cum-generator and the computational grammar for Tamil that have already been developed based on this paper.
arXiv Detail & Related papers (2024-01-16T13:52:25Z)
Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets. This survey delves into an important attribute of these datasets: the dialect of a language. Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
Linguistic Analysis using Paninian System of Sounds and Finite State Machines [0.0]
The study of spoken languages comprises phonology, morphology, and grammar. The languages can be classified as root languages, inflectional languages, and stem languages. All these factors lead to the formation of vocabulary which has commonality/similarity as well as distinct and subtle differences across languages.
arXiv Detail & Related papers (2023-01-29T15:22:10Z)
Urdu Morphology, Orthography and Lexicon Extraction [0.0]
This paper describes an implementation of the Urdu language as a software API. We deal with orthography, morphology and the extraction of the lexicon.
arXiv Detail & Related papers (2022-04-06T20:14:01Z)
Utilizing Wordnets for Cognate Detection among Indian Languages [50.83320088758705]
We detect cognate word pairs among ten Indian languages with Hindi. We use deep learning methodologies to predict whether a word pair is cognate or not. We report improved performance of up to 26%.
arXiv Detail & Related papers (2021-12-30T16:46:28Z)
Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages [50.82410844837726]
We demonstrate the use of cross-lingual word embeddings for detecting cognates among fourteen Indian languages. We evaluate our methods to detect cognates on a challenging dataset of twelve Indian languages. We observe an improvement of up to 18 percentage points in F-score for cognate detection.
arXiv Detail & Related papers (2021-12-16T11:17:58Z)
Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages. We infer this distribution from a sample of typologically diverse training languages. We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties [77.2347265289855]
We focus on phoneme recognition using Allosaurus, a method for multilingual recognition based on phonetic annotation. To evaluate in a challenging real-world scenario, we curate phone recognition datasets for Bukusu and Saamia, two varieties of the Luhya language cluster of western Kenya and eastern Uganda. We find that fine-tuning of Allosaurus, even with just 100 utterances, leads to significant improvements in phone error rates.
arXiv Detail & Related papers (2021-04-04T15:07:55Z)
Towards Finite-State Morphology of Kurdish [0.76146285961466]
The morphology of the Kurdish language (Sorani dialect) is described from a computational point of view. We extract morphological rules which are transformed into finite-state transducers for generating and analyzing words.
arXiv Detail & Related papers (2020-05-21T13:55:07Z)
Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source. We observe that our representations embed typology and strengthen correlations with language relationships. We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
Unsupervised Separation of Native and Loanwords for Malayalam and Telugu [3.4925763160992402]
Words from one language are adopted within a different language without translation; these words appear in transliterated form in text written in the latter language. This phenomenon is particularly widespread within Indian languages where many words are loaned from English. We address the task of identifying loanwords automatically and in an unsupervised manner, from large datasets of words from agglutinative Dravidian languages.
arXiv Detail & Related papers (2020-02-12T04:01:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.