Related papers: Sentiment Analysis and Emotion Classification using Machine Learning Techniques for Nagamese Language

Sentiment Analysis and Emotion Classification using Machine Learning Techniques for Nagamese Language - A Low-resource Language

URL: http://arxiv.org/abs/2512.01256v1
Date: Mon, 01 Dec 2025 04:01:29 GMT
Authors: Ekha Morang, Surhoni A. Ngullie, Sashienla Longkumer, Teisovi Angami,
Abstract summary: The aim of this work is to detect sentiments in terms of polarity (positive, negative and neutral) and basic emotions contained in Nagamese text. We build a sentiment polarity lexicon of 1,195 Nagamese words and use it to build features for supervised machine learning techniques.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The Nagamese language, a.k.a. Naga Pidgin, is an Assamese-lexified creole language developed primarily as a means of communication in trade between people from Nagaland and people from Assam in north-east India. A substantial amount of work in sentiment analysis has been done for resource-rich languages such as English and Hindi. However, no such work has been done for the Nagamese language. To the best of our knowledge, this is the first attempt at sentiment analysis and emotion classification for the Nagamese language. The aim of this work is to detect sentiments in terms of polarity (positive, negative and neutral) and basic emotions contained in Nagamese text. We build a sentiment polarity lexicon of 1,195 Nagamese words and use it to derive features that, together with additional features, are fed to supervised machine learning techniques, namely Naïve Bayes and Support Vector Machines.
Keywords: Nagamese, NLP, sentiment analysis, machine learning
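The abstract pairs a polarity lexicon with supervised classifiers. A minimal sketch of that idea follows, under stated assumptions: the lexicon is a hypothetical toy one with placeholder English tokens (not entries from the 1,195-word Nagamese lexicon), and lexicon hits are assumed to be encoded as extra pseudo-tokens (`__LEX_pos__`, `__LEX_neg__`) added to bag-of-words features. The paper's actual feature set is not described here. Only a from-scratch multinomial Naïve Bayes with add-one (Laplace) smoothing is shown; the SVM variant is omitted.

```python
import math
from collections import Counter

# Hypothetical toy polarity lexicon with placeholder tokens (NOT the paper's data).
LEXICON = {"good": "pos", "happy": "pos", "bad": "neg", "sad": "neg"}


def features(text):
    """Bag-of-words tokens plus one pseudo-token per lexicon hit (assumed encoding)."""
    tokens = text.lower().split()
    lex_hits = ["__LEX_" + LEXICON[t] + "__" for t in tokens if t in LEXICON]
    return tokens + lex_hits


class MultinomialNB:
    """Multinomial Naive Bayes over the feature tokens, with additive smoothing."""

    def fit(self, docs, labels, alpha=1.0):
        self.alpha = alpha
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(features(doc))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc, c):
        # log P(c) + sum of log P(feature | c); unseen features are skipped.
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / n_docs)
        denom = self.totals[c] + self.alpha * len(self.vocab)
        for f in features(doc):
            if f in self.vocab:
                score += math.log((self.word_counts[c][f] + self.alpha) / denom)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


# Tiny illustrative training set (placeholder sentences, not from the paper).
TRAIN = [
    ("the food is good", "positive"),
    ("i am happy today", "positive"),
    ("the road is bad", "negative"),
    ("he is sad", "negative"),
    ("the shop opens at nine", "neutral"),
    ("we go to the market", "neutral"),
]
model = MultinomialNB().fit([d for d, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("very good and happy"))
```

Encoding lexicon hits as pseudo-tokens lets one classifier weigh lexicon evidence and ordinary word evidence together; a real system would also need Nagamese-appropriate tokenization.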

Related papers

Part-of-speech tagging for Nagamese Language using CRF [0.0]
This paper investigates part-of-speech tagging, an important task in Natural Language Processing (NLP), for the Nagamese language. An annotated corpus of 16,112 tokens is created, and a machine learning technique known as Conditional Random Fields (CRF) is applied. Using CRF, an overall tagging accuracy of 85.70%, precision and recall of 86%, and an F1-score of 85% are achieved.
arXiv Detail & Related papers (2025-09-16T12:59:55Z)
Sentiment Analysis Across Languages: Evaluation Before and After Machine Translation to English [0.0]
This paper examines the performance of transformer models in Sentiment Analysis tasks across multilingual datasets and text that has undergone machine translation. By comparing the effectiveness of these models in different linguistic contexts, we gain insights into their performance variations and potential implications for sentiment analysis across diverse languages.
arXiv Detail & Related papers (2024-05-05T10:52:09Z)
Teacher Perception of Automatically Extracted Grammar Concepts for L2 Language Learning [66.79173000135717]
We apply this work to teaching two Indian languages, Kannada and Marathi, which do not have well-developed resources for second language learning. We extract descriptions from a natural text corpus that answer questions about morphosyntax (learning of word order, agreement, case marking, or word formation) and semantics (learning of vocabulary). We enlist language educators from schools in North America to perform a manual evaluation; they find that the materials have potential for use in lesson preparation and learner evaluation.
arXiv Detail & Related papers (2023-10-27T18:17:29Z)
Multi-lingual and Multi-cultural Figurative Language Understanding [69.47641938200817]
Figurative language permeates human communication, but is relatively understudied in NLP. We create a dataset for seven diverse languages associated with a variety of cultures: Hindi, Indonesian, Javanese, Kannada, Sundanese, Swahili and Yoruba. Our dataset reveals that each language relies on cultural and regional concepts for figurative expressions, with the highest overlap between languages originating from the same region. All languages exhibit a significant deficiency compared to English, with variations in performance reflecting the availability of pre-training and fine-tuning data.
arXiv Detail & Related papers (2023-05-25T15:30:31Z)
Informative Language Representation Learning for Massively Multilingual Neural Machine Translation [47.19129812325682]
In a multilingual neural machine translation model, an artificial language token is usually used to guide translation into the desired target language. Recent studies show that prepending language tokens sometimes fails to steer multilingual neural machine translation models in the right translation direction. We propose two methods, language embedding embodiment and language-aware multi-head attention, to learn informative language representations that channel translation in the right direction.
arXiv Detail & Related papers (2022-09-04T04:27:17Z)
Utilizing Wordnets for Cognate Detection among Indian Languages [50.83320088758705]
We detect cognate word pairs among ten Indian languages with Hindi. We use deep learning methodologies to predict whether a word pair is cognate or not. We report improved performance of up to 26%.
arXiv Detail & Related papers (2021-12-30T16:46:28Z)
Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages [50.82410844837726]
We demonstrate the use of cross-lingual word embeddings for detecting cognates among fourteen Indian languages. We evaluate our methods on a challenging dataset of twelve Indian languages. We observe an improvement of up to 18 percentage points in F-score for cognate detection.
arXiv Detail & Related papers (2021-12-16T11:17:58Z)
Attention based Sequence to Sequence Learning for Machine Translation of Low Resourced Indic Languages -- A case of Sanskrit to Hindi [0.0]
The paper describes the construction of a Sanskrit-to-Hindi bilingual parallel corpus with nearly 10K samples and 178,000 tokens. The attention-based neural translation model achieves 88% accuracy in human evaluation and a BLEU score of 0.92 on Sanskrit-to-Hindi translation.
arXiv Detail & Related papers (2021-09-07T04:55:48Z)
Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages. We infer this distribution from a sample of typologically diverse training languages. We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
Development of a General Purpose Sentiment Lexicon for Igbo Language [0.0]
This work creates a general-purpose sentiment lexicon for the Igbo language. The lexicon can be used to determine the sentiment of documents written in Igbo without having to translate them into English.
arXiv Detail & Related papers (2020-04-24T22:10:34Z)
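A general-purpose lexicon can score a document directly in its own language, with no translation step. A minimal sketch of lexicon-based document scoring follows, assuming a hypothetical lexicon of placeholder tokens with polarity scores in [-1, 1] (not entries from the Igbo lexicon) and an assumed mean-score rule with a tunable neutral band; the Igbo paper's own scoring method may differ.

```python
# Hypothetical placeholder lexicon: token -> polarity score in [-1, 1].
# Entries are illustrative only, not taken from the Igbo lexicon.
LEXICON = {"good": 1.0, "great": 0.8, "bad": -1.0, "terrible": -0.9}


def document_sentiment(text, lexicon=LEXICON, neutral_band=0.1):
    """Label a document by the mean polarity of its lexicon hits.

    Returns (label, score). Documents with no lexicon hits, or whose mean
    score lies within [-neutral_band, neutral_band], are labelled neutral.
    """
    tokens = text.lower().split()
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return "neutral", 0.0
    score = sum(hits) / len(hits)
    if score > neutral_band:
        return "positive", score
    if score < -neutral_band:
        return "negative", score
    return "neutral", score


print(document_sentiment("good food but terrible service"))
```

Averaging over lexicon hits (rather than summing) keeps scores comparable across documents of different lengths; the neutral band is a hypothetical parameter that would need tuning on labelled data.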
A Finite State Transducer Based Morphological Analyzer of Maithili Language [2.752817022620644]
We present a finite state transducer based inflectional morphological analyzer for Maithili, a resource-poor language of India. Maithili is an eastern Indo-Aryan language spoken in the eastern and northern regions of Bihar in India and in the southeastern plains of Nepal known as the tarai.
arXiv Detail & Related papers (2020-02-29T11:00:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.