Development of a General Purpose Sentiment Lexicon for Igbo Language
- URL: http://arxiv.org/abs/2004.14176v1
- Date: Fri, 24 Apr 2020 22:10:34 GMT
- Title: Development of a General Purpose Sentiment Lexicon for Igbo Language
- Authors: Emeka Ogbuju and Moses Onyesolu
- Abstract summary: This work creates a general purpose sentiment lexicon for the Igbo language.
It can determine the sentiment of documents written in the Igbo language without having to translate them into English.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There are publicly available general purpose sentiment lexicons in some high
resource languages but very few exist in the low resource languages. This makes
it difficult to directly perform sentiment analysis tasks in such languages.
The objective of this work is to create a general purpose sentiment lexicon for
the Igbo language that can determine the sentiment of documents written in the
Igbo language without having to translate them into the English language. The
material used was an automatically translated version of Liu's lexicon, together
with manually added native Igbo words. The result of this work is a general purpose
lexicon called IgboSentilex. The performance was tested on the BBC Igbo news
channel. It returned an average polarity agreement of 95.75 percent with other
general purpose sentiment lexicons.
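As a minimal sketch of the two ideas in the abstract, the snippet below labels a document by looking its words up in a sentiment lexicon and then measures how often two lexicons assign the same label. Everything in it is an assumption made for illustration: the lexicon entries and documents are toy placeholders rather than IgboSentilex or BBC Igbo data, the sum-of-scores rule is a common lexicon-based baseline rather than the authors' confirmed method, and "polarity agreement" is taken here to mean the percentage of documents on which two lexicons give the same label.

```python
import unicodedata


def norm(text):
    """Lowercase and NFC-normalize so diacritics compare consistently."""
    return unicodedata.normalize("NFC", text.lower())


# Hypothetical toy lexicons: illustrative placeholder entries only,
# not taken from IgboSentilex or any published lexicon.
LEXICON_A = {norm(w): s for w, s in {"mma": 1, "ọṅụ": 1, "iwe": -1, "njọ": -1}.items()}
LEXICON_B = {norm(w): s for w, s in {"mma": 1, "ọṅụ": 1, "iwe": -1}.items()}


def polarity(text, lexicon):
    """Label text by summing the scores of tokens found in the lexicon."""
    tokens = [tok.strip(".,;:!?\"'()") for tok in norm(text).split()]
    total = sum(lexicon.get(tok, 0) for tok in tokens)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def polarity_agreement(docs, lexicon_a, lexicon_b):
    """Percentage of documents on which the two lexicons give the same label."""
    if not docs:
        raise ValueError("docs must not be empty")
    matches = sum(polarity(d, lexicon_a) == polarity(d, lexicon_b) for d in docs)
    return 100.0 * matches / len(docs)


# Toy documents, scored directly in Igbo with no translation step.
docs = ["Nke a dị mma.", "Iwe na njọ", "ọṅụ", "Ọ dị njọ"]
labels = [polarity(d, LEXICON_A) for d in docs]
agreement = polarity_agreement(docs, LEXICON_A, LEXICON_B)
print(labels, agreement)
```

On these toy inputs the two lexicons disagree only on the last document, where the smaller lexicon has no matching entry and falls back to "neutral", so the agreement is 75 percent. A real evaluation would also need tokenization suited to Igbo orthography and some handling of negation, neither of which is attempted here.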
Related papers
- CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data [56.043078390377076]
We introduce CommonLID, a community-driven, human-annotated LID benchmark for the web domain.
We show CommonLID's value by using it, alongside five other common evaluation sets, to test eight popular LID models.
We highlight that existing evaluations overestimate LID accuracy for many languages in the web domain.
arXiv Detail & Related papers (2026-01-25T22:49:30Z) - Sentiment Analysis and Emotion Classification using Machine Learning Techniques for Nagamese Language - A Low-resource Language [0.0]
The aim of this work is to detect sentiments in terms of polarity (positive, negative and neutral) and basic emotions in the Nagamese language.
We build a sentiment polarity lexicon of 1,195 Nagamese words and use it to build features for supervised machine learning techniques.
arXiv Detail & Related papers (2025-12-01T04:01:29Z) - Cross-lingual Opinions and Emotions Mining in Comparable Documents [0.0]
This research studies differences in sentiments and emotions across English-Arabic comparable documents.
We manually translate the English WordNet-Affect (WNA) lexicon into Arabic, creating bilingual emotion lexicons used to label the comparable corpora.
Results show that sentiment and emotion annotations align when articles come from the same news agency and diverge when they come from different ones.
arXiv Detail & Related papers (2025-08-05T05:44:28Z) - BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages [93.92804151830744]
We present BRIGHTER -- a collection of multi-labeled datasets in 28 different languages.
We describe the data collection and annotation processes and the challenges of building these datasets.
We show that BRIGHTER datasets are a step towards bridging the gap in text-based emotion recognition.
arXiv Detail & Related papers (2025-02-17T15:39:50Z) - Human-LLM Collaborative Construction of a Cantonese Emotion Lexicon [1.3074442742310615]
This study proposes to develop an emotion lexicon for Cantonese, a low-resource language.
By integrating emotion labels provided by Large Language Models (LLMs) and human annotators, the study leveraged existing linguistic resources.
The consistency of the proposed emotion lexicon in emotion extraction was assessed through modification and utilization of three distinct emotion text datasets.
arXiv Detail & Related papers (2024-10-15T11:57:34Z) - Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages [55.157295899188476]
Neural machine translation systems learn to map sentences of different languages into a common representation space.
In this work, we test this hypothesis by zero-shot translating from unseen languages.
We demonstrate that this setup enables zero-shot translation from entirely unseen languages.
arXiv Detail & Related papers (2024-08-05T07:58:58Z) - Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment [50.27950279695363]
The transfer performance is often hindered when a low-resource target language is written in a different script than the high-resource source language.
Inspired by recent work that uses transliteration to address this problem, our paper proposes a transliteration-based post-pretraining alignment (PPA) method.
arXiv Detail & Related papers (2024-06-28T08:59:24Z) - The IgboAPI Dataset: Empowering Igbo Language Technologies through Multi-dialectal Enrichment [3.087699704782493]
The Igbo language is facing a risk of becoming endangered, as indicated by a 2025 UNESCO study.
To create robust, impactful, and widely adopted language technologies for Igbo, it is essential to incorporate the multi-dialectal nature of the language.
We present the IgboAPI dataset, a multi-dialectal Igbo-English dictionary dataset, developed with the aim of enhancing the representation of Igbo dialects.
arXiv Detail & Related papers (2024-05-02T04:27:35Z) - Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon [78.12363425794214]
We focus on zero-shot sentiment analysis tasks across 34 languages, including 6 high/medium-resource languages, 25 low-resource languages, and 3 code-switching datasets.
We demonstrate that pretraining using multilingual lexicons, without using any sentence-level sentiment data, achieves superior zero-shot performance compared to models fine-tuned on English sentiment datasets.
arXiv Detail & Related papers (2024-02-03T10:41:05Z) - NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages.
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z) - No Language Left Behind: Scaling Human-Centered Machine Translation [69.28110770760506]
We create datasets and models aimed at narrowing the performance gap between low and high-resource languages.
We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks.
Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art.
arXiv Detail & Related papers (2022-07-11T07:33:36Z) - NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis [5.048355865260207]
We introduce the first large-scale human-annotated Twitter sentiment dataset for the four most widely spoken languages in Nigeria.
The dataset consists of around 30,000 annotated tweets per language.
We release the datasets, trained models, sentiment lexicons, and code to incentivize research on sentiment analysis in under-represented languages.
arXiv Detail & Related papers (2022-01-20T16:28:06Z) - When Word Embeddings Become Endangered [0.685316573653194]
We present a method for constructing word embeddings for endangered languages using existing word embeddings of different resource-rich languages and translation dictionaries of resource-poor languages.
All our cross-lingual word embeddings and the sentiment analysis model have been released openly via an easy-to-use Python library.
arXiv Detail & Related papers (2021-03-24T15:42:53Z) - Learning and Evaluating Emotion Lexicons for 91 Languages [10.06987680744477]
We introduce a methodology for creating almost arbitrarily large emotion lexicons for any target language.
We generate representationally rich high-coverage lexicons comprising eight emotional variables with more than 100k lexical entries each.
Our approach produces results in line with state-of-the-art monolingual approaches to lexicon creation and even surpasses human reliability for some languages and variables.
arXiv Detail & Related papers (2020-05-12T10:32:03Z) - Design Challenges in Low-resource Cross-lingual Entity Linking [56.18957576362098]
Cross-lingual Entity Linking (XEL) is the problem of grounding mentions of entities in a foreign language text into an English knowledge base such as Wikipedia.
This paper focuses on the key step of identifying candidate English Wikipedia titles that correspond to a given foreign language mention.
We present a simple yet effective zero-shot XEL system, QuEL, that utilizes search engine query logs.
arXiv Detail & Related papers (2020-05-02T04:00:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.