YOSM: A new Yoruba sentiment corpus for movie reviews
- URL: http://arxiv.org/abs/2204.09711v1
- Date: Wed, 20 Apr 2022 18:00:37 GMT
- Title: YOSM: A new Yoruba sentiment corpus for movie reviews
- Authors: Iyanuoluwa Shode, David Ifeoluwa Adelani, and Anna Feldman
- Abstract summary: We explore sentiment analysis on reviews of Nigerian movies.
The data comprised 1500 movie reviews that were sourced from IMDB, Rotten Tomatoes, Letterboxd, Cinemapointer and Nollyrated.
We develop sentiment classification models using the state-of-the-art pre-trained language models like mBERT and AfriBERTa.
- Score: 2.3513645401551337
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A movie that one person thoroughly enjoys and recommends might be
hated by another. One characteristic of humans is the ability to have feelings,
which can be positive or negative. Sentiment analysis and opinion mining, an area
of natural language processing, were designed to automatically classify and study
human feelings about issues such as products, social media platforms, government,
societal discussions, or even movies. Several works on sentiment analysis have
focused on high-resource languages, while low-resource languages like Yoruba
have been sidelined. Because of the scarcity of datasets and of linguistic
architectures suited to low-resource languages, African languages have been
largely ignored and not fully explored. For this reason, we focus on Yoruba and
explore sentiment analysis on reviews of Nigerian movies. The data comprised
1,500 movie reviews sourced from IMDB, Rotten Tomatoes, Letterboxd, Cinemapointer
and Nollyrated. We develop sentiment classification models using state-of-the-art
pre-trained language models such as mBERT and AfriBERTa to classify the movie
reviews.
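The abstract does not describe training details. As a point of reference only, a classical bag-of-words baseline for positive/negative review polarity can be sketched with the Python standard library. This is a swapped-in technique (multinomial Naive Bayes with add-one smoothing), not the authors' mBERT or AfriBERTa fine-tuning; the class name, the whitespace tokenizer, and the English toy reviews are hypothetical placeholders for the Yoruba data.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase/whitespace tokenizer (assumption); no language-specific
    # preprocessing is attempted here.
    return text.lower().split()


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Hypothetical baseline for illustration; not the paper's mBERT/AfriBERTa models.
    """

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesSentiment":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, used for priors
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        self.n_docs = len(labels)
        return self

    def log_scores(self, text: str) -> dict[str, float]:
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            # Log prior: fraction of training documents in class c.
            score = math.log(self.doc_counts[c] / self.n_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                # Smoothed log likelihood of the token given class c.
                score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            scores[c] = score
        return scores

    def predict(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy English stand-ins for labeled reviews (hypothetical data).
train_texts = [
    "a wonderful moving film",
    "great acting and a great story",
    "boring and far too long",
    "terrible plot and weak acting",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("a great story"))    # positive
print(model.predict("weak and boring"))  # negative
```

A fine-tuned pre-trained model, as used in the paper, replaces both the tokenizer and the count-based features with learned representations; a baseline like this is mainly useful as a sanity check on a new corpus.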
Related papers
- You Shall Know a Tool by the Traces it Leaves: The Predictability of Sentiment Analysis Tools [74.98850427240464]
We show that sentiment analysis tools disagree on the same dataset.
We show that the sentiment tool used for sentiment annotation can even be predicted from its outcome.
(2024-10-18T17:27:38Z)
- See It from My Perspective: Diagnosing the Western Cultural Bias of Large Vision-Language Models in Image Understanding [78.88461026069862]
Vision-language models (VLMs) can respond to queries about images in many languages.
We present a novel investigation that demonstrates and localizes Western bias in image understanding.
(2024-06-17T15:49:51Z)
- LFED: A Literary Fiction Evaluation Dataset for Large Language Models [58.85989777743013]
We collect 95 literary fictions that are either originally written in Chinese or translated into Chinese, covering a wide range of topics across several centuries.
We define a question taxonomy with 8 question categories to guide the creation of 1,304 questions.
We conduct an in-depth analysis to ascertain how specific attributes of literary fictions (e.g., novel types, character numbers, the year of publication) impact LLM performance in evaluations.
(2024-05-16T15:02:24Z)
- SOUL: Towards Sentiment and Opinion Understanding of Language [96.74878032417054]
We propose a new task called Sentiment and Opinion Understanding of Language (SOUL).
SOUL aims to evaluate sentiment understanding through two subtasks: Review Comprehension (RC) and Justification Generation (JG).
(2023-10-27T06:48:48Z)
- NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages.
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
(2023-09-19T14:42:33Z)
- AlbMoRe: A Corpus of Movie Reviews for Sentiment Analysis in Albanian [0.0]
AlbMoRe is a corpus of 800 movie reviews in Albanian.
Each text is labeled as positive or negative and can be used for sentiment analysis research.
(2023-06-14T14:21:55Z)
- NollySenti: Leveraging Transfer Learning and Machine Translation for Nigerian Movie Sentiment Classification [10.18858070640917]
Africa has over 2000 indigenous languages but they are under-represented in NLP research due to lack of datasets.
We create a new dataset, NollySenti, based on Nollywood movie reviews for five languages widely spoken in Nigeria (English, Hausa, Igbo, Nigerian-Pidgin, and Yoruba).
(2023-05-18T13:38:36Z)
- No Language Left Behind: Scaling Human-Centered Machine Translation [69.28110770760506]
We create datasets and models aimed at narrowing the performance gap between low and high-resource languages.
We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks.
Our model achieves an improvement of 44% BLEU relative to the previous state of the art.
(2022-07-11T07:33:36Z)
- Sentiment Classification in Swahili Language Using Multilingual BERT [0.04297070083645048]
This study uses the current state-of-the-art model, multilingual BERT, to perform sentiment classification on Swahili datasets.
The data was created by extracting and annotating 8.2k reviews and comments on different social media platforms and the ISEAR emotion dataset.
The model was fine-tuned and achieved a best accuracy of 87.59%.
(2021-04-19T01:47:00Z)
- Multilingual, Temporal and Sentimental Distant-Reading of City Events [0.0]
This analysis aims to apply distant reading to Berlinale tweets collected during the festival.
We trained a deep sentiment network with multilingual embeddings.
The trained algorithm achieves a test score of 0.78 and was applied to tweets with the Berlinale hashtag during the festival.
(2021-01-04T10:57:11Z)
- Corpus Creation for Sentiment Analysis in Code-Mixed Tamil-English Text [0.9235531183915556]
We create a code-switched, sentiment-annotated corpus containing 15,744 comment posts from YouTube.
In this paper, we describe the process of creating the corpus and assigning polarities.
We present inter-annotator agreement and show the results of sentiment analysis trained on this corpus as a benchmark.
(2020-05-30T07:17:27Z)
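Two of the entries above turn on agreement: the Tamil-English corpus reports inter-annotator agreement, and the first entry shows that different sentiment tools disagree on the same data. One standard chance-corrected statistic for two raters (annotators or tools) is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The sketch below is illustrative only: the cited papers may report other measures (for example Krippendorff's alpha), and the function name and label lists are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is
        # undefined here, so this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels from two annotators (or two sentiment tools).
annotator_1 = ["positive", "positive", "negative", "negative", "neutral", "positive"]
annotator_2 = ["positive", "negative", "negative", "negative", "neutral", "positive"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.739
```

Here the raters agree on 5 of 6 items (p_o = 5/6), chance agreement is p_e = 13/36, and κ = 17/23 ≈ 0.739, noticeably lower than the raw 83% agreement.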
This list is automatically generated from the titles and abstracts of the papers on this site.