NollySenti: Leveraging Transfer Learning and Machine Translation for
Nigerian Movie Sentiment Classification
- URL: http://arxiv.org/abs/2305.10971v2
- Date: Tue, 22 Aug 2023 07:25:43 GMT
- Title: NollySenti: Leveraging Transfer Learning and Machine Translation for
Nigerian Movie Sentiment Classification
- Authors: Iyanuoluwa Shode, David Ifeoluwa Adelani, Jing Peng, Anna Feldman
- Abstract summary: Africa has over 2,000 indigenous languages, but they are under-represented in NLP research due to a lack of datasets.
We create a new dataset, NollySenti, based on Nollywood movie reviews for five languages widely spoken in Nigeria (English, Hausa, Igbo, Nigerian-Pidgin, and Yoruba).
- Score: 10.18858070640917
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Africa has over 2,000 indigenous languages, but they are
under-represented in NLP research due to a lack of datasets. In recent years,
there has been progress in developing labeled corpora for African languages.
However, these corpora are often available in a single domain and may not
generalize to other domains. In this paper, we focus on the task of sentiment
classification for cross-domain adaptation. We create a new dataset,
NollySenti, based on Nollywood movie reviews for five languages widely spoken
in Nigeria (English, Hausa, Igbo, Nigerian-Pidgin, and Yoruba). We provide an
extensive empirical evaluation using classical machine learning methods and
pre-trained language models. Leveraging transfer learning, we compare the
performance of cross-domain adaptation from the Twitter domain with
cross-lingual adaptation from the English language. Our evaluation shows that
transfer from English in the same target domain leads to more than 5%
improvement in accuracy compared to transfer from Twitter in the same
language. To further mitigate the domain difference, we leverage machine
translation (MT) from English to the other Nigerian languages, which leads to
a further improvement of 7% over cross-lingual evaluation. While MT into
low-resource languages is often of low quality, we show through human
evaluation that most of the translated sentences preserve the sentiment of
the original English reviews.
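As a rough illustration of the MT-based augmentation described above, the
following sketch translates labeled English reviews into a target Nigerian
language and reuses the English labels as silver training data. It assumes the
Hugging Face transformers library with the NLLB-200 checkpoint; the model IDs,
toy reviews, and label names are illustrative assumptions, not the paper's
exact configuration.

```python
# Minimal sketch of MT-based cross-lingual augmentation for sentiment
# classification (assumption: Hugging Face transformers + NLLB-200; the
# toy reviews and model IDs are illustrative, not the authors' setup).
from transformers import pipeline

# NLLB-200 covers the four Nigerian target languages in NollySenti:
# Hausa (hau_Latn), Igbo (ibo_Latn), Nigerian Pidgin (pcm_Latn),
# and Yoruba (yor_Latn).
translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="eng_Latn",
    tgt_lang="yor_Latn",
)

# Labeled English Nollywood reviews (toy examples). The label is assumed to
# survive translation, which the paper checks via human evaluation.
english_reviews = [
    ("A gripping story and brilliant acting throughout.", "positive"),
    ("Weak plot, poor sound, and unconvincing characters.", "negative"),
]

# Translate each review and keep its original sentiment label.
silver_data = [
    (translator(text, max_length=128)[0]["translation_text"], label)
    for text, label in english_reviews
]

# A multilingual encoder (e.g. an Africa-centric model such as
# Davlan/afro-xlmr-base) can then be fine-tuned on this silver data and
# evaluated on the target-language test split.
```

In the paper's framing, this same-domain MT augmentation is what yields the
further 7% gain over purely cross-lingual transfer from English.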
Related papers
- Defining Boundaries: The Impact of Domain Specification on Cross-Language and Cross-Domain Transfer in Machine Translation [0.44601285466405083]
Cross-lingual transfer learning offers a promising solution for neural machine translation (NMT).
This paper focuses on the impact of domain specification and linguistic factors on transfer effectiveness.
We evaluate multiple target languages, including Portuguese, Italian, French, Czech, Polish, and Greek.
arXiv Detail & Related papers (2024-08-21T18:28:48Z)
- Analysing Cross-Lingual Transfer in Low-Resourced African Named Entity Recognition [0.10641561702689348]
We investigate the properties of cross-lingual transfer learning between ten low-resourced languages.
We find that models that perform well on a single language often do so at the expense of generalising to others.
The amount of data overlap between the source and target datasets is a better predictor of transfer performance than either the geographical or genetic distance between the languages.
arXiv Detail & Related papers (2023-09-11T08:56:47Z)
- BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer [81.5984433881309]
We introduce BUFFET, which unifies 15 diverse tasks across 54 languages in a sequence-to-sequence format.
BUFFET is designed to establish a rigorous and equitable evaluation framework for few-shot cross-lingual transfer.
Our findings reveal significant room for improvement in few-shot in-context cross-lingual transfer.
arXiv Detail & Related papers (2023-05-24T08:06:33Z)
- Low-Resourced Machine Translation for Senegalese Wolof Language [0.34376560669160383]
We present a parallel Wolof/French corpus of 123,000 sentences, on which we conducted experiments with machine translation models based on Recurrent Neural Networks (RNNs).
We observed performance gains with models trained on subword-segmented data, as well as with those trained on the French-English language pair, compared to those trained on the French-Wolof pair under the same experimental conditions.
arXiv Detail & Related papers (2023-05-01T00:04:19Z)
- MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition [55.95128479289923]
African languages are spoken by over a billion people, but are underrepresented in NLP research and development.
We create the largest human-annotated NER dataset for 20 African languages.
We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points.
arXiv Detail & Related papers (2022-10-22T08:53:14Z)
- Ìtàkúròso: Exploiting Cross-Lingual Transferability for Natural Language Generation of Dialogues in Low-Resource, African Languages [0.9511471519043974]
We investigate the possibility of cross-lingual transfer from a state-of-the-art (SoTA) deep monolingual model to 6 African languages.
The languages are Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda, and Yorùbá.
The results support the hypothesis that deep monolingual models learn abstractions that generalise across languages.
arXiv Detail & Related papers (2022-04-17T20:23:04Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks (a minimal translate-test sketch appears after this list).
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- Cross-lingual Transferring of Pre-trained Contextualized Language Models [73.97131976850424]
We propose a novel cross-lingual model transferring framework for PrLMs: TreLM.
To handle differences in symbol order and sequence length between languages, we propose an intermediate "TRILayer" structure.
We show the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency.
arXiv Detail & Related papers (2021-07-27T06:51:13Z)
- From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers [62.637055980148816]
Massively multilingual transformers pretrained with language modeling objectives have become a de facto default transfer paradigm for NLP.
We show that cross-lingual transfer via massively multilingual transformers is substantially less effective in resource-lean scenarios and for distant languages.
arXiv Detail & Related papers (2020-05-01T22:04:58Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact in existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
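Since translate-test recurs in this list (notably in the IGLUE comparison
above), here is a minimal sketch of that inference strategy: translate the
target-language input into English, then classify it with an English model.
Both model IDs below are illustrative assumptions, not choices made by any of
the cited papers.

```python
# Hedged sketch of translate-test inference: translate target-language input
# into English, then apply an off-the-shelf English sentiment classifier.
# Both model IDs are assumptions for illustration only.
from transformers import pipeline

to_english = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="yor_Latn",
    tgt_lang="eng_Latn",
)
classify = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

yoruba_review = "..."  # placeholder for a Yoruba movie review
english_text = to_english(yoruba_review, max_length=128)[0]["translation_text"]
print(classify(english_text))  # e.g. [{'label': 'POSITIVE', 'score': 0.99}]
```

Unlike the augmentation sketch earlier, translate-test pays the MT cost at
inference time rather than at training time, which is one reason the two
strategies are routinely compared in this literature.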