UBC-DLNLP at SemEval-2023 Task 12: Impact of Transfer Learning on
African Sentiment Analysis
- URL: http://arxiv.org/abs/2304.11256v2
- Date: Tue, 25 Apr 2023 07:44:37 GMT
- Title: UBC-DLNLP at SemEval-2023 Task 12: Impact of Transfer Learning on
African Sentiment Analysis
- Authors: Gagan Bhatia, Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed
- Abstract summary: We tackle the task of sentiment analysis in 14 different African languages.
We develop both monolingual and multilingual models under a fully supervised setting.
Our results demonstrate the effectiveness of transfer learning and fine-tuning techniques for sentiment analysis.
- Score: 5.945320097465418
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We describe our contribution to the SemEval 2023 AfriSenti-SemEval shared
task, where we tackle the task of sentiment analysis in 14 different African
languages. We develop both monolingual and multilingual models under a fully
supervised setting (subtasks A and B). We also develop models for the zero-shot
setting (subtask C). Our approach involves experimenting with transfer learning
using six language models, including further pretraining of some of these models
as well as a final finetuning stage. Our best performing models achieve an
F1-score of 70.36 on development data and an F1-score of 66.13 on test data.
Unsurprisingly, our results demonstrate the effectiveness of transfer learning
and fine-tuning techniques for sentiment analysis across multiple languages.
Our approach can be applied to other sentiment analysis tasks in different
languages and domains.
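The F1-scores quoted in the abstract can be illustrated with a small sketch. The abstract does not say which averaging scheme is used, so the code below computes per-class, macro-averaged, and support-weighted F1 side by side. The function name `f1_scores` and the three-way label set are assumptions made for illustration, not taken from the paper.

```python
from collections import Counter


def f1_scores(gold, pred):
    """Return (per_class_f1, macro_f1, weighted_f1) for two label lists.

    Hypothetical helper for illustration; not the shared task's scorer.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("at least one example is required")

    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative

    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0.
    per_class = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class[label] = 2 * tp[label] / denom if denom else 0.0

    support = Counter(gold)
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[l] * support[l] for l in labels) / len(gold)
    return per_class, macro, weighted


if __name__ == "__main__":
    # Toy labels, assumed for illustration only.
    gold = ["positive", "negative", "neutral", "positive"]
    pred = ["positive", "neutral", "neutral", "negative"]
    per_class, macro, weighted = f1_scores(gold, pred)
    print(per_class, round(macro, 4), round(weighted, 4))
```

The two averages diverge whenever classes are imbalanced, so a reported score is only comparable across systems when the averaging scheme is the same.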
Related papers
- HYBRINFOX at CheckThat! 2024 -- Task 1: Enhancing Language Models with Structured Information for Check-Worthiness Estimation [0.8083061106940517]
This paper summarizes the experiments and results of the HYBRINFOX team for the CheckThat! 2024 - Task 1 competition.
We propose an approach enriching Language Models such as RoBERTa with embeddings produced by triples.
arXiv Detail & Related papers (2024-07-04T11:33:54Z)
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze representation space, generated response and data scales, and reveal how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
- LLaMA Beyond English: An Empirical Study on Language Capability Transfer [49.298360366468934]
We focus on how to effectively transfer the capabilities of language generation and following instructions to a non-English language.
We analyze the impact of key factors such as vocabulary extension, further pretraining, and instruction tuning on transfer.
We employ four widely used standardized testing benchmarks: C-Eval, MMLU, AGI-Eval, and GAOKAO-Bench.
arXiv Detail & Related papers (2024-01-02T06:29:02Z)
- Sentiment Analysis Across Multiple African Languages: A Current Benchmark [5.701291200264771]
An annotated sentiment analysis of 14 African languages was made available.
We benchmarked and compared current state-of-art transformer models across 12 languages.
Our results show that despite work in low resource modeling, more data still produces better models on a per-language basis.
arXiv Detail & Related papers (2023-10-21T21:38:06Z)
- BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer [81.5984433881309]
We introduce BUFFET, which unifies 15 diverse tasks across 54 languages in a sequence-to-sequence format.
BUFFET is designed to establish a rigorous and equitable evaluation framework for few-shot cross-lingual transfer.
Our findings reveal significant room for improvement in few-shot in-context cross-lingual transfer.
arXiv Detail & Related papers (2023-05-24T08:06:33Z)
- DN at SemEval-2023 Task 12: Low-Resource Language Text Classification via Multilingual Pretrained Language Model Fine-tuning [0.0]
Most existing models and datasets for sentiment analysis are developed for high-resource languages, such as English and Chinese.
The AfriSenti-SemEval 2023 Shared Task 12 aims to fill this gap by evaluating sentiment analysis models on low-resource African languages.
We present our solution to the shared task, where we employed different multilingual XLM-R models with classification head trained on various data.
arXiv Detail & Related papers (2023-05-04T07:28:45Z)
- NLNDE at SemEval-2023 Task 12: Adaptive Pretraining and Source Language Selection for Low-Resource Multilingual Sentiment Analysis [11.05909046179595]
This paper describes our system developed for the SemEval-2023 Task 12 "Sentiment Analysis for Low-resource African languages using Twitter dataset".
Our key findings are: Adapting the pretrained model to the target language and task using a small yet relevant corpus improves performance remarkably by more than 10 F1 score points.
In the shared task, our system wins 8 out of 15 tracks and, in particular, performs best in the multilingual evaluation.
arXiv Detail & Related papers (2023-04-28T21:02:58Z)
- Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages [0.0]
The task aims to perform monolingual sentiment classification (sub-task A) for 12 African languages, multilingual sentiment classification (sub-task B), and zero-shot sentiment classification (sub-task C).
Our findings suggest that using pre-trained Afro-centric language models improves performance for low-resource African languages.
We also ran experiments using adapters for zero-shot tasks, and the results suggest that we can obtain promising results by using adapters with a limited amount of resources.
arXiv Detail & Related papers (2023-04-13T12:54:29Z)
- MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition [55.95128479289923]
African languages are spoken by over a billion people, but are underrepresented in NLP research and development.
We create the largest human-annotated NER dataset for 20 African languages.
We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points.
arXiv Detail & Related papers (2022-10-22T08:53:14Z)
- Few-shot Learning with Multilingual Language Models [66.49496434282564]
We train multilingual autoregressive language models on a balanced corpus covering a diverse set of languages.
Our largest model sets new state of the art in few-shot learning in more than 20 representative languages.
We present a detailed analysis of where the model succeeds and fails, showing in particular that it enables cross-lingual in-context learning.
arXiv Detail & Related papers (2021-12-20T16:52:35Z)
- AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages [75.08199398141744]
We present AmericasNLI, an extension of XNLI (Conneau et al.), to 10 indigenous languages of the Americas.
We conduct experiments with XLM-R, testing multiple zero-shot and translation-based approaches.
We find that XLM-R's zero-shot performance is poor for all 10 languages, with an average performance of 38.62%.
arXiv Detail & Related papers (2021-04-18T05:32:28Z)