SemEval-2023 Task 12: Sentiment Analysis for African Languages
(AfriSenti-SemEval)
- URL: http://arxiv.org/abs/2304.06845v2
- Date: Mon, 1 May 2023 10:18:04 GMT
- Title: SemEval-2023 Task 12: Sentiment Analysis for African Languages
(AfriSenti-SemEval)
- Authors: Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Seid Muhie Yimam, David
Ifeoluwa Adelani, Ibrahim Sa'id Ahmad, Nedjma Ousidhoum, Abinew Ayele, Saif
M. Mohammad, Meriem Beloucif, Sebastian Ruder
- Abstract summary: AfriSenti-SemEval is a sentiment classification challenge in 14 African languages.
We present three subtasks: (1) Task A: monolingual classification, which received 44 submissions; (2) Task B: multilingual classification, which received 32 submissions; and (3) Task C: zero-shot classification, which received 34 submissions.
- Score: 42.140064297754634
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the first Africentric SemEval Shared task, Sentiment Analysis for
African Languages (AfriSenti-SemEval). The dataset is available at
https://github.com/afrisenti-semeval/afrisent-semeval-2023. AfriSenti-SemEval
is a sentiment classification challenge in 14 African languages: Amharic,
Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican
Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and
Yorùbá (Muhammad et al., 2023), using data labeled with 3 sentiment
classes. We present three subtasks: (1) Task A: monolingual classification,
which received 44 submissions; (2) Task B: multilingual classification, which
received 32 submissions; and (3) Task C: zero-shot classification, which
received 34 submissions. The best performance for tasks A and B was achieved by
the NLNDE team with 71.31 and 75.06 weighted F1, respectively. UCAS-IIE-NLP
achieved the best average score for task C with 58.15 weighted F1. We describe
the various approaches adopted by the top 10 systems.
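The results above are reported as weighted F1 over the 3 sentiment classes. As a rough illustration of what that metric measures, the sketch below computes the standard support-weighted average of per-class F1 scores. It is not the task's official scorer, and the function name `weighted_f1` and the toy labels are hypothetical, chosen only for this example.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 (standard definition)."""
    support = Counter(y_true)  # number of gold examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Each class contributes in proportion to its share of gold labels.
        score += (n / total) * f1
    return score


# Hypothetical toy data, not taken from the AfriSenti dataset.
gold = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]
print(round(100 * weighted_f1(gold, pred), 2))
```

Because each class is weighted by its number of gold examples, frequent classes influence the score more than rare ones, which matters when the sentiment classes are imbalanced.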
Related papers
- SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian Languages [39.770050337720676]
We present the first shared task on Semantic Textual Relatedness (STR).
We investigate the broader phenomenon of semantic relatedness across 14 languages.
These languages originate from five distinct language families and are predominantly spoken in Africa and Asia.
arXiv Detail & Related papers (2024-03-27T18:30:26Z)
- HausaNLP at SemEval-2023 Task 12: Leveraging African Low Resource TweetData for Sentiment Analysis [0.0]
We present the findings of SemEval-2023 Task 12, a shared task on sentiment analysis for low-resource African languages using a Twitter dataset.
Our goal is to leverage low-resource tweet data using pre-trained Afro-xlmr-large, AfriBERTa-Large, Bert-base-arabic-camelbert-da-sentiment (Arabic-camelbert), Multilingual-BERT (mBERT) and BERT models for sentiment analysis of 14 African languages.
arXiv Detail & Related papers (2023-04-26T15:47:50Z)
- Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages [0.0]
The task aims to perform monolingual sentiment classification (sub-task A) for 12 African languages, multilingual sentiment classification (sub-task B), and zero-shot sentiment classification (sub-task C).
Our findings suggest that using pre-trained Afro-centric language models improves performance for low-resource African languages.
We also ran experiments using adapters for zero-shot tasks, and the results suggest that we can obtain promising results by using adapters with a limited amount of resources.
arXiv Detail & Related papers (2023-04-13T12:54:29Z)
- AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages [45.88640066767242]
Africa is home to over 2,000 languages from more than six language families and has the highest linguistic diversity among all continents.
Yet, there is little NLP research conducted on African languages. Crucial to enabling such research is the availability of high-quality annotated datasets.
In this paper, we introduce AfriSenti, a sentiment analysis benchmark that contains a total of >110,000 tweets in 14 African languages.
arXiv Detail & Related papers (2023-02-17T15:40:12Z)
- Overview of the HASOC Subtrack at FIRE 2022: Offensive Language Identification in Marathi [15.466844451996051]
The HASOC (Hate Speech and Offensive Content Identification) shared task is one of these initiatives.
In its fourth iteration, HASOC 2022 included three subtracks for English, Hindi, and Marathi.
We report the results of the HASOC 2022 Marathi subtrack which provided participants with a dataset containing data from Twitter manually annotated using the popular OLID taxonomy.
The best performing algorithms were a mixture of traditional and deep learning approaches.
arXiv Detail & Related papers (2022-11-18T11:17:15Z)
- MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition [55.95128479289923]
African languages are spoken by over a billion people, but are underrepresented in NLP research and development.
We create the largest human-annotated NER dataset for 20 African languages.
We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points.
arXiv Detail & Related papers (2022-10-22T08:53:14Z)
- MIA 2022 Shared Task: Evaluating Cross-lingual Open-Retrieval Question Answering for 16 Diverse Languages [54.002969723086075]
We evaluate cross-lingual open-retrieval question answering systems in 16 typologically diverse languages.
The best system leveraging iteratively mined diverse negative examples achieves 32.2 F1, outperforming our baseline by 4.5 points.
The second best system uses entity-aware contextualized representations for document retrieval, and achieves significant improvements in Tamil (20.8 F1), whereas most of the other systems yield nearly zero scores.
arXiv Detail & Related papers (2022-07-02T06:54:10Z)
- Comprehensive Benchmark Datasets for Amharic Scene Text Detection and Recognition [56.048783994698425]
Ethiopic/Amharic script is one of the oldest African writing systems, which serves at least 23 languages in East Africa.
The Amharic writing system, Abugida, has 282 syllables, 15 punctuation marks, and 20 numerals.
We presented the first comprehensive public datasets named HUST-ART, HUST-AST, ABE, and Tana for Amharic script detection and recognition in the natural scene.
arXiv Detail & Related papers (2022-03-23T03:19:35Z)