How good are Large Language Models on African Languages?
- URL: http://arxiv.org/abs/2311.07978v2
- Date: Tue, 30 Apr 2024 16:04:16 GMT
- Title: How good are Large Language Models on African Languages?
- Authors: Jessica Ojo, Kelechi Ogueji, Pontus Stenetorp, David Ifeoluwa Adelani
- Abstract summary: We present an analysis of four popular large language models (mT0, Aya, LLaMa 2, and GPT-4) on six tasks across 60 African languages.
Our results suggest that all LLMs produce lower performance for African languages, and there is a large gap in performance compared to high-resource languages.
- Score: 18.660783984850845
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent advancements in natural language processing have led to the proliferation of large language models (LLMs). These models have been shown to yield good performance, using in-context learning, even on tasks and languages they are not trained on. However, their performance on African languages is largely understudied relative to high-resource languages. We present an analysis of four popular large language models (mT0, Aya, LLaMa 2, and GPT-4) on six tasks (topic classification, sentiment classification, machine translation, summarization, question answering, and named entity recognition) across 60 African languages, spanning different language families and geographical regions. Our results suggest that all LLMs produce lower performance for African languages, and there is a large gap in performance compared to high-resource languages (such as English) for most tasks. We find that GPT-4 has average to good performance on classification tasks, yet its performance on generative tasks such as machine translation and summarization is significantly lacking. Surprisingly, we find that mT0 had the best overall performance for cross-lingual QA, better than the state-of-the-art supervised model (i.e. fine-tuned mT5) and GPT-4 on African languages. Similarly, we find the recent Aya model to have comparable results to mT0 in almost all tasks except for topic classification, where it outperforms mT0. Overall, LLaMa 2 showed the worst performance, which we believe is due to its English and code-centric (around 98%) pre-training corpus. Our findings confirm that performance on African languages remains a hurdle for current LLMs, underscoring the need for additional efforts to close this gap.
 
      
Related papers
- The State of Large Language Models for African Languages: Progress and Challenges [4.065633096286487]
 This paper comparatively analyzes African language coverage across six Large Language Models (LLMs), eight Small Language Models (SLMs), and six Specialized SLMs (SSLMs).
The evaluation covers language coverage, training sets, technical limitations, script problems, and language modelling roadmaps.
 arXiv  Detail & Related papers  (2025-06-02T21:39:40Z)
- Lugha-Llama: Adapting Large Language Models for African Languages [48.97516583523523]
 Large language models (LLMs) have achieved impressive results in a wide range of natural language applications.
We consider how to adapt LLMs to low-resource African languages.
We find that combining curated data from African languages with high-quality English educational texts results in a training mix that substantially improves the model's performance on these languages.
 arXiv  Detail & Related papers  (2025-04-09T02:25:53Z)
- Bridging the Gap: Enhancing LLM Performance for Low-Resource African Languages with New Benchmarks, Fine-Tuning, and Cultural Adjustments [0.9214083577876088]
 This paper creates approximately 1 million human-translated words of new benchmark data in 8 low-resource African languages.
Our benchmarks are translations of Winogrande and three sections of MMLU: college medicine, clinical knowledge, and virology.
Using the translated benchmarks, we report previously unknown performance gaps between state-of-the-art (SOTA) LLMs in English and African languages.
 arXiv  Detail & Related papers  (2024-12-16T23:50:21Z)
- One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks [55.35278531907263]
 We present the first study on Large Language Models' fairness and robustness to a dialect in canonical reasoning tasks.
We hire AAVE speakers to rewrite seven popular benchmarks, such as HumanEval and GSM8K.
We find that, compared to Standardized English, almost all of these widely used models show significant brittleness and unfairness to queries in AAVE.
 arXiv  Detail & Related papers  (2024-10-14T18:44:23Z)
- Do Large Language Models Speak All Languages Equally? A Comparative Study in Low-Resource Settings [12.507989493130175]
 Large language models (LLMs) have garnered significant interest in natural language processing (NLP).
Recent studies have highlighted the limitations of LLMs in low-resource languages.
We present datasets for sentiment and hate speech tasks by translating from English to Bangla, Hindi, and Urdu.
 arXiv  Detail & Related papers  (2024-08-05T05:09:23Z)
- IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models [18.260317326787035]
 This paper introduces IrokoBench -- a human-translated benchmark dataset for 16 typologically-diverse low-resource African languages.
We use IrokoBench to evaluate zero-shot, few-shot, and translate-test settings (where test sets are translated into English) across 10 open and four proprietary language models.
We observe a significant performance gap between open and proprietary models, with the highest-performing open model, Aya-101, reaching only 58% of the performance of the best-performing proprietary model, GPT-4o.
 arXiv  Detail & Related papers  (2024-06-05T15:23:08Z)
- Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages [51.301942056881146]
 We investigate how large language models (LLMs) function as rerankers in cross-lingual information retrieval systems for African languages.
Our implementation covers English and four African languages (Hausa, Somali, Swahili, and Yoruba).
We examine cross-lingual reranking with queries in English and passages in the African languages.
 arXiv  Detail & Related papers  (2023-12-26T18:38:54Z)
- Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations [59.056367787688146]
 This paper pioneers exploring and training powerful Multilingual Math Reasoning (xMR) LLMs.
By utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages.
 arXiv  Detail & Related papers  (2023-10-31T08:09:20Z)
- The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants [80.4837840962273]
 We present Belebele, a dataset spanning 122 language variants.
This dataset enables the evaluation of text models in high-, medium-, and low-resource languages.
 arXiv  Detail & Related papers  (2023-08-31T17:43:08Z)
- ChatGPT for Arabic Grammatical Error Correction [5.945320097465418]
 Large language models (LLMs) fine-tuned to follow human instruction have exhibited significant capabilities in English NLP tasks.
In this paper, we delve into the abilities of instruction fine-tuned LLMs in Arabic GEC, a task made complex by Arabic's rich morphology.
We find that instruction fine-tuned models, regardless of their size, significantly underperform compared to fully fine-tuned models of significantly smaller sizes.
 arXiv  Detail & Related papers  (2023-08-08T18:00:39Z)
- Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
 Large language models (LLMs) are known to effectively perform tasks by simply observing a few exemplars.
We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English.
Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
 arXiv  Detail & Related papers  (2023-06-20T08:27:47Z)
- How Good are Commercial Large Language Models on African Languages? [0.012691047660244334]
 We present a preliminary analysis of commercial large language models on two tasks (machine translation and text classification) across eight African languages.
Our results suggest that commercial language models produce below-par performance on African languages.
In general, our findings present a call-to-action to ensure African languages are well represented in commercial large language models.
 arXiv  Detail & Related papers  (2023-05-11T02:29:53Z)
- AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages [0.021987601456703476]
 We present AfroLM, a multilingual language model pretrained from scratch on 23 African languages.
AfroLM is pretrained on a dataset 14x smaller than existing baselines.
It is able to generalize well across various domains.
 arXiv  Detail & Related papers  (2022-11-07T02:15:25Z)
- Crosslingual Generalization through Multitask Finetuning [80.8822603322471]
 Multitask prompted finetuning (MTF) has been shown to help large language models generalize to new tasks in a zero-shot setting.
We apply MTF to the pretrained multilingual BLOOM and mT5 model families to produce finetuned variants called BLOOMZ and mT0.
We find finetuning large multilingual language models on English tasks with English prompts allows for task generalization to non-English languages.
 arXiv  Detail & Related papers  (2022-11-03T13:19:32Z)
- MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition [55.95128479289923]
 African languages are spoken by over a billion people, but are underrepresented in NLP research and development.
We create the largest human-annotated NER dataset for 20 African languages.
We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points.
 arXiv  Detail & Related papers  (2022-10-22T08:53:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     