Comparing LLM prompting with Cross-lingual transfer performance on Indigenous and Low-resource Brazilian Languages
- URL: http://arxiv.org/abs/2404.18286v2
- Date: Tue, 30 Apr 2024 10:00:44 GMT
- Title: Comparing LLM prompting with Cross-lingual transfer performance on Indigenous and Low-resource Brazilian Languages
- Authors: David Ifeoluwa Adelani, A. Seza Doğruöz, André Coneglian, Atul Kr. Ojha,
- Abstract summary: We focus on 12 low-resource languages (LRLs) from Brazil, 2 LRLs from Africa and 2 high-resource languages (HRLs)
Our results indicate that the LLMs perform worse for the part of speech (POS) labeling of LRLs in comparison to HRLs.
- Score: 5.473562965178709
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models are transforming NLP for a variety of tasks. However, how LLMs perform NLP tasks for low-resource languages (LRLs) is less explored. In line with the goals of the AmericasNLP workshop, we focus on 12 LRLs from Brazil, 2 LRLs from Africa and 2 high-resource languages (HRLs) (e.g., English and Brazilian Portuguese). Our results indicate that the LLMs perform worse for the part of speech (POS) labeling of LRLs in comparison to HRLs. We explain the reasons behind this failure and provide an error analysis through examples observed in our data set.
Related papers
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization ( CLS) aims to generate a summary for the source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even with few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z) - Better to Ask in English: Evaluation of Large Language Models on English, Low-resource and Cross-Lingual Settings [12.507989493130175]
GPT-4, Llama 2, and Gemini are evaluated for their effectiveness in English compared to other low-resource languages from South Asia.
Findings suggest GPT-4 outperformed Llama 2 and Gemini in all five prompt settings and across all languages.
arXiv Detail & Related papers (2024-10-17T02:12:30Z) - Quality or Quantity? On Data Scale and Diversity in Adapting Large Language Models for Low-Resource Translation [62.202893186343935]
We explore what it would take to adapt Large Language Models for low-resource languages.
We show that parallel data is critical during both pre-training andSupervised Fine-Tuning (SFT)
Our experiments with three LLMs across two low-resourced language groups reveal consistent trends, underscoring the generalizability of our findings.
arXiv Detail & Related papers (2024-08-23T00:59:38Z) - Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs [45.44796295841526]
Large Language Models (LLMs) exhibit impressive zero/few-shot inference and generation quality for high-resource languages (HRLs)
A few of them have been trained on low-resource languages (LRLs) and give decent performance.
We show that LRLs are at a pricing disadvantage, because the well-known LLMs produce more tokens for LRLs than HRLs.
arXiv Detail & Related papers (2024-03-08T16:37:36Z) - Introducing Bode: A Fine-Tuned Large Language Model for Portuguese
Prompt-Based Task [1.158680734110387]
This work proposes a fine-tuned LLaMA 2-based model for Portuguese prompts named Bode.
We evaluate the performance of this model in classification tasks using the zero-shot approach with in-context learning.
arXiv Detail & Related papers (2024-01-05T17:15:01Z) - Zero-Shot Cross-Lingual Reranking with Large Language Models for
Low-Resource Languages [51.301942056881146]
We investigate how large language models (LLMs) function as rerankers in cross-lingual information retrieval systems for African languages.
Our implementation covers English and four African languages (Hausa, Somali, Swahili, and Yoruba)
We examine cross-lingual reranking with queries in English and passages in the African languages.
arXiv Detail & Related papers (2023-12-26T18:38:54Z) - Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars.
We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English.
Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z) - Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis [103.89753784762445]
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT)
This paper systematically investigates the advantages and challenges of LLMs for MMT.
We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4.
arXiv Detail & Related papers (2023-04-10T15:51:30Z) - Overlap-based Vocabulary Generation Improves Cross-lingual Transfer
Among Related Languages [18.862296065737347]
We argue that relatedness among languages in a language family along the dimension of lexical overlap may be leveraged to overcome some of the corpora limitations of LRLs.
We propose Overlap BPE, a simple yet effective modification to the BPE vocabulary generation algorithm which enhances overlap across related languages.
arXiv Detail & Related papers (2022-03-03T19:35:24Z) - Exploiting Language Relatedness for Low Web-Resource Language Model
Adaptation: An Indic Languages Study [14.34516262614775]
We argue that relatedness among languages in a language family may be exploited to overcome some of the corpora limitations of LRLs.
We focus on Indian languages, and exploit relatedness along two dimensions: (1) script (since many Indic scripts originated from the Brahmic script) and (2) sentence structure.
arXiv Detail & Related papers (2021-06-07T20:43:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.