Analyzing and Adapting Large Language Models for Few-Shot Multilingual
NLU: Are We There Yet?
- URL: http://arxiv.org/abs/2403.01929v1
- Date: Mon, 4 Mar 2024 10:48:13 GMT
- Title: Analyzing and Adapting Large Language Models for Few-Shot Multilingual
NLU: Are We There Yet?
- Authors: Evgeniia Razumovskaia, Ivan Vulić, Anna Korhonen
- Abstract summary: Supervised fine-tuning (SFT), supervised instruction tuning (SIT) and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning.
We present an extensive and systematic comparison of the three approaches, testing them on 6 high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups.
Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements.
- Score: 82.02076369811402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Supervised fine-tuning (SFT), supervised instruction tuning (SIT) and
in-context learning (ICL) are three alternative, de facto standard approaches
to few-shot learning. ICL has gained popularity recently with the advent of
LLMs due to its simplicity and sample efficiency. Prior research has conducted
only limited investigation into how these approaches work for multilingual
few-shot learning, and the focus so far has been mostly on their performance.
In this work, we present an extensive and systematic comparison of the three
approaches, testing them on 6 high- and low-resource languages, three different
NLU tasks, and a myriad of language and domain setups. Importantly, performance
is only one aspect of the comparison: we also analyse the approaches
through the lens of their computational, inference and financial costs. Our
observations show that supervised instruction tuning has the best trade-off
between performance and resource requirements. As another contribution, we
analyse the impact of target language adaptation of pretrained LLMs and find
that the standard adaptation approaches can (superficially) improve target
language generation capabilities, but language understanding elicited through
ICL does not improve and remains limited, with low scores especially for
low-resource languages.
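To make the contrast between the three paradigms concrete, below is a minimal, hypothetical sketch of the ICL side of the comparison: a few-shot prompt for intent classification assembled from a handful of labelled target-language examples. The utterances, intent labels, and prompt template are invented for illustration and are not the paper's data or code; under SFT or SIT, the same handful of examples would instead be used to update the model's weights (with SIT phrasing them as natural-language instructions), which is the performance-versus-cost trade-off the paper quantifies.

```python
# Minimal, hypothetical sketch (not the paper's code): assembling a few-shot
# ICL prompt for multilingual intent classification. All utterances, intent
# labels, and the template are invented for illustration.

FEW_SHOT_EXAMPLES = [
    # (utterance in the target language, gold intent label)
    ("Quiero reservar una mesa para dos", "book_restaurant"),
    ("¿Qué tiempo hará mañana en Madrid?", "get_weather"),
    ("Pon música relajante", "play_music"),
]


def build_icl_prompt(query: str) -> str:
    """Concatenate labelled demonstrations with the unlabelled query.

    ICL requires no gradient updates: a frozen LLM is expected to infer the
    task from the demonstrations alone, which is why its training cost is
    zero but its inference cost grows with the prompt length.
    """
    lines = ["Classify the intent of each utterance."]
    for utterance, intent in FEW_SHOT_EXAMPLES:
        lines.append(f"Utterance: {utterance}\nIntent: {intent}")
    lines.append(f"Utterance: {query}\nIntent:")
    return "\n\n".join(lines)


if __name__ == "__main__":
    # The completed prompt would be sent to an LLM, whose continuation is
    # read off as the predicted intent label.
    print(build_icl_prompt("Resérvame un vuelo a Lisboa"))
```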
Related papers
- Improving In-Context Learning with Small Language Model Ensembles [2.3499129784547654]
In-context learning (ICL) is a cheap and efficient alternative to more resource-intensive adaptation methods, but it cannot match their accuracy.
We present Ensemble SuperICL, a novel approach that enhances ICL by leveraging the expertise of multiple fine-tuned small language models (SLMs).
arXiv Detail & Related papers (2024-10-29T09:02:37Z)
- Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention [71.12193680015622]
Large Language Models (LLMs) have shown remarkable capabilities in natural language processing.
LLMs exhibit significant performance gaps among different languages.
We propose Inference-Time Cross-Lingual Intervention (INCLINE) to overcome these limitations without incurring significant costs.
arXiv Detail & Related papers (2024-10-16T11:23:03Z)
- Exploring Design Choices for Building Language-Specific LLMs [36.32622880071991]
We study building language-specific language models by adapting monolingual and multilingual models.
We find that the initial performance of an LLM does not always correlate with its final performance after adaptation.
arXiv Detail & Related papers (2024-06-20T18:47:43Z)
- Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking [1.3716808114696444]
Large Language Models (LLMs) are becoming crucial across various fields, underscoring the urgent need for high-quality models in underrepresented languages.
This study explores the unique challenges faced by low-resource languages, such as data scarcity, model selection, evaluation, and computational limitations.
arXiv Detail & Related papers (2024-05-07T21:58:45Z)
- Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model [50.339632513018934]
Supervised fine-tuning (SFT) has been a straightforward approach for tailoring the output of a foundation large language model (LLM) to specific preferences.
We critically examine this hypothesis within the scope of cross-lingual generation tasks.
We introduce a novel training-free alignment method named PreTTY, which employs minimal task-related prior tokens.
arXiv Detail & Related papers (2024-04-25T17:19:36Z)
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
- LLMs Are Few-Shot In-Context Low-Resource Language Learners [59.74451570590808]
In-context learning (ICL) empowers large language models (LLMs) to perform diverse tasks in underrepresented languages.
We extensively study ICL and its cross-lingual variation (X-ICL) on 25 low-resource and 7 relatively higher-resource languages.
Our study underscores the significance of few-shot in-context information for enhancing the low-resource language understanding of LLMs (a minimal X-ICL prompt sketch follows this list).
arXiv Detail & Related papers (2024-03-25T07:55:29Z)
- FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
- Breaking Language Barriers with a LEAP: Learning Strategies for Polyglot LLMs [5.682384717239095]
Large language models (LLMs) are at the forefront of transforming numerous domains globally.
This paper tackles the imperative challenge of enhancing the multilingual performance of LLMs.
We present novel techniques that unlock the true potential of LLMs in a polyglot landscape.
arXiv Detail & Related papers (2023-05-28T14:48:38Z)
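As referenced in the entry on "LLMs Are Few-Shot In-Context Low-Resource Language Learners" above, the sketch below illustrates the cross-lingual ICL (X-ICL) idea: demonstrations are drawn from a higher-resource language while the query is in the low-resource target language. The task, strings, and labels are invented for illustration and are not taken from that paper.

```python
# Hypothetical sketch of cross-lingual in-context learning (X-ICL):
# source-language (here English) demonstrations are paired with a query in a
# low-resource target language. All strings below are invented examples.

ENGLISH_DEMONSTRATIONS = [
    ("The battery dies within an hour.", "negative"),
    ("Setup was quick and painless.", "positive"),
]


def build_x_icl_prompt(target_language_query: str) -> str:
    """Pair higher-resource-language demonstrations with a target-language query."""
    parts = ["Label the sentiment of each review as positive or negative."]
    for review, label in ENGLISH_DEMONSTRATIONS:
        parts.append(f"Review: {review}\nSentiment: {label}")
    parts.append(f"Review: {target_language_query}\nSentiment:")
    return "\n\n".join(parts)


# Example: English demonstrations with a Swahili query.
print(build_x_icl_prompt("Betri inaisha haraka sana."))
```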
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.