A Three-Pronged Approach to Cross-Lingual Adaptation with Multilingual LLMs
- URL: http://arxiv.org/abs/2406.17377v1
- Date: Tue, 25 Jun 2024 08:53:46 GMT
- Title: A Three-Pronged Approach to Cross-Lingual Adaptation with Multilingual LLMs
- Authors: Vaibhav Singh, Amrith Krishna, Karthika NJ, Ganesh Ramakrishnan
- Abstract summary: We study three approaches for cross-lingual transfer under in-context learning (ICL) and fine-tuning.
We find that adding supervisory signals via a dominant language of the LLM leads to improvements.
Adapting the target languages via word reordering may be beneficial under ICL, but its impact diminishes with fine-tuning.
- Score: 21.49482900744541
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Low-resource languages, by their very definition, tend to be underrepresented in the pre-training corpora of Large Language Models. In this work, we investigate three low-resource cross-lingual approaches that enable an LLM to adapt to tasks in previously unseen languages. Llama-2 is an LLM in which Indic languages, among many other language families, contribute less than $0.005\%$ of the total $2$ trillion token pre-training corpus. We experiment with the English-dominated Llama-2 for cross-lingual transfer to three Indic target languages: Bengali, Hindi, and Tamil. We study three approaches for cross-lingual transfer under in-context learning (ICL) and fine-tuning. First, we find that adding supervisory signals via a dominant language of the LLM leads to improvements under both ICL and fine-tuning. Second, adapting the target languages via word reordering may be beneficial under ICL, but its impact diminishes with fine-tuning. Finally, continued pre-training in one low-resource language can improve model performance for other related low-resource languages.
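To make the first approach concrete, here is a minimal sketch of an ICL prompt in which each target-language exemplar also carries an English gloss as the additional supervisory signal. The task, prompt template, example sentences, and generation settings are illustrative assumptions, not the paper's actual setup; only the idea of pairing target-language inputs with a dominant-language signal is taken from the abstract.

```python
# Minimal sketch (assumed template and data): few-shot prompt for a Hindi task where
# every exemplar also carries an English gloss as an extra supervisory signal.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # English-dominated base model (gated checkpoint)

EXEMPLARS = [
    {"hi": "यह फिल्म बहुत अच्छी थी।", "en": "This movie was very good.", "label": "positive"},
    {"hi": "खाना ठंडा और बेस्वाद था।", "en": "The food was cold and tasteless.", "label": "negative"},
]

def build_prompt(query_hi: str) -> str:
    """Interleave Hindi inputs, English glosses, and labels into one ICL prompt."""
    parts = ["Classify the sentiment of the Hindi sentence."]
    for ex in EXEMPLARS:
        parts.append(f"Hindi: {ex['hi']}\nEnglish: {ex['en']}\nSentiment: {ex['label']}")
    parts.append(f"Hindi: {query_hi}\nEnglish:")  # model first glosses, then labels
    return "\n\n".join(parts)

if __name__ == "__main__":
    prompt = build_prompt("सेवा धीमी थी लेकिन कर्मचारी विनम्र थे।")
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Fine-tuning on the same paired format would be the supervised counterpart of this prompt-only setup.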
Related papers
- Code-Switching Curriculum Learning for Multilingual Transfer in LLMs [43.85646680303273]
Large language models (LLMs) exhibit near human-level performance in various tasks, but their performance drops drastically outside a handful of high-resource languages.
Inspired by the human process of second language acquisition, we propose code-switching curriculum learning (CSCL) to enhance cross-lingual transfer for LLMs.
CSCL mimics the stages of human language learning by progressively training models with a curriculum consisting of 1) token-level code-switching, 2) sentence-level code-switching, and 3) monolingual corpora.
arXiv Detail & Related papers (2024-11-04T06:31:26Z)
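The staged curriculum summarized above can be pictured with a small data-preparation sketch; the toy dictionary, sentence pairs, and mixing probabilities below are invented for illustration and are not the CSCL paper's data or schedule.

```python
import random

# Toy English-Hindi resources (illustrative only).
EN2HI = {"movie": "फिल्म", "good": "अच्छी", "food": "खाना"}
PARALLEL = [
    ("the movie was good", "फिल्म अच्छी थी"),
    ("the food was good", "खाना अच्छा था"),
]

def token_level_cs(sentence: str, p: float = 0.5) -> str:
    """Stage 1: swap individual English tokens for dictionary translations."""
    return " ".join(EN2HI[t] if t in EN2HI and random.random() < p else t for t in sentence.split())

def sentence_level_cs(pairs, p: float = 0.5):
    """Stage 2: mix whole English and Hindi sentences in one stream."""
    return [hi if random.random() < p else en for en, hi in pairs]

def monolingual(pairs):
    """Stage 3: target-language sentences only."""
    return [hi for _, hi in pairs]

# The curriculum presents the three stages in order during training.
curriculum = [
    [token_level_cs(en) for en, _ in PARALLEL],
    sentence_level_cs(PARALLEL),
    monolingual(PARALLEL),
]
for stage, batch in enumerate(curriculum, start=1):
    print(f"stage {stage}: {batch}")
```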
- Language Imbalance Driven Rewarding for Multilingual Self-improving [35.1576728251478]
Large Language Models (LLMs) have achieved state-of-the-art performance across numerous tasks, yet their capabilities remain imbalanced between dominant and non-dominant languages.
This imbalance, while limiting broader applications, generates a natural preference ranking between languages.
We propose $\textit{Language Imbalance Driven Rewarding}$, where the inherent imbalance between dominant and non-dominant languages is leveraged as a reward signal.
arXiv Detail & Related papers (2024-10-11T16:32:05Z)
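One way to read "imbalance as a reward signal" is as a preference-pair construction in which answers obtained via the dominant language outrank the model's own answers in a non-dominant language. The sketch below assumes a DPO-style record format and uses a placeholder translation step; neither the schema nor the pipeline is taken from the paper.

```python
# Sketch (assumed format): build preference pairs where the dominant-language answer,
# translated into the target language, is "chosen" over the model's own target-language answer.
from typing import Dict, List

def translate(text: str, src: str, tgt: str) -> str:
    # Placeholder: identity "translation" so the sketch runs; swap in a real MT system.
    return text

def build_preference_pairs(
    prompts_tgt: List[str],   # prompts in the non-dominant language
    answers_en: List[str],    # model answers obtained via English
    answers_tgt: List[str],   # model answers obtained directly in the target language
    tgt_lang: str = "hi",
) -> List[Dict[str, str]]:
    return [
        {
            "prompt": p,
            "chosen": translate(a_en, src="en", tgt=tgt_lang),  # dominant-language answer, translated
            "rejected": a_tgt,                                   # model's own non-dominant answer
        }
        for p, a_en, a_tgt in zip(prompts_tgt, answers_en, answers_tgt)
    ]

pairs = build_preference_pairs(
    ["फ्रांस की राजधानी क्या है?"],
    ["The capital of France is Paris."],
    ["पेरिस शायद?"],
)
print(pairs[0]["chosen"], "| preferred over |", pairs[0]["rejected"])
```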
- Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhancing the multilingual capabilities of large language models (LLMs).
It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from the top layers of LLMs.
It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- Teaching LLMs to Abstain across Languages via Multilingual Feedback [40.84205285309612]
We show that multilingual feedback helps identify knowledge gaps across diverse languages, cultures, and communities.
Extensive experiments demonstrate that our multilingual feedback approach outperforms various strong baselines.
Further analysis reveals that multilingual feedback is both an effective and a more equitable abstain strategy to serve diverse language speakers.
arXiv Detail & Related papers (2024-06-22T21:59:12Z)
- Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners [67.85635044939836]
Large Language Models (LLMs) have shown impressive language capabilities.
In this work, we investigate the spontaneous multilingual alignment improvement of LLMs.
We find that LLMs instruction-tuned on question translation data (i.e., without annotated answers) are able to encourage alignment between English and a wide range of languages.
arXiv Detail & Related papers (2024-05-22T16:46:19Z)
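A small sketch of what "question translation data without annotated answers" could look like as instruction-tuning records; the JSONL schema, file name, and example sentences are assumptions for illustration, not the paper's released data.

```python
import json

# Each record pairs an English question with its translation only; no answers
# are included, mirroring the "question translation data" described above.
QUESTION_PAIRS = [
    ("What is the capital of France?", "फ्रांस की राजधानी क्या है?"),  # Hindi
    ("Why is the sky blue?", "কেন আকাশ নীল?"),                        # Bengali
]

records = [
    {
        "instruction": "Translate the question into the target language.",
        "input": en_q,
        "output": tgt_q,  # a translated question, not an answer to it
    }
    for en_q, tgt_q in QUESTION_PAIRS
]

with open("question_translation_sft.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
print(f"wrote {len(records)} answer-free instruction-tuning records")
```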
- Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed? [40.13166574854085]
We investigate the minimal amount of multilinguality required to elicit cross-lingual generalisation in English-centric large language models.
We find that multilingual instruction tuning with as few as two to three languages is both necessary and sufficient to elicit effective cross-lingual generalisation.
arXiv Detail & Related papers (2023-12-20T00:49:52Z)
- Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing a few exemplars.
We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English.
Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z)
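The exemplar-assembly idea above can be sketched as plain prompt construction: exemplars from several high-resource languages demonstrate the "X to English" pattern before the low-resource query. The languages, sentences, and template below are placeholders, not the paper's actual prompts.

```python
# Assemble synthetic exemplars from several high-resource languages, then ask
# for an English translation of a low-resource-language sentence (here Tamil).
EXEMPLARS = [
    ("French",  "Le chat dort sur le canapé.",  "The cat is sleeping on the sofa."),
    ("Spanish", "El mercado abre a las ocho.",  "The market opens at eight."),
    ("German",  "Der Zug ist heute verspätet.", "The train is delayed today."),
]

def build_translation_prompt(source_sentence: str) -> str:
    """Few-shot prompt showing only the 'source -> English' pattern."""
    blocks = [f"{lang}: {src}\nEnglish: {en}" for lang, src, en in EXEMPLARS]
    blocks.append(f"Sentence: {source_sentence}\nEnglish:")  # source language left unnamed
    return "\n\n".join(blocks)

print(build_translation_prompt("அந்த புத்தகம் மிகவும் சுவாரஸ்யமாக இருந்தது."))
```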
- InstructAlign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning [66.31509106146605]
Large language models (LLMs) that are tuned with instructions have demonstrated remarkable capabilities in various tasks and languages.
However, their ability to generalize to underrepresented languages is limited due to the scarcity of available data.
We propose InstructAlign, which uses continual crosslingual instruction tuning to enable LLMs to align new, unseen languages with previously learned high-resource languages.
arXiv Detail & Related papers (2023-05-23T02:51:34Z)
- Multilingual Language Model Adaptive Fine-Tuning: A Study on African Languages [19.067718464786463]
We perform multilingual adaptive fine-tuning (MAFT) on the 17 most-resourced African languages and three other high-resource languages widely spoken on the African continent.
To further specialize the multilingual PLM, we removed vocabulary tokens from the embedding layer that correspond to non-African writing scripts before MAFT.
Our approach is competitive to applying LAFT on individual languages while requiring significantly less disk space.
arXiv Detail & Related papers (2022-04-13T16:13:49Z)
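The vocabulary-reduction step described above can be sketched as slicing the embedding matrix down to tokens written in a kept set of scripts. The checkpoint, Unicode ranges, and kept scripts below are assumptions for illustration; the tokenizer and the tied output head would also need rebuilding, which the sketch only notes.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Stand-in multilingual PLM; checkpoint and script filter are illustrative assumptions.
MODEL_NAME = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)

def uses_kept_scripts(token: str) -> bool:
    """True if every letter in the token is Latin, Ethiopic (Ge'ez), or Arabic script."""
    for ch in token:
        if ch.isalpha():
            cp = ord(ch)
            latin = cp <= 0x024F or 0x1E00 <= cp <= 0x1EFF
            ethiopic = 0x1200 <= cp <= 0x139F
            arabic = 0x0600 <= cp <= 0x077F
            if not (latin or ethiopic or arabic):
                return False
    return True

keep_ids = sorted(i for tok, i in tokenizer.get_vocab().items() if uses_kept_scripts(tok))

old_emb = model.get_input_embeddings().weight.data           # shape (vocab_size, hidden_dim)
new_emb = torch.nn.Embedding(len(keep_ids), old_emb.size(1))
new_emb.weight.data.copy_(old_emb[keep_ids])                  # keep only the selected rows
model.set_input_embeddings(new_emb)

print(f"embedding rows: {old_emb.size(0)} -> {len(keep_ids)}")
# The tokenizer and the tied MLM output head must also be rebuilt to the reduced
# vocabulary before MAFT (continued pre-training on African-language text).
```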
- Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure [54.01613740115601]
We study three language properties: constituent order, composition and word co-occurrence.
Our main conclusion is that the contribution of constituent order and word co-occurrence is limited, while the composition is more crucial to the success of cross-linguistic transfer.
arXiv Detail & Related papers (2022-03-16T07:09:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.