Lens: Rethinking Multilingual Enhancement for Large Language Models
- URL: http://arxiv.org/abs/2410.04407v1
- Date: Sun, 6 Oct 2024 08:51:30 GMT
- Title: Lens: Rethinking Multilingual Enhancement for Large Language Models
- Authors: Weixiang Zhao, Yulin Hu, Jiahe Guo, Xingyu Sui, Tongtong Wu, Yang Deng, Yanyan Zhao, Bing Qin, Wanxiang Che, Ting Liu,
- Abstract summary: Lens is a novel approach to enhance multilingual capabilities of large language models (LLMs)
It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs.
It achieves superior results with much fewer computational resources compared to existing post-training approaches.
- Score: 70.85065197789639
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the growing global demand for large language models (LLMs) that serve users from diverse linguistic backgrounds, most cutting-edge LLMs remain predominantly English-centric. This creates a performance gap across languages, restricting access to advanced AI services for non-English speakers. Current methods to enhance multilingual capabilities largely rely on data-driven post-training techniques, such as multilingual instruction tuning or continual pre-training. However, these approaches encounter significant challenges, including the scarcity of high-quality multilingual datasets and the limited enhancement of multilingual capabilities. They often suffer from off-target issues and catastrophic forgetting of central language abilities. To this end, we propose Lens, a novel approach to enhance multilingual capabilities of LLMs by leveraging their internal language representation spaces. Specially, Lens operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs. Using the central language as a pivot, the target language is drawn closer to it within the language-agnostic subspace, allowing it to inherit well-established semantic representations. Meanwhile, in the language-specific subspace, the representations of the target and central languages are pushed apart, enabling the target language to express itself distinctly. Extensive experiments on one English-centric and two multilingual LLMs demonstrate that Lens effectively improves multilingual performance without sacrificing the original central language capabilities of the backbone model, achieving superior results with much fewer computational resources compared to existing post-training approaches.
Related papers
- Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs [20.756383171892608]
Middle layers exhibit strongest potential for cross-lingual alignment.
Experiments on slot filling, machine translation, and structured text generation show consistent improvements in cross-lingual transfer.
We show that separately trained alignment modules can be merged with existing task-specific modules, improving cross-lingual capabilities without full re-training.
arXiv Detail & Related papers (2025-02-20T18:45:43Z) - Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging - An Open Recipe [12.076338505539194]
This paper aims to enhance the reasoning capabilities of language-specific large language models (LLMs)
DeepSeek R1 excels in reasoning but primarily benefits high-resource languages such as English and Chinese.
Low-resource languages remain underserved due to the dominance of English-centric training data and model optimizations.
arXiv Detail & Related papers (2025-02-13T08:10:45Z) - LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models [89.13128402847943]
We present LUSIFER, a novel zero-shot approach that adapts LLM-based embedding models for multilingual tasks without requiring multilingual supervision.
LUSIFER's architecture combines a multilingual encoder, serving as a language-universal learner, with an LLM-based embedding model optimized for embedding-specific tasks.
We introduce a new benchmark encompassing 5 primary embedding tasks, 123 diverse datasets, and coverage across 14 languages.
arXiv Detail & Related papers (2025-01-01T15:43:07Z) - Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z) - Towards a More Inclusive AI: Progress and Perspectives in Large Language Model Training for the Sámi Language [7.289015788793582]
This work focuses on increasing technological participation for the S'ami language.
We draw the attention of the ML community towards the language modeling problem of Ultra Low Resource (ULR) languages.
We have compiled the available S'ami language resources from the web to create a clean dataset for training language models.
arXiv Detail & Related papers (2024-05-09T13:54:22Z) - Enhancing Multilingual Capabilities of Large Language Models through
Self-Distillation from Resource-Rich Languages [60.162717568496355]
Large language models (LLMs) have been pre-trained on multilingual corpora.
Their performance still lags behind in most languages compared to a few resource-rich languages.
arXiv Detail & Related papers (2024-02-19T15:07:32Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - Efficiently Aligned Cross-Lingual Transfer Learning for Conversational
Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.