Fugu-MT Paper Translation (Abstract): Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts

Paper overview: Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts

arxiv url: http://arxiv.org/abs/2505.22582v1
Date: Wed, 28 May 2025 16:54:53 GMT
Status: Translation complete
System update date: 2025-05-29 17:35:50.747173
Title: Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts
Title (reference translation): Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts
Authors: Xue Zhang, Yunlong Liang, Fandong Meng, Songming Zhang, Yufeng Chen, Jinan Xu, Jie Zhou
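Abstract summary: This work proposes a layer-wise expert allocation algorithm (LayerMoE) that determines the appropriate number of new experts for each layer. The proposed method outperforms the previous state-of-the-art baseline while using 60% fewer experts in the single-expansion setting.
Reference score (independently computed attention score): 98.73585104789217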
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Continually expanding new languages for existing large language models (LLMs) is a promising yet challenging approach to building powerful multilingual LLMs. The biggest challenge is to make the model continuously learn new languages while preserving the proficient ability of old languages. To achieve this, recent work utilizes the Mixture-of-Experts (MoE) architecture to expand new languages by adding new experts and avoid catastrophic forgetting of old languages by routing corresponding tokens to the original model backbone (old experts). Although intuitive, this kind of method is parameter-costly when expanding new languages and still inevitably impacts the performance of old languages. To address these limitations, we analyze the language characteristics of different layers in LLMs and propose a layer-wise expert allocation algorithm (LayerMoE) to determine the appropriate number of new experts for each layer. Specifically, we find different layers in LLMs exhibit different representation similarities between languages and then utilize the similarity as the indicator to allocate experts for each layer, i.e., the higher similarity, the fewer experts. Additionally, to further mitigate the forgetting of old languages, we add a classifier in front of the router network on the layers with higher similarity to guide the routing of old language tokens. Experimental results show that our method outperforms the previous state-of-the-art baseline with 60% fewer experts in the single-expansion setting and with 33.3% fewer experts in the lifelong-expansion setting, demonstrating the effectiveness of our method.
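Abstract (reference translation): Continually expanding existing large language models (LLMs) to new languages is a promising but challenging approach to building powerful multilingual LLMs. The biggest challenge is to have the model keep learning new languages while preserving its proficiency in old ones. To this end, recent work uses the Mixture-of-Experts (MoE) architecture, expanding to new languages by adding new experts and avoiding catastrophic forgetting of old languages by routing the corresponding tokens to the original model backbone (the old experts). Although intuitive, this kind of method is parameter-costly when expanding to new languages and still inevitably affects performance on old languages. To address these limitations, the authors analyze the language characteristics of different layers in LLMs and propose a layer-wise expert allocation algorithm (LayerMoE) that determines the appropriate number of new experts for each layer. Specifically, different layers in LLMs exhibit different representation similarities between languages, and this similarity is used as the indicator for allocating experts to each layer: the higher the similarity, the fewer the experts. In addition, to further mitigate forgetting of old languages, a classifier is added in front of the router network on layers with higher similarity to guide the routing of old-language tokens. Experimental results show that the method outperforms the previous state-of-the-art baseline with 60% fewer experts in the single-expansion setting and 33.3% fewer experts in the lifelong-expansion setting, demonstrating its effectiveness.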
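
The allocation rule described in the abstract (the higher a layer's cross-language representation similarity, the fewer new experts it receives) can be illustrated with a small sketch. This is not the paper's algorithm: the abstract gives no formula, so the proportional weighting by `1 - similarity`, the largest-remainder rounding, the function name `allocate_experts`, and the example similarity values are all assumptions made here for illustration.

```python
from typing import List


def allocate_experts(similarities: List[float], total_experts: int) -> List[int]:
    """Split a budget of new experts across layers (hypothetical helper).

    Assumption: each layer's share is proportional to (1 - similarity),
    so layers with higher similarity receive fewer experts.
    """
    if not similarities:
        return []

    # Higher similarity -> smaller weight. Clamp at zero for safety.
    weights = [max(0.0, 1.0 - s) for s in similarities]
    total_weight = sum(weights)
    if total_weight == 0.0:
        # All layers are maximally similar: fall back to an even split.
        weights = [1.0] * len(similarities)
        total_weight = float(len(similarities))

    # Ideal fractional share per layer, then round down.
    shares = [total_experts * w / total_weight for w in weights]
    counts = [int(share) for share in shares]

    # Hand out the experts lost to rounding, largest remainder first.
    leftover = total_experts - sum(counts)
    by_remainder = sorted(
        range(len(shares)), key=lambda i: shares[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Illustrative similarity values for three layers (made up, not from the paper).
example_counts = allocate_experts([0.9, 0.5, 0.1], total_experts=6)
print(example_counts)
```

The sketch keeps the total budget fixed and only changes how it is distributed across layers, which is one simple way to read "fewer experts for more similar layers"; the paper may use a different indicator-to-count mapping.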

Related papers