Fugu-MT Paper Translation (Summary): COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling

Paper summary: COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling

arxiv url: http://arxiv.org/abs/2604.20720v1
Date: Wed, 22 Apr 2026 16:07:10 GMT
Status: Translation complete
Last updated in system: 2026-04-23 15:36:11.215745
Title: COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling
Authors: Noah Flynn
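Abstract summary: We introduce a data-centric framework for adapting large language models to target languages. We use a distribution-aware sampling strategy to identify semantic gaps between existing training data and the target usage distribution. We extend this into a continual learning framework that monitors for data distribution shifts in production and dynamically updates adapters to prevent model staleness.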
Reference score (independently computed attention score): 0.023074632109535153
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) often exhibit performance disparities across languages, with naive multilingual fine-tuning frequently degrading performance due to negative cross-lingual interference. To address this, we introduce COMPASS (COntinual Multilingual PEFT with Adaptive Semantic Sampling), a novel data-centric framework for adapting LLMs to target languages. COMPASS leverages parameter-efficient fine-tuning (PEFT) by training lightweight, language-specific adapters on a judiciously selected subset of auxiliary multilingual data. The core of our method is a distribution-aware sampling strategy that uses multilingual embeddings and clustering to identify semantic gaps between existing training data and a target usage distribution. By prioritizing auxiliary data from under-represented semantic clusters, COMPASS maximizes positive cross-lingual transfer while minimizing interference. We extend this into a continual learning framework, COMPASS-ECDA, which monitors for data distribution shifts in production and dynamically updates adapters to prevent model staleness, balancing adaptation to new data with the preservation of existing knowledge. Across three different model architectures (Phi-4-Mini, Llama-3.1-8B, and Qwen2.5-7B) and multiple challenging multilingual benchmarks (Global-MMLU, MMLU-ProX), including unseen long-context tasks (OneRuler), we demonstrate that COMPASS consistently outperforms baseline methods guided by linguistic similarity, providing an effective, efficient, and sustainable solution for developing and maintaining high-performing multilingual models in dynamic environments.
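The distribution-aware sampling and shift monitoring described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: it assumes cluster ids have already been assigned (for example by clustering multilingual embeddings), and the function names, the proportional budget split, and the use of total variation distance as a drift signal are all hypothetical choices made for illustration.

```python
from collections import Counter


def cluster_shares(labels, num_clusters):
    """Return the fraction of items in each cluster id 0..num_clusters-1."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in range(num_clusters)]


def semantic_gaps(train_labels, target_labels, num_clusters):
    """Per-cluster shortfall: target share minus training share, floored at 0."""
    train = cluster_shares(train_labels, num_clusters)
    target = cluster_shares(target_labels, num_clusters)
    return [max(t - s, 0.0) for s, t in zip(train, target)]


def allocate_budget(gaps, budget):
    """Split an auxiliary-sample budget across clusters in proportion to gaps.

    Rounding is per cluster, so the parts may not sum exactly to the budget.
    """
    total_gap = sum(gaps)
    if total_gap == 0:
        return [0] * len(gaps)
    return [round(budget * g / total_gap) for g in gaps]


def drift_score(reference_labels, recent_labels, num_clusters):
    """Total variation distance between two cluster histograms (assumed metric)."""
    ref = cluster_shares(reference_labels, num_clusters)
    new = cluster_shares(recent_labels, num_clusters)
    return 0.5 * sum(abs(r - n) for r, n in zip(ref, new))


# Toy cluster ids standing in for clustered multilingual embeddings.
train_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]   # shares 0.6, 0.3, 0.1
target_labels = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2]  # shares 0.2, 0.3, 0.5

gaps = semantic_gaps(train_labels, target_labels, num_clusters=3)
plan = allocate_budget(gaps, budget=100)
drift = drift_score(train_labels, target_labels, num_clusters=3)
print(plan, drift)
```

In this toy case only cluster 2 is under-represented relative to the target distribution, so the whole budget goes to it. A continual setup might recompute `drift_score` on recent production traffic and retrain an adapter when it crosses a chosen threshold; the threshold itself would be a tuning decision that the abstract does not specify.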

Related papers