Fugu-MT Paper Translation (Abstract): Layer-wise Swapping for Generalizable Multilingual Safety

Paper overview: Layer-wise Swapping for Generalizable Multilingual Safety

arxiv url: http://arxiv.org/abs/2601.22620v1
Date: Fri, 30 Jan 2026 06:22:02 GMT
Status: Translation complete
Last updated in system: 2026-02-02 18:28:15.267267
Title: Layer-wise Swapping for Generalizable Multilingual Safety
Title (reference translation): Layer-wise Swapping for Generalizable Multilingual Safety
Authors: Hyunseo Shin, Wonseok Hwang
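Abstract summary: Existing safety datasets are predominantly English-centric, which limits progress in multilingual safety alignment. This paper proposes a safety-aware layer-swapping method that transfers safety alignment from an English safety expert to low-resource language experts without additional training.
Reference score (independently computed measure of attention): 8.658596218544773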
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite the rapid advancements of Large Language Models (LLMs), safety risks remain a critical challenge for low-resource languages. Existing safety datasets are predominantly English-centric, limiting progress in multilingual safety alignment. As a result, low-resource expert models, fine-tuned on their respective instruction datasets, tend to exhibit higher unsafety rates than their high-resource counterparts. In this work, we propose a safety-aware layer-swapping method that transfers safety alignment from an English safety expert to low-resource language experts without additional training. To further enhance transfer ability, our method adaptively selects or blends modules based on their degree of specialization. Our approach preserves performance on general language understanding tasks while enhancing safety in the target languages. Experimental results show that the proposed method achieves performance comparable to the language expert on general benchmarks such as MMMLU, BELEBELE, and MGSM, while producing more aligned and less harmful responses on the MultiJail safety benchmark.
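Abstract (reference translation): Despite the rapid progress of Large Language Models (LLMs), safety risks remain an important challenge for low-resource languages. Existing safety datasets are predominantly English-centric, which limits progress in multilingual safety alignment. As a result, low-resource expert models fine-tuned on their respective instruction datasets tend to be less safe than high-resource models. This work proposes a safety-aware layer-swapping method that transfers safety alignment from an English safety expert to low-resource language experts without additional training. To improve transfer ability, the method adaptively selects or blends modules based on their degree of specialization. The proposed method improves safety in the target languages while preserving performance on general language understanding tasks. Experiments show that it performs comparably to the language expert on general benchmarks such as MMMLU, BELEBELE, and MGSM, while producing more aligned and less harmful responses on the MultiJail safety benchmark.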
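
The abstract describes selecting or blending modules "based on their degree of specialization" but does not say how that degree is measured. The sketch below is one possible reading, not the authors' implementation. It assumes a shared base model, measures specialization as the L2 norm of each expert's parameter change from that base, and uses a hypothetical dominance threshold `ratio`. The function names, module names, and toy values are all illustrative assumptions.

```python
import math

def delta_norm(base, tuned):
    """L2 norm of the parameter change that fine-tuning introduced."""
    return math.sqrt(sum((t - b) ** 2 for b, t in zip(base, tuned)))

def merge_modules(base, safety, lang, ratio=2.0):
    """Per module, take the clearly more specialized expert, else blend.

    Each argument maps a module name to a flat list of floats.
    ASSUMPTION: `ratio` and the norm-based criterion are illustrative,
    not values or rules taken from the paper.
    """
    merged, decisions = {}, {}
    for name in base:
        s_norm = delta_norm(base[name], safety[name])
        l_norm = delta_norm(base[name], lang[name])
        if s_norm > ratio * l_norm:
            # Safety expert changed this module far more: swap it in.
            merged[name] = list(safety[name])
            decisions[name] = "safety"
        elif l_norm > ratio * s_norm:
            # Language expert dominates: keep its weights.
            merged[name] = list(lang[name])
            decisions[name] = "language"
        else:
            # Neither dominates: blend, weighted by relative change.
            total = s_norm + l_norm
            w = s_norm / total if total > 0 else 0.5
            merged[name] = [w * a + (1 - w) * b
                            for a, b in zip(safety[name], lang[name])]
            decisions[name] = "blend"
    return merged, decisions

# Toy parameters (hypothetical module names and values).
base = {"layer0.attn": [0.0, 0.0], "layer1.mlp": [0.0, 0.0], "layer2.mlp": [0.0, 0.0]}
safety = {"layer0.attn": [3.0, 4.0], "layer1.mlp": [0.0, 0.1], "layer2.mlp": [1.0, 0.0]}
lang = {"layer0.attn": [0.1, 0.0], "layer1.mlp": [3.0, 4.0], "layer2.mlp": [0.0, 1.0]}

merged, decisions = merge_modules(base, safety, lang)
print(decisions)
```

No gradient step appears anywhere in the sketch, which matches the abstract's "without additional training" claim: the merged model is assembled purely from existing expert weights. With real checkpoints the same loop would run over tensors rather than lists.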
