Fugu-MT Paper Translation (Abstract): Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation

Paper overview: Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation

arxiv url: http://arxiv.org/abs/2510.17555v1
Date: Mon, 20 Oct 2025 14:02:37 GMT
Status: Translation complete
System update date: 2025-10-25 03:08:12.116926
Title: Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation
Title (reference translation): Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation
Authors: Collin Zhang, Fei Huang, Chenhan Yuan, Junyang Lin
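Abstract summary: This paper introduces the Language Confusion Gate (LCG), a lightweight plug-in solution that filters tokens during decoding. LCG is trained with norm-adjusted self-distillation to predict the appropriate language families and applies masking only when needed.
Reference score (independently computed attention score): 50.93756215410832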
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) often experience language confusion, which is the unintended mixing of languages during text generation. Current solutions to this problem either necessitate model retraining or cannot differentiate between harmful confusion and acceptable code-switching. This paper introduces the Language Confusion Gate (LCG), a lightweight, plug-in solution that filters tokens during decoding without altering the base LLM. The LCG is trained using norm-adjusted self-distillation to predict appropriate language families and apply masking only when needed. Our method is based on the findings that language confusion is infrequent, correct-language tokens are usually among the top predictions, and output token embedding norms are larger for high-resource languages, which biases sampling. When evaluated across various models, including Qwen3, GPT-OSS, Gemma3, Llama3.1, LCG decreases language confusion significantly, often by an order of magnitude, without negatively impacting task performance. Code is available at https://github.com/collinzrj/language_confusion_gate.
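Abstract (reference translation): Large language models (LLMs) often exhibit language confusion, the unintended mixing of languages during text generation. Existing solutions either require retraining the model or cannot distinguish harmful confusion from acceptable code-switching. This paper introduces the Language Confusion Gate (LCG), a lightweight plug-in solution that filters tokens during decoding without altering the base LLM. LCG is trained with norm-adjusted self-distillation to predict the appropriate language families and applies masking only when needed. The method builds on three findings: language confusion is infrequent, correct-language tokens are usually among the top predictions, and output token embedding norms are larger for high-resource languages, which biases sampling. Evaluated across models including Qwen3, GPT-OSS, Gemma3, and Llama3.1, LCG reduces language confusion significantly, often by an order of magnitude, without harming task performance. Code is available at https://github.com/collinzrj/language_confusion_gate.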
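
The idea of filtering tokens at decoding time, and masking only when needed, can be illustrated with a minimal sketch. This is not the authors' implementation: in the paper the gate is a trained predictor of allowed language families, whereas here the allowed set is simply passed in by hand. The function names (`gate_logits`, `softmax`), the family labels ("latin", "cjk"), and the rule "intervene only if the top-scoring token is in a disallowed family" are all assumptions made for illustration.

```python
import math


def gate_logits(logits, token_family, allowed_families):
    """Return logits with disallowed-family tokens masked, but only when needed.

    Hypothetical rule: if the highest-scoring token already belongs to an
    allowed family, leave the distribution untouched.
    """
    top = max(range(len(logits)), key=lambda i: logits[i])
    if token_family[top] in allowed_families:
        return list(logits)  # no intervention needed

    masked = [
        x if token_family[i] in allowed_families else -math.inf
        for i, x in enumerate(logits)
    ]
    # If every token would be masked, fall back to the original logits.
    if all(m == -math.inf for m in masked):
        return list(logits)
    return masked


def softmax(logits):
    """Numerically stable softmax; masked (-inf) entries get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy vocabulary of four tokens with hand-assigned family labels.
families = ["latin", "cjk", "latin", "cjk"]
logits = [1.0, 2.5, 2.0, 0.1]

gated = gate_logits(logits, families, {"latin"})
probs = softmax(gated)
```

In this toy case the top token is in the "cjk" family while only "latin" is allowed, so the two "cjk" entries are masked and the remaining probability mass is renormalized over the "latin" tokens. When the top token is already in an allowed family, the logits pass through unchanged, which keeps the base model's behavior intact in the common case where no confusion occurs.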


Related papers