Fugu-MT Paper Translation (Summary): Uncovering the Potential Risks in Unlearning: Danger of English-only Unlearning in Multilingual LLMs

Paper summary: Uncovering the Potential Risks in Unlearning: Danger of English-only Unlearning in Multilingual LLMs

arxiv url: http://arxiv.org/abs/2510.23949v1
Date: Tue, 28 Oct 2025 00:05:00 GMT
Status: Translation complete
Last updated in system: 2025-10-29 15:35:36.691512
Title: Uncovering the Potential Risks in Unlearning: Danger of English-only Unlearning in Multilingual LLMs
Title (reference translation): Uncovering the Potential Risks of Unlearning: The Danger of English-only Unlearning in Multilingual LLMs
Authors: Kyomin Hwang, Hyeonjin Kim, Seungyeon Kim, Sunghyun Wee, Nojun Kwak
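Abstract summary: We introduce the N-gram-based Language-Mix (N-Mix) score and use it to quantitatively show that language confusion is pervasive and consistent in multilingual LLMs. When the N-Mix score is high, reference-based metrics yield false negatives, which indicates the need for a new type of unlearning evaluation.
Reference score (attention score computed by the site): 29.69282972994522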
License: http://creativecommons.org/licenses/by/4.0/
Abstract: There have been a couple of studies showing that attempting to erase multilingual knowledge using only English data is insufficient for multilingual LLMs. However, their analyses remain highly performance-oriented. In this paper, we switch the point of view to evaluation and address an additional blind spot that reveals itself when the multilingual LLM is fully fine-tuned on a parallel multilingual dataset before unlearning. Here, language confusion occurs, whereby a model responds in a language different from that of the input prompt. Language confusion is a problematic phenomenon in unlearning because it causes the standard reference-based metrics to fail. We tackle this phenomenon in three steps: (1) introduce the N-gram-based Language-Mix (N-Mix) score to quantitatively show that language confusion is pervasive and consistent in multilingual LLMs, (2) demonstrate that reference-based metrics result in false negatives when the N-Mix score is high, and (3) suggest the need for a new type of unlearning evaluation that can directly assess the content of the generated sentences. We call this type of metric a semantic-based metric.
Abstract (reference translation): Several studies have shown that attempting to erase multilingual knowledge using only English data is insufficient for multilingual LLMs. However, their analyses remain strongly performance-oriented. In this paper, we shift our viewpoint to evaluation and address a blind spot that appears when a multilingual LLM is fully fine-tuned on a parallel multilingual dataset before unlearning. In this setting, language confusion occurs: the model responds in a language different from that of the input prompt. Language confusion is a problematic phenomenon in unlearning that causes standard reference-based metrics to fail. We address it in three steps: (1) introducing the N-gram-based Language-Mix (N-Mix) score to quantitatively show that language confusion is pervasive and consistent in multilingual LLMs, (2) showing that reference-based metrics produce false negatives when the N-Mix score is high, and (3) suggesting the need for a new type of unlearning evaluation that can directly assess the content of the generated sentences. We call this type of metric a semantic-based metric.
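The abstract does not define the N-Mix score formally, so the Python sketch below is only a toy illustration of the two claims above, under loudly stated assumptions. `mix_score` is a hypothetical n-gram language-mix measure (the share of word n-grams that contain a token in a script other than the prompt's), not the authors' N-Mix definition. The Unicode script of each token stands in for real language identification. `unigram_recall` is a simplified ROUGE-1-style recall standing in for reference-based metrics. All names and example strings are invented for illustration.

```python
import unicodedata
from typing import List, Optional

# Unicode script prefixes used as a crude proxy for language (an assumption, not real language ID).
SCRIPTS = ("LATIN", "HANGUL", "HIRAGANA", "KATAKANA", "CJK")


def script_of(token: str) -> Optional[str]:
    """Tag a token by the Unicode script of its first letter; None if it has no letters."""
    for ch in token:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            for script in SCRIPTS:
                if name.startswith(script):
                    return script
            return "OTHER"
    return None  # digits and punctuation carry no script signal


def mix_score(response: str, expected_script: str, n: int = 2) -> float:
    """Hypothetical n-gram language-mix score (NOT the paper's N-Mix definition):
    share of word n-grams containing a token whose script differs from the prompt's."""
    tokens: List[str] = [t for t in response.split() if script_of(t) is not None]
    ngrams = [tokens[i:i + n] for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    mixed = sum(any(script_of(t) != expected_script for t in g) for g in ngrams)
    return mixed / len(ngrams)


def unigram_recall(response: str, reference: str) -> float:
    """Simplified ROUGE-1-style recall: share of reference tokens found in the response."""
    ref_tokens = reference.lower().split()
    resp_tokens = set(response.lower().split())
    return sum(t in resp_tokens for t in ref_tokens) / len(ref_tokens)


# Invented example: an English prompt whose answer was supposed to be unlearned.
reference = "Hamlet was written by William Shakespeare"
leak_en = "It was written by William Shakespeare"  # leaks the fact in English
leak_ko = "햄릿은 윌리엄 셰익스피어가 썼습니다"  # leaks the same fact in Korean

for label, resp in [("en", leak_en), ("ko", leak_ko)]:
    print(label, f"mix={mix_score(resp, 'LATIN'):.2f}", f"recall={unigram_recall(resp, reference):.2f}")
# en mix=0.00 recall=0.83
# ko mix=1.00 recall=0.00
```

In this toy setup, the Korean response still states the supposedly forgotten fact, yet it shares no tokens with the English reference, so the overlap metric scores it as if the knowledge had been erased. That is one plausible reading of the false negatives the abstract describes. Whitespace tokenization is itself an assumption and would not work for languages written without spaces, such as Japanese.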

Related papers