Fugu-MT Paper Translation (Summary): Wisdom is Knowing What not to Say: Hallucination-Free LLMs Unlearning via Attention Shifting

Paper Summary: Wisdom is Knowing What not to Say: Hallucination-Free LLMs Unlearning via Attention Shifting

arxiv url: http://arxiv.org/abs/2510.17210v1
Date: Mon, 20 Oct 2025 06:50:03 GMT
Status: Translation complete
Last updated in system: 2025-10-25 03:08:12.021298
Title: Wisdom is Knowing What not to Say: Hallucination-Free LLMs Unlearning via Attention Shifting
Authors: Chenchen Tan, Youyang Qu, Xinghao Li, Hui Zhang, Shujie Cui, Cunjian Chen, Longxiang Gao,
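Abstract summary: We introduce an Attention-Shifting (AS) framework for selective unlearning. AS is driven by two design objectives: (1) context-preserving suppression, which attenuates attention to fact-bearing tokens without disrupting the LLM's linguistic structure, and (2) hallucination-resistant response shaping, which discourages fabricated completions when the model is queried about unlearned content. Experimental results show that AS improves performance preservation over state-of-the-art unlearning methods, achieving up to 15% higher accuracy on the ToFU benchmark and 10% on the TDEC benchmark, while maintaining competitive hallucination-free unlearning effectiveness.
Reference score (attention score computed in-house): 11.725875396424927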
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The increase in computing power and the need for AI-assisted decision-making are driving the growing application of large language models (LLMs). Along with this, the potential retention of sensitive data by LLMs has spurred increasing research into machine unlearning. However, existing unlearning approaches face a critical dilemma: aggressive unlearning compromises model utility, while conservative strategies preserve utility but risk hallucinated responses. This significantly limits LLMs' reliability in knowledge-intensive applications. To address this, we introduce a novel Attention-Shifting (AS) framework for selective unlearning. AS is driven by two design objectives: (1) context-preserving suppression that attenuates attention to fact-bearing tokens without disrupting LLMs' linguistic structure; and (2) hallucination-resistant response shaping that discourages fabricated completions when queried about unlearned content. AS realizes these objectives through two attention-level interventions: importance-aware suppression, applied to the unlearning set to reduce reliance on memorized knowledge, and attention-guided retention enhancement, which reinforces attention toward semantically essential tokens in the retained dataset to mitigate unintended degradation. These two components are jointly optimized via a dual-loss objective, which forms a soft boundary that localizes unlearning while preserving unrelated knowledge under representation superposition. Experimental results show that AS improves performance preservation over state-of-the-art unlearning methods, achieving up to 15% higher accuracy on the ToFU benchmark and 10% on the TDEC benchmark, while maintaining competitive hallucination-free unlearning effectiveness. Compared to existing methods, AS demonstrates a superior balance between unlearning effectiveness, generalization, and response reliability.
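The abstract names the two attention-level interventions and the dual-loss objective but does not give their formulas. The following is a minimal, hypothetical Python sketch of what such a dual loss over attention weights could look like, not the authors' AS implementation. It assumes: attention for one query is a single softmax distribution over tokens; per-token importance scores are already available; the top-k most important tokens stand in for "fact-bearing" tokens (unlearning set) or "semantically essential" tokens (retained set); and the two terms are summed with an illustrative weight `lam`. All function names and loss forms are assumptions made for illustration.

```python
import math
from typing import Sequence

# Hypothetical sketch only: the paper's actual AS loss terms are not given in the abstract.


def softmax(scores: Sequence[float]) -> list[float]:
    """Numerically stable softmax that turns raw attention scores into weights."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(importance: Sequence[float], k: int) -> set[int]:
    """Indices of the k tokens with the highest (assumed, precomputed) importance."""
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return set(ranked[:k])


def suppression_loss(attn: Sequence[float], fact_idx: set[int]) -> float:
    """Unlearning-set term: attention mass still placed on fact-bearing tokens.

    Minimizing it moves attention off those tokens; the remaining mass is
    redistributed over the other tokens rather than zeroed out.
    """
    return sum(attn[i] for i in fact_idx)


def retention_loss(attn: Sequence[float], essential_idx: set[int], eps: float = 1e-12) -> float:
    """Retained-set term: negative log of the attention mass on essential tokens.

    Minimizing it reinforces attention toward those tokens.
    """
    mass = sum(attn[i] for i in essential_idx)
    return -math.log(mass + eps)


def dual_loss(
    forget_scores: Sequence[float],
    forget_importance: Sequence[float],
    retain_scores: Sequence[float],
    retain_importance: Sequence[float],
    k: int = 1,
    lam: float = 1.0,  # hypothetical weight balancing the two terms
) -> float:
    """Sum of the suppression term (unlearning set) and weighted retention term (retained set)."""
    forget_attn = softmax(forget_scores)
    retain_attn = softmax(retain_scores)
    l_forget = suppression_loss(forget_attn, top_k_indices(forget_importance, k))
    l_retain = retention_loss(retain_attn, top_k_indices(retain_importance, k))
    return l_forget + lam * l_retain


if __name__ == "__main__":
    importance = [0.1, 0.2, 0.9, 0.3]  # hypothetical: token 2 plays the "fact" token
    fact = top_k_indices(importance, 1)
    before = suppression_loss(softmax([0.0, 0.0, 2.0, 0.0]), fact)
    after = suppression_loss(softmax([0.0, 0.0, -2.0, 0.0]), fact)
    print(f"suppression term: {before:.3f} -> {after:.3f}")
```

Lowering the raw score of the fact token shrinks the suppression term while the attention distribution still sums to 1, which is one simple way to read "attenuating attention without disrupting structure". The log form of the retention term is a design choice in this sketch: it penalizes sharply when attention to essential tokens collapses toward zero.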


Related Papers