Fugu-MT paper translation (abstract): AtPatch: Debugging Transformers via Hot-Fixing Over-Attention

Paper summary: AtPatch: Debugging Transformers via Hot-Fixing Over-Attention

arxiv url: http://arxiv.org/abs/2601.21695v1
Date: Thu, 29 Jan 2026 13:29:35 GMT
Status: Translation complete
Updated in system: 2026-01-30 16:22:49.846857
Title: AtPatch: Debugging Transformers via Hot-Fixing Over-Attention
Authors: Shihao Weng, Yang Feng, Jincheng Li, Yining Yin, Xiaofei Xie, Jia Liu
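Abstract summary: Transformer-based deep neural networks (DNNs) affected by backdoor attacks and unfairness typically exhibit anomalous attention patterns. This work proposes AtPatch, a hot-fix method that dynamically redistributes attention maps during model inference. AtPatch effectively mitigates backdoor attacks and unfairness while better preserving the model's original functionality.
Reference score (independently computed attention measure): 25.63529551684826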
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformer-based deep neural networks (DNNs) affected by backdoor attacks and unfairness typically exhibit anomalous attention patterns, leading them to over-attend to backdoor triggers or protected attributes. Existing neuron-editing mitigation strategies often struggle to handle such situations; most of them lack flexibility and tend to distort feature representations. Motivated by this over-attention phenomenon and by software engineering paradigms such as delta debugging and hot patching, we propose AtPatch, a hot-fix method that dynamically redistributes attention maps during model inference. Specifically, for a given input, AtPatch first extracts the attention map from the model's inference process. Then, it uses a pre-trained detector to identify anomalous columns and replaces them with unified benign attention. Next, AtPatch rescales the other columns to mitigate the impact of over-attention. Finally, AtPatch returns the redistributed attention map to the model for continued inference. Notably, if the detector does not report any anomalous columns, AtPatch directly returns the original attention map to the model. Unlike existing techniques, AtPatch selectively redistributes the attention map, making it better at preserving the model's original functionality. Furthermore, AtPatch's on-the-fly nature allows it to work without modifying model parameters or retraining, making it better suited for deployed models. We conducted extensive experiments to validate AtPatch. Experimental results show that, compared to existing methods, AtPatch can more effectively mitigate backdoor attacks and unfairness while better preserving the model's original functionality.
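The redistribution steps described in the abstract (detect anomalous columns, replace them, rescale the rest, or pass the map through unchanged) can be illustrated with a rough Python sketch. This is not the authors' implementation. The abstract does not say how the pre-trained detector works, what value "unified benign attention" takes, or how the other columns are rescaled, so everything below is an assumption made for illustration: `threshold_detector` is a simple stand-in for the pre-trained detector, the benign value defaults to the uniform share `1 / n_cols`, and the remaining columns are rescaled so that each row still sums to 1. All function and parameter names are hypothetical.

```python
from typing import Callable, List, Optional

Matrix = List[List[float]]  # attention map: rows = queries, columns = keys


def threshold_detector(attn: Matrix, ratio: float = 3.0) -> List[int]:
    """Stand-in for the pre-trained detector (assumption, not the paper's).

    Flags a column when its mean attention exceeds `ratio` times the
    uniform share 1 / n_cols.
    """
    n_rows, n_cols = len(attn), len(attn[0])
    uniform_share = 1.0 / n_cols
    flagged = []
    for j in range(n_cols):
        col_mean = sum(row[j] for row in attn) / n_rows
        if col_mean > ratio * uniform_share:
            flagged.append(j)
    return flagged


def redistribute_attention(
    attn: Matrix,
    detector: Callable[[Matrix], List[int]] = threshold_detector,
    benign_value: Optional[float] = None,
) -> Matrix:
    """Replace flagged columns with one benign value and rescale the rest."""
    anomalous = set(detector(attn))
    if not anomalous:
        # No anomalous columns reported: hand back the original map untouched.
        return attn

    n_cols = len(attn[0])
    # Assumed "unified benign attention": the uniform share unless overridden.
    benign = 1.0 / n_cols if benign_value is None else benign_value
    n_other = n_cols - len(anomalous)
    # Attention mass left for the non-flagged columns in every row.
    budget = 1.0 - benign * len(anomalous)

    patched = []
    for row in attn:
        rest = sum(v for j, v in enumerate(row) if j not in anomalous)
        new_row = []
        for j, v in enumerate(row):
            if j in anomalous:
                new_row.append(benign)
            elif rest > 0:
                # Assumed rescaling: keep relative weights, renormalize the row.
                new_row.append(v * budget / rest)
            else:
                # Degenerate row: spread the remaining mass evenly.
                new_row.append(budget / n_other)
        patched.append(new_row)
    return patched


# Toy 4x4 map in which every query over-attends to the last key.
attn = [
    [0.10, 0.03, 0.02, 0.85],
    [0.05, 0.05, 0.05, 0.85],
    [0.02, 0.08, 0.05, 0.85],
    [0.05, 0.05, 0.05, 0.85],
]
patched = redistribute_attention(attn)
```

In this toy map the stand-in detector flags only the last column. That column is replaced by the assumed benign value and the other columns are scaled up, so their relative ordering within each row is kept. A real deployment would hook a function like this into the attention computation of each layer or head, which the sketch leaves out.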

Related papers