Fugu-MT paper translation (abstract): Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training

Paper overview: Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training

arxiv url: http://arxiv.org/abs/2604.24350v1
Date: Mon, 27 Apr 2026 11:44:42 GMT
Status: Translation complete
Last updated in system: 2026-04-28 17:12:07.955203
Title: Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training
Authors: Mengnan Zhao, Lihe Zhang, Tianhang Zheng, Bo Wang, Baocai Yin
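Abstract summary: Fast Adversarial Training (FAT) has attracted significant attention for its efficiency in improving neural network robustness against adversarial attacks. FAT is prone to catastrophic overfitting (CO), in which the model overfits to the specific attack used during training and fails to generalize to others. The paper proposes a systematic and intuitive explanation of CO through the lens of backdoors.
Reference score (independently computed attention level): 66.02119132131321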
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fast Adversarial Training (FAT) has attracted significant attention due to its efficiency in enhancing neural network robustness against adversarial attacks. However, FAT is prone to catastrophic overfitting (CO), wherein models overfit to the specific attack used during training and fail to generalize to others. While existing methods introduce diverse hypotheses and propose various strategies to mitigate CO, a systematic and intuitive explanation of CO remains absent. In this work, we innovatively interpret CO through the lens of backdoors. Through validations on pathway division, diverse feature predictions, and universal class-distinguishable triggers in CO, we conceptualize CO as a weak-trigger variant of unlearnable tasks, unifying CO, backdoor attacks, and unlearnable tasks under a common theoretical framework. Guided by this, we leverage several backdoor-inspired strategies to mitigate CO: (i) recalibrate CO-affected model parameters using vanilla fine-tuning, linear probing, or reinitialization-based techniques; (ii) introduce a weight outlier suppression constraint to regulate abnormal deviations in model weights. Extensive experiments support our interpretation of CO and show the efficacy of the proposed mitigation strategies.

Related papers