Fugu-MT Paper Translation (Abstract): WARP: Guaranteed Inner-Layer Repair of NLP Transformers

Paper abstract: WARP: Guaranteed Inner-Layer Repair of NLP Transformers

arxiv url: http://arxiv.org/abs/2604.00938v1
Date: Wed, 01 Apr 2026 14:12:49 GMT
Status: Translation complete
Last updated in system: 2026-04-02 16:44:32.029983
Title: WARP: Guaranteed Inner-Layer Repair of NLP Transformers
Authors: Hsin-Ling Hsu, Min-Yu Chen, Nai-Chia Chen, Yan-Ru Chen, Yi-Ling Chang, Fang Yu
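Abstract summary: This paper proposes WARP, a constraint-based repair framework that extends repair beyond the last layer of Transformer models. WARP formulates repair as a convex quadratic program derived from a first-order linearization of the logit gap. WARP is shown to converge to a solution satisfying all repair constraints under mild assumptions.
Reference score (independently computed attention score): 4.191577542171072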
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformer-based NLP models remain vulnerable to adversarial perturbations, yet existing repair methods face a fundamental trade-off: gradient-based approaches offer flexibility but lack verifiability and often overfit; methods that do provide repair guarantees are restricted to the final layer or small networks, significantly limiting the parameter search space available for repair. We present WARP (Weight-Adjusted Repair with Provability), a constraint-based repair framework that extends repair beyond the last layer of Transformer models. WARP formulates repair as a convex quadratic program derived from a first-order linearization of the logit gap, enabling tractable optimization over a high-dimensional parameter space. Under the condition that the first-order approximation holds, this formulation induces three per-sample guarantees: (i) a positive margin constraint ensuring correct classification on repaired inputs, (ii) preservation constraints over a designated remain set, and (iii) a certified robustness radius derived from Lipschitz continuity. To ensure feasibility across varying model architectures, we introduce a sensitivity-based preprocessing step that conditions the optimization landscape accordingly. We further show that the iterative optimization procedure converges to solutions satisfying all repair constraints under mild assumptions. Empirical evaluation on encoder-only Transformers with varying layer architectures validates that these guarantees hold in practice while improving robustness to adversarial inputs. Our results demonstrate that guaranteed, generalizable Transformer repair is achievable through principled constraint-based optimization.
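As a rough illustration of the kind of formulation the abstract describes, the sketch below solves a tiny minimum-norm quadratic program whose constraints are first-order linearizations of logit gaps. It is not the authors' implementation: the toy gaps and gradients, the function names (`linearized_gap`, `solve_min_norm_repair`, `certified_radius`), the choice of Hildreth's method as the solver, the shared margin for repair and remain samples, and the `gap / L` form of the Lipschitz radius are all assumptions made for illustration.

```python
# Hypothetical sketch: names, numbers, and the solver choice are illustrative
# assumptions, not taken from the WARP paper.

def linearized_gap(gap, grad, delta):
    """First-order estimate of a logit gap after a weight update `delta`."""
    return gap + sum(g * d for g, d in zip(grad, delta))


def solve_min_norm_repair(gaps, grads, margin, sweeps=200):
    """Minimize 0.5 * ||delta||^2 subject to gap_i + grad_i . delta >= margin.

    Uses Hildreth's method (cyclic dual coordinate ascent), which converges
    to the minimum-norm solution when the constraints are feasible.
    """
    dim = len(grads[0])
    delta = [0.0] * dim
    duals = [0.0] * len(gaps)  # one nonnegative multiplier per constraint
    for _ in range(sweeps):
        for i, (gap, grad) in enumerate(zip(gaps, grads)):
            sq_norm = sum(g * g for g in grad)
            if sq_norm == 0.0:
                continue  # this constraint does not depend on delta
            shortfall = margin - linearized_gap(gap, grad, delta)
            new_dual = max(0.0, duals[i] + shortfall / sq_norm)
            step = new_dual - duals[i]
            duals[i] = new_dual
            delta = [d + step * g for d, g in zip(delta, grad)]
    return delta


def certified_radius(gap, lipschitz_constant):
    """Input radius within which an L-Lipschitz gap provably stays positive."""
    return max(gap, 0.0) / lipschitz_constant


# Toy problem: two parameters, one sample to repair, one sample to preserve.
gaps = [-1.0, 1.0]                 # current logit gaps (negative = misclassified)
grads = [[1.0, 0.0], [-1.0, 1.0]]  # assumed gradients of each gap w.r.t. weights
margin = 0.5

delta = solve_min_norm_repair(gaps, grads, margin)
repaired_gaps = [linearized_gap(g, a, delta) for g, a in zip(gaps, grads)]
radius = certified_radius(repaired_gaps[0], lipschitz_constant=2.0)
```

In this toy setting the update has to raise the misclassified sample's gap to the margin without pushing the preserved sample's gap below it. Any guarantee read off the linearized gaps holds only as far as the first-order approximation does, which matches the condition stated in the abstract.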
