Fugu-MT Paper Translation (Summary): Complementary Attention Head Pruning for Efficient Transformers

Paper summary: Complementary Attention Head Pruning for Efficient Transformers

arxiv url: http://arxiv.org/abs/2606.19150v1
Date: Wed, 17 Jun 2026 14:56:27 GMT
Status: Translation complete
Last updated in system: 2026-06-18 17:16:51.218766
Title: Complementary Attention Head Pruning for Efficient Transformers
Authors: Yaniv Livertovsky, Shahar Somin, Gonen Singer
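Abstract summary: This paper introduces CAHP, a new framework that redefines attention head selection as a global graph-theoretical problem. CAHP combines graph-based clustering with information-theoretic distance measures to identify and preserve a diverse subset of complementary attention heads. The structural analysis shows that CAHP avoids the "proximity bias" of gradient-based pruning methods, which tend to preserve heads mainly in layers close to the output, and instead retains functionally critical attention heads in the model's intermediate layers.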
Reference score (independently computed attention level): 2.2991119948183525
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The remarkable success of Transformer-based models in natural language processing stems from architectural scaling, which leads to a large number of parameters and hinders deployment in resource-constrained environments. While structured pruning offers a pathway to compression, existing state-of-the-art methods often rely on gradient-based importance ranking or stochastic gating, which suffer from instability, structural degeneration, and the need for extensive manual hyperparameter tuning. In this paper, we introduce CAHP (Complementary Attention Head Pruning), a novel post-hoc framework that redefines head selection as a global graph-theoretical problem. Rather than evaluating heads in isolation, CAHP utilizes graph-based clustering combined with information-theoretic distance measures to identify and preserve a topologically diverse subset of complementary attention heads. Without requiring a predefined sparsity level or pruning ratio, the framework automatically determines the number of selected attention heads across layers by identifying a diminishing marginal performance curve, where pruning additional heads leads to a sharp degradation in performance, as determined by the chosen polynomial degree. Extensive evaluations on the SST-5 and MNLI benchmarks, across different Transformer model scales, demonstrate that CAHP consistently outperforms competitive baselines, particularly in high-compression regimes. Furthermore, our structural analysis shows that CAHP avoids the "proximity bias" of gradient-based pruning methods, which tend to preserve heads mainly in layers close to the output, and instead retains a functionally critical set of attention heads in the model's intermediate layers.

Related papers