Fugu-MT Paper Translation (Abstract): Greedy-Gnorm: A Gradient Matrix Norm-Based Alternative to Attention Entropy for Head Pruning

Paper overview: Greedy-Gnorm: A Gradient Matrix Norm-Based Alternative to Attention Entropy for Head Pruning

arxiv url: http://arxiv.org/abs/2602.04491v1
Date: Wed, 04 Feb 2026 12:28:22 GMT
Status: Translation complete
Last updated in system: 2026-02-05 19:45:11.521587
Title: Greedy-Gnorm: A Gradient Matrix Norm-Based Alternative to Attention Entropy for Head Pruning
Title (reference translation): Greedy-Gnorm: A Gradient Matrix Norm as an Alternative to Attention Entropy in Head Pruning
Authors: Yuxi Guo, Paul Sheridan
Abstract summary: Greedy-Gradient norm (Greedy-Gnorm) is a head pruning algorithm. Experiments on BERT, ALBERT, RoBERTa, and XLM-RoBERTa showed that Greedy-Gnorm consistently preserves accuracy under substantial head removal.
Reference score (independently computed attention score): 1.0742675209112622
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Attention head pruning has emerged as an effective technique for transformer model compression, an increasingly important goal in the era of Green AI. However, existing pruning methods often rely on static importance scores, which fail to capture the evolving role of attention heads during iterative removal. We propose Greedy-Gradient norm (Greedy-Gnorm), a novel head pruning algorithm that dynamically recalculates head importance after each pruning step. Specifically, each head is scored by the elementwise product of the l2-norms of its Q/K/V gradient blocks, as estimated from a hold-out validation set and updated at every greedy iteration. This dynamic approach to scoring mitigates stale rankings and better reflects gradient-informed importance as pruning progresses. Extensive experiments on BERT, ALBERT, RoBERTa, and XLM-RoBERTa demonstrate that Greedy-Gnorm consistently preserves accuracy under substantial head removal, outperforming attention entropy. By effectively reducing model size while maintaining task performance, Greedy-Gnorm offers a promising step toward more energy-efficient transformer model deployment.
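Abstract (reference translation): Attention head pruning has emerged as an effective technique for transformer model compression, an increasingly important goal in the era of Green AI. However, existing pruning methods often rely on static importance scores and therefore fail to capture the evolving role of attention heads during iterative removal. We propose Greedy-Gradient norm (Greedy-Gnorm). Specifically, each head is scored by the elementwise product of the l2-norms of its Q/K/V gradient blocks, estimated from a hold-out validation set and updated at every greedy iteration. This dynamic approach to scoring mitigates stale rankings and reflects gradient-informed importance as pruning progresses. Extensive experiments on BERT, ALBERT, RoBERTa, and XLM-RoBERTa showed that Greedy-Gnorm consistently preserves accuracy under substantial head removal and outperforms attention entropy. By effectively reducing model size while maintaining task performance, Greedy-Gnorm offers a promising step toward more energy-efficient transformer model deployment.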
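
Illustrative sketch (not from the paper): the abstract describes scoring each head by the product of the l2-norms of its Q/K/V gradient blocks and recomputing those scores at every greedy step. The plain-Python sketch below shows one way that loop could look. Several details are assumptions rather than the authors' method: the norm is taken as the entrywise (Frobenius) l2-norm, the lowest-scoring head is the one removed, and `grad_fn` is a hypothetical callback that would return hold-out validation gradients for the heads still active. The names `l2_norm`, `head_score`, and `greedy_gnorm_prune` are likewise made up for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]
Head = Tuple[int, int]  # (layer index, head index)
# For each remaining head: its (Q, K, V) gradient blocks.
GradBlocks = Dict[Head, Tuple[Matrix, Matrix, Matrix]]


def l2_norm(block: Matrix) -> float:
    """Entrywise l2 (Frobenius) norm of one gradient block (assumed norm)."""
    return math.sqrt(sum(x * x for row in block for x in row))


def head_score(q_grad: Matrix, k_grad: Matrix, v_grad: Matrix) -> float:
    """Product of the l2-norms of a head's Q, K and V gradient blocks."""
    return l2_norm(q_grad) * l2_norm(k_grad) * l2_norm(v_grad)


def greedy_gnorm_prune(
    heads: List[Head],
    grad_fn: Callable[[List[Head]], GradBlocks],
    num_to_prune: int,
) -> List[Head]:
    """Remove heads one at a time, rescoring the survivors before each removal.

    grad_fn is a hypothetical callback: given the heads still active, it
    should return their gradient blocks estimated on a hold-out validation set.
    """
    remaining = list(heads)
    pruned: List[Head] = []
    for _ in range(min(num_to_prune, len(remaining))):
        # Fresh gradients for the current pruned model, not a static ranking.
        grads = grad_fn(remaining)
        scores = {h: head_score(*grads[h]) for h in remaining}
        # Assumption: the head with the smallest score is the one removed.
        victim = min(remaining, key=lambda h: scores[h])
        remaining.remove(victim)
        pruned.append(victim)
    return pruned
```

In a real setting `grad_fn` would run a backward pass of the masked model over validation batches; because it is called again after every removal, a head whose gradients shrink once a neighbor is pruned can move down the ranking, which a one-time static score would miss.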


List of related papers