Fugu-MT Paper Translation (Summary): LEAP: Learnable End-to-End Adaptive Pruning of Large Language Models

Paper summary: LEAP: Learnable End-to-End Adaptive Pruning of Large Language Models

arxiv url: http://arxiv.org/abs/2605.17289v1
Date: Sun, 17 May 2026 07:01:13 GMT
Status: Translation complete
System update date: 2026-05-19 17:57:47.832887
Title: LEAP: Learnable End-to-End Adaptive Pruning of Large Language Models
Title (reference translation): LEAP: Learnable End-to-End Adaptive Pruning of Large Language Models
Authors: Mohammad Mozaffari, Younes Hourri, Mohammad Rastegari, Mahyar Najibi
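Abstract summary: State-of-the-art methods for unstructured pruning are layer-wise surrogates derived from the Optimal Brain Surgeon principle. This paper proposes LEAP, which replaces the intractable categorical-over-patterns parameterization of earlier learnable-mask methods with a per-weight Bernoulli-via-Gumbel-sigmoid relaxation. Across five LLM families from 0.5B to 8B parameters, at 50% and 60% sparsity, LEAP improves six-task average zero-shot accuracy by an average of 2.59 points over ADMM.
Reference score (independently computed attention score): 19.274512633962086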
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Unstructured sparsity is now natively accelerated by recent GPU kernels and dataflow hardware, shifting the bottleneck from inference execution to the pruning algorithm. State-of-the-art methods for unstructured LLM pruning are layer-wise surrogates derived from the Optimal Brain Surgeon principle, and they sacrifice end-to-end accuracy, especially under aggressive sparsity. End-to-end alternatives such as MaskLLM and PATCH show that learnable masks can close this gap, but their categorical-over-patterns parameterization scales with the number of valid masks per row and does not port to the unstructured setting. We introduce LEAP, which replaces this intractable parameterization with a per-weight Bernoulli-via-Gumbel-sigmoid relaxation that makes end-to-end unstructured mask learning tractable. Across five LLM families from 0.5B to 8B parameters at 50% and 60% sparsity, LEAP improves six-task average zero-shot accuracy by +2.59 points on average over ADMM, the best layer-wise baseline in our sweep.
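Abstract (reference translation): Recent GPU kernels and dataflow hardware now accelerate unstructured sparsity natively, which shifts the bottleneck from inference execution to the pruning algorithm. The state-of-the-art methods for unstructured LLM pruning are layer-wise surrogates derived from the Optimal Brain Surgeon principle, and they sacrifice end-to-end accuracy, particularly at aggressive sparsity levels. End-to-end alternatives such as MaskLLM and PATCH show that learnable masks can close this gap, but their categorical-over-patterns parameterization scales with the number of valid masks per row and does not carry over to the unstructured setting. This paper proposes LEAP, which replaces that intractable parameterization with a per-weight Bernoulli-via-Gumbel-sigmoid relaxation that makes end-to-end unstructured mask learning tractable. Across five LLM families from 0.5B to 8B parameters, at 50% and 60% sparsity, LEAP improves six-task average zero-shot accuracy by an average of 2.59 points over ADMM.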
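
The per-weight Bernoulli-via-Gumbel-sigmoid relaxation named in the abstract can be illustrated with a minimal Python sketch of the standard Gumbel-sigmoid (binary concrete) sampler. The abstract does not give LEAP's exact formulation, loss, sparsity control, or temperature schedule, so this is not the authors' implementation. The names `gumbel_sigmoid_mask`, `relaxed_masked_weights`, `hard_mask`, `logit`, and `temperature` are hypothetical and chosen here for illustration only.

```python
import math
import random


def stable_sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gumbel_sigmoid_mask(logit, temperature, rng):
    """Draw one relaxed Bernoulli sample in [0, 1] for a single weight.

    `logit` is an assumed learnable keep-logit for that weight, so the
    underlying keep probability is sigmoid(logit).
    """
    # Clamp u away from 0 and 1 so both logarithms stay finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    # The difference of two Gumbel(0, 1) draws is Logistic(0, 1) noise.
    noise = math.log(u) - math.log(1.0 - u)
    # Lower temperatures push the sample toward a hard 0/1 decision.
    return stable_sigmoid((logit + noise) / temperature)


def relaxed_masked_weights(weights, logits, temperature, rng):
    """Scale each weight by its own independently sampled soft mask."""
    return [w * gumbel_sigmoid_mask(a, temperature, rng)
            for w, a in zip(weights, logits)]


def hard_mask(logits):
    """Deterministic mask: keep a weight when sigmoid(logit) > 0.5."""
    return [1 if a > 0 else 0 for a in logits]


if __name__ == "__main__":
    rng = random.Random(0)
    row = [0.8, -1.2, 0.05, 2.0]          # hypothetical weights of one row
    keep_logits = [2.0, 1.0, -3.0, 0.5]   # hypothetical learned logits
    print(relaxed_masked_weights(row, keep_logits, temperature=0.5, rng=rng))
    print(hard_mask(keep_logits))
```

Because every weight carries a single scalar logit, the number of mask parameters equals the number of weights instead of growing with the number of valid mask patterns per row. In an automatic-differentiation framework the soft sample is differentiable with respect to `logit`, which is what allows such a mask to be trained end to end.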

Related papers