Fugu-MT paper translation (abstract): Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs

Paper abstract: Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs

arxiv url: http://arxiv.org/abs/2509.22166v1
Date: Fri, 26 Sep 2025 10:27:55 GMT
Status: Translation complete
Last updated in system: 2025-09-29 20:57:54.366622
Title: Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs
Authors: Shirin Alanova, Kristina Kazistova, Ekaterina Galaeva, Alina Kostromina, Vladimir Smirnov, Redko Dmitry, Alexey Dontsov, Maxim Zhelnin, Evgeny Burnaev, Egor Shvetsov
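Abstract summary: This work presents a comprehensive analysis of post-training methods for N:M activation pruning in large language models. It demonstrates that pruning activations preserves generative capabilities better than weight pruning at equivalent sparsity levels. The findings provide effective practical methods for activation pruning and a motivation for future hardware to support more flexible sparsity patterns.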
Reference score (independently computed attention measure): 17.379374639721554
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The demand for efficient large language model (LLM) inference has intensified the focus on sparsification techniques. While semi-structured (N:M) pruning is well-established for weights, its application to activation pruning remains underexplored despite its potential for dynamic, input-adaptive compression and reductions in I/O overhead. This work presents a comprehensive analysis of methods for post-training N:M activation pruning in LLMs. Across multiple LLMs, we demonstrate that pruning activations enables superior preservation of generative capabilities compared to weight pruning at equivalent sparsity levels. We evaluate lightweight, plug-and-play error mitigation techniques and pruning criteria, establishing strong hardware-friendly baselines that require minimal calibration. Furthermore, we explore sparsity patterns beyond NVIDIA's standard 2:4, showing that the 16:32 pattern achieves performance nearly on par with unstructured sparsity. However, considering the trade-off between flexibility and hardware implementation complexity, we focus on the 8:16 pattern as a superior candidate. Our findings provide both effective practical methods for activation pruning and a motivation for future hardware to support more flexible sparsity patterns. Our code is available at https://anonymous.4open.science/r/Structured-Sparse-Activations-Inference-EC3C/README.md .
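
To make the N:M constraint in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names `nm_prune` and `patterns_per_block` are hypothetical, and the magnitude-based selection rule is an assumption chosen for illustration (the abstract says several pruning criteria and error mitigation techniques are evaluated but does not specify them). The sketch keeps the N largest-magnitude entries in every contiguous group of M activations, and counts how many distinct masks each pattern admits over the same span, which is one way to read the "flexibility" the abstract refers to.

```python
from math import comb
from typing import List, Sequence


def nm_prune(activations: Sequence[float], n: int, m: int) -> List[float]:
    """Keep the n largest-magnitude values in each contiguous group of m; zero the rest.

    Assumption: plain magnitude is used as the pruning criterion.
    """
    if not 0 < n <= m:
        raise ValueError("need 0 < n <= m")
    if len(activations) % m != 0:
        raise ValueError("length must be a multiple of m")
    pruned: List[float] = []
    for start in range(0, len(activations), m):
        group = activations[start:start + m]
        # Indices of the n entries with the largest absolute value in this group.
        order = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:n])
        pruned.extend(x if i in keep else 0.0 for i, x in enumerate(group))
    return pruned


def patterns_per_block(n: int, m: int, block: int) -> int:
    """Number of distinct N:M masks over `block` elements (block must be a multiple of m)."""
    if block % m != 0:
        raise ValueError("block must be a multiple of m")
    return comb(m, n) ** (block // m)


x = [0.9, -0.1, 0.05, -1.2, 0.3, 0.31, -0.29, 0.0]
print(nm_prune(x, 2, 4))

# All three patterns are 50% sparse, but larger groups admit more masks per 32 elements.
for n, m in [(2, 4), (8, 16), (16, 32)]:
    print(f"{n}:{m}", patterns_per_block(n, m, 32))
```

All three patterns zero exactly half of the entries; the larger groups simply place fewer restrictions on where the surviving entries may sit, which is consistent with the abstract's observation that 16:32 behaves almost like unstructured sparsity.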

Related papers