Fugu-MT Paper Translation (Abstract): Universal Properties of Activation Sparsity in Modern Large Language Models

Paper summary: Universal Properties of Activation Sparsity in Modern Large Language Models

arxiv url: http://arxiv.org/abs/2509.00454v1
Date: Sat, 30 Aug 2025 10:47:21 GMT
Status: Translation complete
Last updated in system: 2025-09-04 15:17:03.239553
Title: Universal Properties of Activation Sparsity in Modern Large Language Models
Title (reference translation): Universal Properties of Activation Sparsity in Modern Large Language Models
Authors: Filip Szatkowski, Patryk Będkowski, Alessio Devoto, Jan Dubiński, Pasquale Minervini, Mikołaj Piórczyński, Simone Scardapane, Bartosz Wójcik
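Abstract summary: This paper presents a framework for assessing sparsity robustness and a systematic study of the phenomenon in the FFN layers of modern LLMs. The study reveals universal patterns of activation sparsity in LLMs, provides insights into the phenomenon, and offers practical guidelines for exploiting it in model design and acceleration.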
Reference score (independently computed attention metric): 20.84931970096774
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Input-dependent activation sparsity is a notable property of deep learning models, which has been extensively studied in networks with ReLU activations and is associated with efficiency, robustness, and interpretability. However, the approaches developed for ReLU-based models depend on exact zero activations and do not transfer directly to modern large language models (LLMs), which have abandoned ReLU in favor of other activation functions. As a result, current work on activation sparsity in LLMs is fragmented, model-specific, and lacks consensus on which components to target. We propose a general framework to assess sparsity robustness and present a systematic study of the phenomenon in the FFN layers of modern LLMs, including diffusion LLMs. Our findings reveal universal patterns of activation sparsity in LLMs, provide insights into this phenomenon, and offer practical guidelines for exploiting it in model design and acceleration.
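Abstract (reference translation): Input-dependent activation sparsity is a notable property of deep learning models that has been widely studied in networks with ReLU activations and is associated with efficiency, robustness, and interpretability. However, approaches developed for ReLU-based models rely on exact zero activations and do not carry over directly to modern large language models (LLMs), which have abandoned ReLU. As a result, current research on activation sparsity in LLMs is fragmented and model-specific, and lacks consensus on which components to target. This paper proposes a general framework for assessing sparsity robustness in the FFN layers of modern LLMs, including diffusion LLMs. The study reveals universal patterns of activation sparsity in LLMs, provides insights into the phenomenon, and offers practical guidelines for exploiting it in model design and acceleration.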
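
The abstract's point that ReLU-era methods rely on exact zeros can be illustrated with a small sketch. This is not the paper's framework: SiLU is assumed here only as a stand-in for the "other activation functions" the abstract mentions, and `top_k_sparsify` and `keep_fraction` are hypothetical names for one simple way of inducing sparsity by magnitude.

```python
import math
import random


def relu(x):
    # ReLU maps every negative input to an exact 0.0.
    return max(0.0, x)


def silu(x):
    # SiLU (x * sigmoid(x)); assumed stand-in for a non-ReLU activation.
    # Its output is exactly zero only when x is exactly zero.
    return x / (1.0 + math.exp(-x))


def exact_zero_fraction(values):
    # Share of entries that are exactly 0.0 (the ReLU-style notion of sparsity).
    return sum(1 for v in values if v == 0.0) / len(values)


def top_k_sparsify(values, keep_fraction):
    # Hypothetical helper: keep the largest-magnitude entries, zero the rest.
    k = max(1, int(len(values) * keep_fraction))
    threshold = sorted((abs(v) for v in values), reverse=True)[k - 1]
    return [v if abs(v) >= threshold else 0.0 for v in values]


random.seed(0)
pre_activations = [random.gauss(0.0, 1.0) for _ in range(1000)]

relu_out = [relu(x) for x in pre_activations]
silu_out = [silu(x) for x in pre_activations]
sparse_silu = top_k_sparsify(silu_out, keep_fraction=0.25)

print("ReLU exact-zero fraction:", exact_zero_fraction(relu_out))
print("SiLU exact-zero fraction:", exact_zero_fraction(silu_out))
print("SiLU after top-k, exact-zero fraction:", exact_zero_fraction(sparse_silu))
```

Under these assumptions, counting exact zeros finds sparsity in the ReLU outputs but none in the SiLU outputs, so sparsity in the latter has to be induced (here by magnitude) and its effect on model quality assessed separately.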

Related papers