Fugu-MT Paper Translation (Abstract): PowLU: An Activation Function for Stable Pre-Training of LLMs

Paper abstract: PowLU: An Activation Function for Stable Pre-Training of LLMs

arxiv url: http://arxiv.org/abs/2605.25704v1
Date: Mon, 25 May 2026 11:02:05 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:19.824004
Title: PowLU: An Activation Function for Stable Pre-Training of LLMs
Authors: Peijie Jiang, Yuqi Feng, Cunyin Peng, Qian Zhao, Jia Liu, KunLong Chen, Zhiqiang Zhang, Jun Zhou
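Abstract summary: We propose the Power Linear Unit (PowLU), a stable activation function for large-scale LLM pre-training. Specifically, PowLU employs a rational power function to achieve adaptive nonlinearity, improving representation ability and enabling stable training in spike regions.
Reference score (independently computed attention score): 20.337153469330566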
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In contemporary large language models (LLMs), the swish-gated linear unit (SwiGLU) activation function is widely adopted to regulate the information flow and introduce non-linearity. For large positive inputs, SwiGLU approximates the quadratic function $x^2$, providing strong nonlinearity and expressive capacity. However, this property also causes numerical instability as the input or model scale increases, particularly in low-precision LLM training. The main reason is its approximate quadratic amplification, which enlarges the output range and exacerbates outliers. To address this issue, we propose a stable activation function, Power Linear Unit (PowLU), for large-scale LLM pre-training. Specifically, PowLU employs a rational power function to achieve adaptive nonlinearity, thereby improving representation ability and enabling stable training in spike regions. Moreover, we provide theoretical justification for several key properties of PowLU. Scaling law experiments confirm that the performance is consistent across model sizes, and further experimental results with the Ling architecture (7.9B and 124B total parameters) demonstrate that PowLU achieves competitive results against SwiGLU and SwiGLU-Clip in large-scale training of LLMs. In addition, the experimental results also show that PowLU effectively improves the scalability of the large-scale training of LLMs.
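The quadratic growth described in the abstract can be illustrated with a small scalar sketch. This is not the paper's code: it assumes the gate and value pre-activations are equal (so the output reduces to swish(x) * x), omits the projection weights, and uses the fp16 maximum as one illustrative low-precision limit. The function names are hypothetical. PowLU itself is not sketched, because the abstract does not give its formula.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def swish(x: float) -> float:
    """Swish (SiLU): x * sigmoid(x)."""
    return x * sigmoid(x)

def swiglu_scalar(gate: float, value: float) -> float:
    """Scalar SwiGLU without projections: swish(gate) * value."""
    return swish(gate) * value

# Largest finite IEEE 754 half-precision value (illustrative limit, an assumption).
FP16_MAX = 65504.0

# Assumption: gate == value == x, which exposes the x^2 behaviour for large x.
for x in (1.0, 10.0, 100.0, 300.0):
    y = swiglu_scalar(x, x)
    print(f"x={x:6.1f}  swiglu={y:12.3f}  y/x^2={y / x**2:.6f}  "
          f"above fp16 max: {y > FP16_MAX}")
```

Under these assumptions the ratio y/x^2 tends to 1 as x grows, and an input of 300 already produces an output beyond the fp16 range, which is the kind of outlier amplification the abstract attributes to SwiGLU.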


Related papers