Fugu-MT Paper Translation (Summary): IGLU: The Integrated Gaussian Linear Unit Activation Function

Paper summary: IGLU: The Integrated Gaussian Linear Unit Activation Function

arxiv url: http://arxiv.org/abs/2603.06861v1
Date: Fri, 06 Mar 2026 20:28:08 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:13.211965
Title: IGLU: The Integrated Gaussian Linear Unit Activation Function
Authors: Mingi Kang, Zai Yang, Jeova Farias Sales Rocha Neto
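Abstract summary: We introduce IGLU, a parametric activation function derived as a scale mixture of GELU gates under a half-normal mixing distribution. We show that IGLU achieves competitive or superior performance against ReLU and GELU baselines on both vision and language datasets.
Reference score (attention score, computed in-house): 13.305282275999778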
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Activation functions are fundamental to deep neural networks, governing gradient flow, optimization stability, and representational capacity. While ReLU has historically been the dominant activation function in deep architectures, modern transformer-based models are increasingly adopting smoother alternatives such as GELU and other self-gated functions. Despite their empirical success, the mathematical relationships among these functions and the principles underlying their effectiveness remain only partially understood. We introduce IGLU, a parametric activation function derived as a scale mixture of GELU gates under a half-normal mixing distribution. This derivation yields a closed-form expression whose gating component is exactly the Cauchy CDF, providing a principled one-parameter family that continuously interpolates between identity-like and ReLU-like behavior via a single sharpness parameter $σ$. Unlike GELU's Gaussian gate, IGLU's heavy-tailed Cauchy gate decays polynomially in the negative tail, guaranteeing non-zero gradients for all finite inputs and offering greater robustness to vanishing gradients. We further introduce IGLU-Approx, a computationally efficient rational approximation of IGLU expressed entirely in terms of ReLU operations that eliminates transcendental function evaluation. Through evaluations on CIFAR-10, CIFAR-100, and WikiText-103 across ResNet-20, ViT-Tiny, and GPT-2 Small, IGLU achieves competitive or superior performance against ReLU and GELU baselines on both vision and language datasets, with IGLU-Approx recovering this performance at substantially reduced computational cost. In particular, we show that employing a heavy-tailed gate leads to considerable performance gains on heavily imbalanced classification datasets.
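The abstract states that IGLU's gate is exactly the Cauchy CDF with a single sharpness parameter σ, and that its negative tail never produces a zero gradient. The following is a minimal Python sketch of that claim, assuming IGLU(x) = x · C(σx) with C(z) = 1/2 + arctan(z)/π the standard Cauchy CDF, so that σ multiplies the input (larger σ gives a sharper, more ReLU-like gate; σ → 0 gives x/2). The paper's exact parameterization may differ, and the names `iglu`, `iglu_grad`, `cauchy_gate`, and `gelu_grad` are hypothetical, not the authors' code.

```python
import math

# Assumption (not confirmed by the source): IGLU(x) = x * C(sigma * x), where C is
# the standard Cauchy CDF and sigma acts as a sharpness multiplier on the input.


def cauchy_gate(x: float, sigma: float = 1.0) -> float:
    """Standard Cauchy CDF evaluated at sigma * x."""
    return 0.5 + math.atan(sigma * x) / math.pi


def iglu(x: float, sigma: float = 1.0) -> float:
    """Hypothetical IGLU sketch: the input times its Cauchy-CDF gate."""
    return x * cauchy_gate(x, sigma)


def iglu_grad(x: float, sigma: float = 1.0) -> float:
    """d/dx [x * C(sigma*x)] = C(y) + y * c(y), with y = sigma*x and c the Cauchy density."""
    y = sigma * x
    return cauchy_gate(x, sigma) + y / (math.pi * (1.0 + y * y))


def gelu_grad(x: float) -> float:
    """Exact GELU derivative Phi(x) + x * phi(x), used here only for comparison."""
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return cdf + x * pdf


# Compare the negative-tail gradients of the two gates.
for x in (-40.0, -5.0, -1.0, 0.0, 1.0):
    print(f"x={x:6.1f}  IGLU'={iglu_grad(x):.3e}  GELU'={gelu_grad(x):.3e}")
```

Under this parameterization the derivative equals (1/π)(π/2 + arctan y + y/(1 + y²)) with y = σx. Its derivative in y is 2/(1 + y²)² > 0 and it tends to 0 as y → -∞, so it is strictly positive for every finite input and decays like 2/(3π|y|³). GELU's gradient, by contrast, underflows to exactly 0.0 in float64 by x = -40.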
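The derivation described in the abstract (averaging GELU gates over a half-normal mixing distribution yields exactly a Cauchy CDF) can be checked numerically. This sketch assumes the mixing variable t ~ HalfNormal(scale σ) rescales the input of the GELU gate Φ(x·t). Then E[Φ(xt)] = P(Z < σx|Z'|) = P(Z/|Z'| < σx) for independent standard normals Z and Z'. Because Z/|Z'| is standard Cauchy, this equals 1/2 + arctan(σx)/π. The composite-Simpson quadrature and the names `mixed_gelu_gate` and `cauchy_gate` are illustrative choices, not the paper's method.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Phi(z), the GELU gate."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def half_normal_pdf(t: float, scale: float) -> float:
    """Density of |N(0, scale^2)| for t >= 0."""
    return math.sqrt(2.0 / math.pi) / scale * math.exp(-0.5 * (t / scale) ** 2)


def mixed_gelu_gate(x: float, sigma: float, n: int = 4000, cutoff: float = 12.0) -> float:
    """E_t[Phi(x * t)] with t ~ HalfNormal(scale=sigma) (assumed parameterization).

    Composite Simpson's rule on [0, cutoff * sigma]; n must be even. The mass of
    the half-normal beyond 12 standard deviations is negligible.
    """
    a, b = 0.0, cutoff * sigma
    h = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        t = a + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * std_normal_cdf(x * t) * half_normal_pdf(t, sigma)
    return total * h / 3.0


def cauchy_gate(x: float, sigma: float) -> float:
    """Closed form predicted by the derivation: Cauchy CDF at sigma * x."""
    return 0.5 + math.atan(sigma * x) / math.pi


for sigma in (0.5, 1.0, 2.0):
    for x in (-3.0, -0.5, 0.0, 1.0, 4.0):
        print(f"sigma={sigma}  x={x:5.1f}  mixture={mixed_gelu_gate(x, sigma):.8f}  "
              f"cauchy={cauchy_gate(x, sigma):.8f}")
```

The two columns should agree to well within quadrature error. If the paper instead treats σ as a scale dividing the input, replace σx with x/σ in both functions.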


List of related papers