Fugu-MT Paper Translation (Summary): Deep Network Approximation: Beyond ReLU to Diverse Activation Functions

Paper summary: Deep Network Approximation: Beyond ReLU to Diverse Activation Functions

arxiv url: http://arxiv.org/abs/2307.06555v5
Date: Wed, 31 Jan 2024 17:57:17 GMT
Status: Translation complete
Last updated in system: 2024-02-01 18:11:51.621923
Title: Deep Network Approximation: Beyond ReLU to Diverse Activation Functions
Title (reference translation): Deep Network Approximation: Beyond ReLU, to a Variety of Activation Functions
Authors: Shijun Zhang, Jianfeng Lu, Hongkai Zhao
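Abstract summary: This paper studies the expressive power of deep neural networks for a diverse range of activation functions. An activation function set $\mathscr{A}$ is defined to include the majority of commonly used activation functions.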
Reference score (attention score computed by the site): 12.479831561907007
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper explores the expressive power of deep neural networks for a diverse range of activation functions. An activation function set $\mathscr{A}$ is defined to encompass the majority of commonly used activation functions, such as $\mathtt{ReLU}$, $\mathtt{LeakyReLU}$, $\mathtt{ReLU}^2$, $\mathtt{ELU}$, $\mathtt{CELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, $\mathtt{Mish}$, $\mathtt{Sigmoid}$, $\mathtt{Tanh}$, $\mathtt{Arctan}$, $\mathtt{Softsign}$, $\mathtt{dSiLU}$, and $\mathtt{SRS}$. We demonstrate that for any activation function $\varrho\in \mathscr{A}$, a $\mathtt{ReLU}$ network of width $N$ and depth $L$ can be approximated to arbitrary precision by a $\varrho$-activated network of width $3N$ and depth $2L$ on any bounded set. This finding enables the extension of most approximation results achieved with $\mathtt{ReLU}$ networks to a wide variety of other activation functions, albeit with slightly increased constants. Significantly, we establish that the (width,$\,$depth) scaling factors can be further reduced from $(3,2)$ to $(1,1)$ if $\varrho$ falls within a specific subset of $\mathscr{A}$. This subset includes activation functions such as $\mathtt{ELU}$, $\mathtt{CELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, and $\mathtt{Mish}$.
Abstract (reference translation): This paper studies the expressive power of deep neural networks for a diverse range of activation functions. An activation function set $\mathscr{A}$ is defined to encompass the majority of commonly used activation functions, such as $\mathtt{ReLU}$, $\mathtt{LeakyReLU}$, $\mathtt{ReLU}^2$, $\mathtt{ELU}$, $\mathtt{CELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, $\mathtt{Mish}$, $\mathtt{Sigmoid}$, $\mathtt{Tanh}$, $\mathtt{Arctan}$, $\mathtt{Softsign}$, $\mathtt{dSiLU}$, and $\mathtt{SRS}$. We show that, for any activation function $\varrho\in\mathscr{A}$, a $\mathtt{ReLU}$ network of width $N$ and depth $L$ can be approximated to arbitrary precision on any bounded set by a $\varrho$-activated network of width $3N$ and depth $2L$. This finding makes it possible to extend most approximation results achieved with $\mathtt{ReLU}$ networks to a variety of other activation functions, albeit with slightly larger constants. Importantly, the (width,$\,$depth) scaling factors can be further reduced from $(3,2)$ to $(1,1)$ if $\varrho$ belongs to a specific subset of $\mathscr{A}$. This subset includes activation functions such as $\mathtt{ELU}$, $\mathtt{CELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, and $\mathtt{Mish}$.
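The $(1,1)$ subset named in the abstract consists of activations that behave like $\mathtt{ReLU}$ after rescaling. A minimal Python sketch, assuming the usual textbook definitions ($\alpha = 1$ for ELU, the exact erf-based GELU, SiLU $= x\cdot\mathrm{sigmoid}(x)$), numerically checks one elementary fact consistent with that claim: for each such $\varrho$, $z \mapsto \varrho(tz)/t$ approaches $\mathtt{ReLU}(z)$ uniformly on a bounded interval as $t \to \infty$. The factor $t$ can be folded into the preceding affine map and $1/t$ into the following one, so the network keeps its width and depth. This is an illustration only, not the paper's proof; the helper names `rescaled` and `max_error` are hypothetical.

```python
import math


def relu(x):
    return max(0.0, x)


# Activations from the (1,1) subset, using assumed textbook definitions.
def softplus(x):
    # Numerically stable log(1 + e^x).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * math.expm1(x)


def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def silu(x):
    # x * sigmoid(x), written to avoid overflow for large |x|.
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def mish(x):
    return x * math.tanh(softplus(x))


def rescaled(act, t):
    """Hypothetical helper: z -> act(t*z)/t.

    In a network, t multiplies the incoming weights and bias, and 1/t
    multiplies the outgoing weights, so width and depth are unchanged.
    """
    return lambda z: act(t * z) / t


def max_error(act, t, lo=-3.0, hi=3.0, n=601):
    """Hypothetical helper: max |act(t z)/t - ReLU(z)| on a uniform grid."""
    f = rescaled(act, t)
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return max(abs(f(z) - relu(z)) for z in grid)


if __name__ == "__main__":
    activations = [("Softplus", softplus), ("ELU", elu), ("GELU", gelu),
                   ("SiLU", silu), ("Mish", mish)]
    for name, act in activations:
        errs = [max_error(act, t) for t in (1.0, 10.0, 100.0)]
        print(f"{name:8s}", " ".join(f"{e:.2e}" for e in errs))
```

On this grid the Softplus error is exactly $\log 2 / t$, attained at $z = 0$, and every printed row shrinks roughly like $1/t$. The same rescaling does not recover $\mathtt{ReLU}$ for bounded activations such as $\mathtt{Sigmoid}$ or $\mathtt{Tanh}$, because there $\varrho(tz)/t \to 0$.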

Related Papers

Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs [54.28273395444243]
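We show that the Monotonic Value Propagation (MVP) algorithm achieves a variance-aware gap-dependent regret bound of $\tilde{O}\big(\big(\sum_{\Delta_h(s,a)>0} \frac{H^2 \log K \land \mathtt{Var}_{\max}^{\text{c}}}{\cdots} \cdots$
Paper reference translation (metadata) (2025-06-06T20:33:57Z)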
On the Complexity of Pure-State Consistency of Local Density Matrices [0.0]
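We study the pure-state consistency of local density matrices ($\mathsf{PureCLDM}$) and pure $N$-representability ($\mathsf{Pure}$-$N$-$\mathsf{Representability}$) problems. We prove that both $\mathsf{Pure}$-$N$-$\mathsf{Representability}$ and $\mathsf{PureCLDM}$ are complete for this new class.
Paper reference translation (metadata) (2024-11-05T13:43:21Z)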
The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
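This problem has randomized communication complexity $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$. As an application, we obtain an $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$ space lower bound for any streaming algorithm with $k$ passes.
Paper reference translation (metadata) (2024-10-26T06:21:42Z)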
Locality Regularized Reconstruction: Structured Sparsity and Delaunay Triangulations [7.148312060227714]
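Linear representation learning is widely studied because of its conceptual simplicity and its empirical usefulness in tasks such as compression, classification, and feature extraction. In this work, we seek $\mathbf{w}$ that forms a local reconstruction of $\mathbf{y}$ by solving a regularized least-squares regression problem. We prove that, for all levels of regularization and under the mild condition that the columns of $\mathbf{X}$ have a unique Delaunay triangulation, the number of nonzero entries of the optimal coefficients is upper bounded by $d+1$.
Paper reference translation (metadata) (2024-05-01T19:56:52Z)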
Provably learning a multi-head attention layer [55.2904547651831]
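The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models. In this work, we initiate the study of provably learning a multi-head attention layer from random examples. We show that, in the worst case, exponential dependence on $m$ is unavoidable.
Paper reference translation (metadata) (2024-02-06T15:39:09Z)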
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time [7.613259578185218]
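We focus on providing provable guarantees for the one-layer attention network objective function $L(X,Y)$. In a multi-layer LLM network, the matrix $B \in \mathbb{R}^{n \times d^2}$ can be viewed as the output of a layer. We give an iterative algorithm that trains the loss function $L(X,Y)$ up to $\epsilon$ and runs in $\widetilde{O}((\mathcal{T}_{\mathrm{mat}}(n,d) + d \cdots$
Paper reference translation (metadata) (2023-09-14T04:23:40Z)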
Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
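We propose efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem, where the goal is to approximate $\mathbf{A}$ as a product of low-rank factors. Our techniques generalize to other common variants of the BMF problem.
Paper reference translation (metadata) (2023-06-02T18:55:27Z)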
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
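We study the problem of learning functions of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations $\sigma$. The goal of the learner is to output, with high probability, a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$.
Paper reference translation (metadata) (2022-06-17T17:55:43Z)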
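The summary above defines the learning target $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$. A minimal sketch, assuming $\sigma = \mathtt{ReLU}$, Gaussian inputs and the empirical square loss, runs plain full-batch gradient descent. The step size, initialization, ground-truth weights and the crude label corruption are hypothetical choices for illustration; the sketch does not reproduce the paper's adversarial noise model, its constants, or its guarantee $F(\mathbf{w}) = C\,\epsilon$.

```python
import random


def relu(z):
    return max(0.0, z)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def empirical_loss(w, xs, ys):
    """Mean squared error of the single neuron x -> ReLU(w . x)."""
    return sum((relu(dot(w, x)) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_single_neuron(xs, ys, steps=500, lr=0.1, w0=None):
    """Full-batch gradient descent on empirical_loss.

    Hypothetical defaults: the step size, iteration count and the
    initialization w0 = (0.1, ..., 0.1) are illustrative choices only.
    """
    d = len(xs[0])
    w = list(w0) if w0 is not None else [0.1] * d
    n = len(xs)
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in zip(xs, ys):
            z = dot(w, x)
            if z > 0:  # ReLU'(z) = 1 for z > 0; we use 0 for z <= 0
                r = z - y  # residual ReLU(z) - y, and ReLU(z) = z here
                for j in range(d):
                    grad[j] += 2.0 * r * x[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


if __name__ == "__main__":
    random.seed(0)
    w_star = [1.0, -0.5]  # hypothetical ground-truth weights
    xs = [[random.gauss(0.0, 1.0) for _ in range(2)] for _ in range(200)]
    ys = [relu(dot(w_star, x)) for x in xs]
    # Crude stand-in for label noise: overwrite 5% of the labels.
    for i in random.sample(range(len(ys)), 10):
        ys[i] = 3.0
    w = fit_single_neuron(xs, ys)
    print("learned w:", [round(v, 3) for v in w])
    print("empirical loss:", round(empirical_loss(w, xs, ys), 4))
```

Because the ReLU subgradient is taken as 0 for $z \le 0$, a weight vector whose neuron is inactive on every sample (for example $\mathbf{w} = \mathbf{0}$) receives zero gradient, which is why the sketch starts from a small nonzero $\mathbf{w}_0$.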
Linear Bandits on Uniformly Convex Sets [88.3673525964507]
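Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets. Two types of structural assumptions lead to better pseudo-regret bounds.
Paper reference translation (metadata) (2021-03-10T07:33:03Z)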
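Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse of Dimensionality on Hölder Class [6.476766717110237]
We construct neural networks with ReLU, sine, and $2^x$ as activation functions. Beyond their super expressive power, the functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable.
Paper reference translation (metadata) (2021-02-28T15:57:42Z)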
Phase Transitions in Rate Distortion Theory and Deep Learning [5.145741425164946]
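We say that $\mathcal{S}$ can be compressed at rate $s$ if an error of $\mathcal{O}(R^{-s})$ can be achieved for encoding $\mathcal{S}$. We show that for certain "nice" signal classes $\mathcal{S}$, a phase transition occurs.
Paper reference translation (metadata) (2020-08-03T16:48:49Z)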
Deep Network with Approximation Error Being Reciprocal of Width to Power of Square Root of Depth [4.468952886990851]
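A new network with super approximation power is introduced. This network is built with either a Floor ($\lfloor x\rfloor$) or a ReLU ($\max\{0,x\}$) activation function in each neuron.
Paper reference translation (metadata) (2020-06-22T13:27:33Z)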
$Q$-learning with Logarithmic Regret [60.24952657636464]
Optimistic $Q$-learning enjoys an $\mathcal{O}\left(\frac{SA\cdot \mathrm{poly}(H)}{\Delta_{\min}}\log(SAT)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap.
Paper reference translation (metadata) (2020-06-16T13:01:33Z)

The related papers list is generated automatically from the titles and abstracts of papers on this site.

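This is the information for the specified paper.
The operators of this site do not guarantee the quality of this site (including all information and translations) and accept no responsibility for any consequences arising from the use of this site (including all information and translations).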