Fugu-MT 論文翻訳(概要): On the Sample Complexity of Two-Layer Networks: Lipschitz vs. Element-Wise Lipschitz Activation

論文の概要: On the Sample Complexity of Two-Layer Networks: Lipschitz vs. Element-Wise Lipschitz Activation

arxiv url: http://arxiv.org/abs/2211.09634v4
Date: Sat, 20 Jan 2024 22:04:21 GMT
ステータス: 翻訳完了
システム内更新日: 2024-01-23 22:27:23.971475
Title: On the Sample Complexity of Two-Layer Networks: Lipschitz vs. Element-Wise Lipschitz Activation
Title（参考訳）: 2層ネットワークのサンプル複雑性について:Lipschitz vs. Element-Wise Lipschitz Activation
Authors: Amit Daniely and Elad Granot
Abstract要約: 本研究では,異なるアクティベーション関数を用いた有界二層ニューラルネットワークのサンプル複雑性について検討する。我々は、$sigma$ が要素ワイドであれば、$mathcalH$ のサンプルの複雑さは、幅の対数依存しか持たないことを証明する。
参考スコア（独自算出の注目度）: 20.70453775428433
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We investigate the sample complexity of bounded two-layer neural networks using different activation functions. In particular, we consider the class $$ \mathcal{H} = \left\{\textbf{x}\mapsto \langle \textbf{v}, \sigma \circ W\textbf{b} + \textbf{b} \rangle : \textbf{b}\in\mathbb{R}^d, W \in \mathbb{R}^{\mathcal{T}\times d}, \textbf{v} \in \mathbb{R}^{\mathcal{T}}\right\} $$ where the spectral norm of $W$ and $\textbf{v}$ is bounded by $O(1)$, the Frobenius norm of $W$ is bounded from its initialization by $R > 0$, and $\sigma$ is a Lipschitz activation function. We prove that if $\sigma$ is element-wise, then the sample complexity of $\mathcal{H}$ has only logarithmic dependency in width and that this complexity is tight, up to logarithmic factors. We further show that the element-wise property of $\sigma$ is essential for a logarithmic dependency bound in width, in the sense that there exist non-element-wise activation functions whose sample complexity is linear in width, for widths that can be up to exponential in the input dimension. For the upper bound, we use the recent approach for norm-based bounds named Approximate Description Length (ADL) by arXiv:1910.05697. We further develop new techniques and tools for this approach that will hopefully inspire future works.
Abstract（参考訳）: 異なる活性化関数を用いた有界二層ニューラルネットワークのサンプル複雑性について検討する。特に、クラス $$$ \mathcal{H} = \left\{\textbf{x}\mapsto \langle \textbf{v}, \sigma \circ W\textbf{b} + \textbf{b} \rangle : \textbf{b}\in\mathbb{R}^d, W \in \mathbb{R}^{\mathcal{T}\times d}, \textbf{v} \in \mathbb{R}^{\mathcal{T}}\right\} $$$$$W$と$\textbf{v}$のノルムが$O(1)$で束縛され、$W$W$のフロベニウスノルムはその初期化から$R>$0で束縛される。すると、$\sigma$ が要素単位であるなら、$\mathcal{h}$ のサンプル複雑性は対数依存性のみを持ち、この複雑性は対数因子まで密接であることが証明される。さらに、入力次元において指数的となる幅に対して、サンプルの複雑さが線形な非要素的活性化関数が存在するという意味で、$\sigma$の要素ワイド性は、幅に有界な対数依存に必須であることを示す。上界に対しては、arXiv:1910.05697 により Approximate Description Length (ADL) と呼ばれるノルムベース境界に対する最近のアプローチを用いる。このアプローチのための新しい技術やツールがさらに開発され、将来的な成果が期待できる。

関連論文リスト

The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
この問題は通信複雑性のランダム化を$Omega(frac1kcdot n2log|mathbbF|)$とする。アプリケーションとして、$k$パスを持つ任意のストリーミングアルゴリズムに対して、$Omega(frac1kcdot n2log|mathbbF|)$スペースローバウンドを得る。
論文参考訳（メタデータ） (2024-10-26T06:21:42Z)
Monge-Kantorovich Fitting With Sobolev Budgets [6.748324975906262]
我々は、$rho$が$mtext-d$集合の近くに集中しているとき、これをノイズのあるデータを持つ多様体学習問題と解釈できることを示した。 Monge-Kantorovich $p$-cost $mathbbW_pp(rho, nu)$を介して$rho$を近似する際の$nu$のパフォーマンスを定量化し、$mathrmsupp nu$を$f : mathbbRmでカバーできるようにすることで複雑さを制限します。
論文参考訳（メタデータ） (2024-09-25T01:30:16Z)
Iterative thresholding for non-linear learning in the strong $\varepsilon$-contamination model [3.309767076331365]
閾値降下を用いた単一ニューロンモデル学習のための近似境界を導出する。線形回帰問題も研究し、$sigma(mathbfx) = mathbfx$ となる。
論文参考訳（メタデータ） (2024-09-05T16:59:56Z)
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
単一インデックス対象関数 $f_*(boldsymbolx) = textstylesigma_*left(langleboldsymbolx,boldsymbolthetarangleright)$ の等方的ガウスデータの下で勾配降下学習の問題を考察する。 SGDアルゴリズムで最適化された2層ニューラルネットワークは、サンプル付き任意のリンク関数の$f_*$を学習し、実行時の複雑さは$n asymp T asymp C(q) cdot dであることを示す。
論文参考訳（メタデータ） (2024-06-03T17:56:58Z)
Provably learning a multi-head attention layer [55.2904547651831]
マルチヘッドアテンション層は、従来のフィードフォワードモデルとは分離したトランスフォーマーアーキテクチャの重要な構成要素の1つである。本研究では,ランダムな例から多面的注意層を実証的に学習する研究を開始する。最悪の場合、$m$に対する指数的依存は避けられないことを示す。
論文参考訳（メタデータ） (2024-02-06T15:39:09Z)
Optimal Estimator for Linear Regression with Shuffled Labels [17.99906229036223]
本稿では,シャッフルラベルを用いた線形回帰の課題について考察する。 mathbb Rntimes m の $mathbf Y、mathbb Rntimes p の mathbf Pi、mathbb Rptimes m$ の mathbf B、mathbb Rntimes m$ の $mathbf Win mathbb Rntimes m$ である。
論文参考訳（メタデータ） (2023-10-02T16:44:47Z)
A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
非線形測定では、ほとんどの先行結果は一様ではない、すなわち、すべての$mathbfx*$に対してではなく、固定された$mathbfx*$に対して高い確率で保持される。本フレームワークはGCSに1ビット/一様量子化観測と単一インデックスモデルを標準例として適用する。また、指標集合が計量エントロピーが低い製品プロセスに対して、より厳密な境界を生み出す濃度不等式も開発する。
論文参考訳（メタデータ） (2023-09-25T17:54:19Z)
An Over-parameterized Exponential Regression [18.57735939471469]
LLM(Large Language Models)の分野での最近の発展は、指数的アクティベーション関数の使用への関心を喚起している。ニューラル関数 $F: mathbbRd times m times mathbbRd times mathbbRd times mathbbRd times mathbbRd times mathbbRd times mathbbRd times mathbbRd times mathbbRdd
論文参考訳（メタデータ） (2023-03-29T07:29:07Z)
Quantum and classical low-degree learning via a dimension-free Remez inequality [52.12931955662553]
ハイパーグリッド上の関数をポリトーラス上の高調波拡張に関連付ける新しい方法を示す。巡回群 $exp(2pi i k/K)_k=1K$ の積に対して函数の上限が$f$であることを示す。我々は最近、超キューブやキュービット上の観測可能な観測値の低次学習を、同様に効率的に行う方法として、EI22, CHP, VZ22を引用して、新しい空間に拡張した。
論文参考訳（メタデータ） (2023-01-04T04:15:40Z)
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
モノトン活性化に対する $mathbfxmapstosigma(mathbfwcdotmathbfx)$ の関数について検討する。学習者の目標は仮説ベクトル $mathbfw$ that $F(mathbbw)=C, epsilon$ を高い確率で出力することである。
論文参考訳（メタデータ） (2022-06-17T17:55:43Z)
Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
期待される正方形損失から、最も適合した単一ニューロンを学習することの問題点を考察する。 ReLUアクティベーションでは、我々の人口リスク保証は$O(mathsfOPT1/2)+epsilon$である。 ReLUアクティベーションでは、我々の人口リスク保証は$O(mathsfOPT1/2)+epsilon$である。
論文参考訳（メタデータ） (2020-05-29T07:20:35Z)
On the Complexity of Minimizing Convex Finite Sums Without Using the Indices of the Individual Functions [62.01594253618911]
有限和の有限ノイズ構造を利用して、大域オラクルモデルの下での一致する$O(n2)$-upper境界を導出する。同様のアプローチを踏襲したSVRGの新規な適応法を提案し、これはオラクルと互換性があり、$tildeO(n2+nsqrtL/mu)log (1/epsilon)$と$O(nsqrtL/epsilon)$, for $mu>0$と$mu=0$の複雑さ境界を実現する。
論文参考訳（メタデータ） (2020-02-09T03:39:46Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。