Fugu-MT paper translation (abstract): Learning Hierarchical Polynomials of Multiple Nonlinear Features with Three-Layer Networks

Paper abstract: Learning Hierarchical Polynomials of Multiple Nonlinear Features with Three-Layer Networks

arxiv url: http://arxiv.org/abs/2411.17201v1
Date: Tue, 26 Nov 2024 08:14:48 GMT
Status: translation complete
Last updated in system: 2024-12-04 17:21:35.083411
Title: Learning Hierarchical Polynomials of Multiple Nonlinear Features with Three-Layer Networks
Title (reference translation): Hierarchical Polynomial Learning of Multiple Nonlinear Features Using Three-Layer Networks
Authors: Hengyu Fu, Zihao Wang, Eshaan Nichani, Jason D. Lee,
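Abstract summary: In deep learning theory, a critical question is to understand how neural networks learn hierarchical features. This work studies the learning of hierarchical polynomials of multiple nonlinear features using three-layer neural networks.
Reference score (independently computed attention score): 46.190882811878744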
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In deep learning theory, a critical question is to understand how neural networks learn hierarchical features. In this work, we study the learning of hierarchical polynomials of \textit{multiple nonlinear features} using three-layer neural networks. We examine a broad class of functions of the form $f^{\star}=g^{\star}\circ \bp$, where $\bp:\mathbb{R}^{d} \rightarrow \mathbb{R}^{r}$ represents multiple quadratic features with $r \ll d$ and $g^{\star}:\mathbb{R}^{r}\rightarrow \mathbb{R}$ is a polynomial of degree $p$. This can be viewed as a nonlinear generalization of the multi-index model \citep{damian2022neural}, and also an expansion upon previous work that focused only on a single nonlinear feature, i.e. $r = 1$ \citep{nichani2023provable,wang2023learning}. Our primary contribution shows that a three-layer neural network trained via layerwise gradient descent suffices for (i) complete recovery of the space spanned by the nonlinear features and (ii) efficient learning of the target function $f^{\star}=g^{\star}\circ \bp$ or transfer learning of $f=g\circ \bp$ with a different link function, within $\widetilde{\cO}(d^4)$ samples and polynomial time. For such hierarchical targets, our result substantially improves the sample complexity ${\Theta}(d^{2p})$ of the kernel methods, demonstrating the power of efficient feature learning. It is important to highlight that our results leverage novel techniques and thus manage to go beyond all prior settings such as single-index and multi-index models as well as models depending just on one nonlinear feature, contributing to a more comprehensive understanding of feature learning in deep learning.
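Abstract (reference translation): In deep learning theory, understanding how neural networks learn hierarchical features is a critical question. This work studies the learning of hierarchical polynomials of multiple nonlinear features using three-layer neural networks. It considers a broad class of functions of the form $f^{\star}=g^{\star}\circ \bp$, where $\bp:\mathbb{R}^{d} \rightarrow \mathbb{R}^{r}$ represents multiple quadratic features with $r \ll d$, and $g^{\star}:\mathbb{R}^{r}\rightarrow \mathbb{R}$ is a polynomial of degree $p$. This can be seen as a nonlinear generalization of the multi-index model \citep{damian2022neural}, and as an extension of earlier work restricted to a single nonlinear feature, i.e. $r = 1$ \citep{nichani2023provable,wang2023learning}. The main contribution is to show that a three-layer neural network trained via layerwise gradient descent suffices, within $\widetilde{\cO}(d^4)$ samples and polynomial time, for complete recovery of the space spanned by the nonlinear features, and for efficient learning of the target function $f^{\star}=g^{\star}\circ \bp$ or transfer learning of $f=g\circ \bp$ with a different link function. For such hierarchical targets, the result substantially improves on the ${\Theta}(d^{2p})$ sample complexity of kernel methods, demonstrating the power of efficient feature learning. The results rely on novel techniques and therefore go beyond all prior settings, including single-index and multi-index models as well as models that depend on only one nonlinear feature, contributing to a more comprehensive understanding of feature learning in deep learning.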
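As an illustration of the function class described in the abstract, the following is a minimal NumPy sketch of a target $f^{\star}=g^{\star}\circ \bp$ built from $r$ quadratic features. Everything concrete here is an assumption made for illustration and is not taken from the paper: the features are written as $p_i(x) = x^{\top} A_i x$ with random symmetric, trace-free, unit-Frobenius-norm matrices $A_i$, the link `link` is an arbitrary degree-3 polynomial, and the names `make_quadratic_features`, `features`, `link`, and `target` are hypothetical.

```python
import numpy as np


def make_quadratic_features(d, r, rng):
    """Draw r symmetric d x d matrices A_i for p_i(x) = x^T A_i x.

    Illustrative assumption: each A_i is random, trace-free (so that
    E[p_i(x)] = tr(A_i) = 0 for x ~ N(0, I)) and has unit Frobenius norm.
    """
    mats = []
    for _ in range(r):
        m = rng.standard_normal((d, d))
        a = (m + m.T) / 2.0                  # symmetrize
        a -= (np.trace(a) / d) * np.eye(d)   # remove the trace
        a /= np.linalg.norm(a)               # unit Frobenius norm
        mats.append(a)
    return np.stack(mats)                    # shape (r, d, d)


def features(x, mats):
    """Evaluate p(x) in R^r for a batch x of shape (n, d)."""
    return np.einsum("ni,kij,nj->nk", x, mats, x)


def link(z):
    """Hypothetical degree-3 polynomial link g*: R^r -> R (needs r >= 2)."""
    return z[:, 0] * z[:, 1] + z.sum(axis=1) ** 3


def target(x, mats):
    """Hierarchical target f*(x) = g*(p(x))."""
    return link(features(x, mats))


rng = np.random.default_rng(0)
d, r, n = 20, 3, 5                           # r much smaller than d
mats = make_quadratic_features(d, r, rng)
x = rng.standard_normal((n, d))              # standard Gaussian inputs
y = target(x, mats)
```

In this sketch `features(x, mats)` has shape `(n, r)` and `y` has shape `(n,)`. With a degree-$p$ link applied to quadratic features, the target is a polynomial of degree $2p$ in $x$, which is consistent with the ${\Theta}(d^{2p})$ kernel sample complexity quoted in the abstract.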

Related papers

Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations [40.77319247558742]
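We study the computational complexity of learning a target function $f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure. We show that a large subset of such $f_*$ can be learned efficiently by gradient-based training of a two-layer neural network.
Paper reference translation (metadata) (2024-06-17T17:59:17Z)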
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
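We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \textstyle\sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ under isotropic Gaussian data. We show that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with an arbitrary link function, with sample and runtime complexity $n \asymp T \asymp C(q) \cdot d$.
Paper reference translation (metadata) (2024-06-03T17:56:58Z)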
Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
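We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks. For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error. This work demonstrates the ability of three-layer neural networks to learn complex features and, as a result, to learn a broad class of hierarchical functions.
Paper reference translation (metadata) (2023-11-23T02:19:32Z)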
Neural Networks Efficiently Learn Low-Dimensional Representations with SGD [22.703825902761405]
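We show that ReLU NNs trained with SGD can learn a single-index target of the form $y=f(\langle\boldsymbol{u},\boldsymbol{x}\rangle) + \epsilon$ by recovering the principal direction. We also provide compression guarantees for NNs using the approximate low-rank structure produced by SGD.
Paper reference translation (metadata) (2022-09-29T15:29:10Z)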
Neural Networks can Learn Representations with Gradient Descent [68.95262816363288]
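Under certain conditions, neural networks trained by gradient descent behave like kernel methods. In practice, however, neural networks are known to strongly outperform their associated kernels.
Paper reference translation (metadata) (2022-06-30T09:24:02Z)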
Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
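We consider the dynamics of gradient descent for learning a two-layer neural network. We show that an over-parameterized two-layer neural network trained by gradient descent can provably learn the ground-truth network to low loss.
Paper reference translation (metadata) (2020-07-09T07:09:28Z)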
A Corrective View of Neural Networks: Representation, Memorization and Learning [26.87238691716307]
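We develop a corrective mechanism for neural network approximation. We show that two-layer neural networks in the random features (RF) regime can memorize arbitrary labels. We also consider three-layer neural networks and show that the corrective mechanism yields faster representation rates for smooth radial functions.
Paper reference translation (metadata) (2020-02-01T20:51:09Z)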
Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning [66.05472746340142]
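This paper analyzes how multi-layer neural networks can perform hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective. We establish a new principle called "backward feature correction", in which errors in the lower-level features can be corrected automatically when they are trained together with the higher-level layers.
Paper reference translation (metadata) (2020-01-13T17:28:29Z)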

The related papers list is generated automatically from the titles and abstracts of the papers on this site.
