Fugu-MT Paper Translation (Abstract): Learning Hierarchical Polynomials with Three-Layer Neural Networks

Paper summary: Learning Hierarchical Polynomials with Three-Layer Neural Networks

arxiv url: http://arxiv.org/abs/2311.13774v1
Date: Thu, 23 Nov 2023 02:19:32 GMT
Status: Translation complete
Last updated in system: 2023-11-28 00:44:34.410872
Title: Learning Hierarchical Polynomials with Three-Layer Neural Networks
Title (reference translation): Learning Hierarchical Polynomials with Three-Layer Neural Networks
Authors: Zihao Wang, Eshaan Nichani, Jason D. Lee
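Abstract summary: We study the problem of learning hierarchical polynomials over the standard Gaussian distribution with three-layer neural networks. For a large subclass of degree $k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error. This work demonstrates the ability of three-layer neural networks to learn complex features and, as a result, to learn a broad class of hierarchical functions.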
Reference score (independently computed attention measure): 56.71223169861528
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the problem of learning hierarchical polynomials over the standard Gaussian distribution with three-layer neural networks. We specifically consider target functions of the form $h = g \circ p$ where $p : \mathbb{R}^d \rightarrow \mathbb{R}$ is a degree $k$ polynomial and $g: \mathbb{R} \rightarrow \mathbb{R}$ is a degree $q$ polynomial. This function class generalizes the single-index model, which corresponds to $k=1$, and is a natural class of functions possessing an underlying hierarchical structure. Our main result shows that for a large subclass of degree $k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error in $\widetilde{\mathcal{O}}(d^k)$ samples and polynomial time. This is a strict improvement over kernel methods, which require $\widetilde \Theta(d^{kq})$ samples, as well as existing guarantees for two-layer networks, which require the target function to be low-rank. Our result also generalizes prior works on three-layer neural networks, which were restricted to the case of $p$ being a quadratic. When $p$ is indeed a quadratic, we achieve the information-theoretically optimal sample complexity $\widetilde{\mathcal{O}}(d^2)$, which is an improvement over prior work~\citep{nichani2023provable} requiring a sample size of $\widetilde\Theta(d^4)$. Our proof proceeds by showing that during the initial stage of training the network performs feature learning to recover the feature $p$ with $\widetilde{\mathcal{O}}(d^k)$ samples. This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
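Abstract (reference translation): We study the problem of learning hierarchical polynomials over the standard Gaussian distribution with three-layer neural networks. We specifically consider target functions of the form $h = g \circ p$, where $p : \mathbb{R}^d \rightarrow \mathbb{R}$ is a degree $k$ polynomial and $g: \mathbb{R} \rightarrow \mathbb{R}$ is a degree $q$ polynomial. This function class generalizes the single-index model, which corresponds to $k=1$, and is a natural class of functions with an underlying hierarchical structure. Our main result shows that for a large subclass of degree $k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error in $\widetilde{\mathcal{O}}(d^k)$ samples and polynomial time. This is a strict improvement over kernel methods, which require $\widetilde \Theta(d^{kq})$ samples, and over existing guarantees for two-layer networks, which require the target function to be low-rank. Our result also generalizes prior work on three-layer neural networks, which was restricted to the case where $p$ is a quadratic. When $p$ is indeed a quadratic, we achieve the information-theoretically optimal sample complexity $\widetilde{\mathcal{O}}(d^2)$, an improvement over prior work, which required a sample size of $\widetilde\Theta(d^4)$. Our proof proceeds by showing that during the initial stage of training the network performs feature learning to recover the feature $p$ with $\widetilde{\mathcal{O}}(d^k)$ samples. This work demonstrates the ability of three-layer neural networks to learn complex features and, as a result, to learn a broad class of hierarchical functions.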
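
To make the target class and the sample-complexity gap in the abstract concrete, the following is a minimal Python sketch. It is not the paper's construction: the specific quadratic $p$ (random symmetric coefficients, $k = 2$), the outer polynomial $g(z) = z^3 - 3z$ ($q = 3$), and the helper names `make_target`, `sample_gaussian`, and `sample_counts` are all illustrative assumptions, and nothing here checks whether this $p$ falls in the "large subclass" the abstract refers to. The counts ignore logarithmic factors and constants.

```python
import random


def make_target(d, seed=0):
    """Build a toy hierarchical polynomial h = g o p on R^d.

    Hypothetical choices (not from the paper): p is a quadratic (k = 2)
    with random symmetric coefficients, and g(z) = z**3 - 3*z (q = 3).
    """
    rng = random.Random(seed)
    # Symmetric coefficient matrix A, scaled by 1/d so that p(x) has
    # O(1) variance when x is standard Gaussian.
    A = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i, d):
            A[i][j] = A[j][i] = rng.gauss(0.0, 1.0) / d
    trace = sum(A[i][i] for i in range(d))

    def p(x):
        # Degree-2 feature x^T A x - tr(A); it has mean zero under N(0, I).
        quad = sum(A[i][j] * x[i] * x[j] for i in range(d) for j in range(d))
        return quad - trace

    def g(z):
        # Degree-3 outer polynomial applied to the scalar feature p(x).
        return z ** 3 - 3.0 * z

    def h(x):
        # The hierarchical target: a degree k*q = 6 polynomial in x.
        return g(p(x))

    return p, g, h


def sample_gaussian(d, rng):
    """Draw one input x from the standard Gaussian N(0, I_d)."""
    return [rng.gauss(0.0, 1.0) for _ in range(d)]


def sample_counts(d, k, q):
    """Leading-order sample counts quoted in the abstract (no log factors)."""
    return {"three_layer": d ** k, "kernel": d ** (k * q)}
```

For example, `sample_counts(100, 2, 3)` gives `10**4` for the three-layer bound $d^k$ and `10**12` for the kernel bound $d^{kq}$, which illustrates how quickly the gap widens once $q > 1$.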
