Fugu-MT 論文翻訳(概要): Gradient Descent with Projection Finds Over-Parameterized Neural Networks for Learning Low-Degree Polynomials with Nearly Minimax Optimal Rate

論文の概要: Gradient Descent with Projection Finds Over-Parameterized Neural Networks for Learning Low-Degree Polynomials with Nearly Minimax Optimal Rate

arxiv url: http://arxiv.org/abs/2603.21062v1
Date: Sun, 22 Mar 2026 05:06:17 GMT
ステータス: 翻訳完了
システム内更新日: 2026-03-24 19:11:39.219444
Title: Gradient Descent with Projection Finds Over-Parameterized Neural Networks for Learning Low-Degree Polynomials with Nearly Minimax Optimal Rate
Title（参考訳）: プロジェクションによるグラディエントDescenceは、極小最適速度で低次多項式を学習するための過度なパラメータ付きニューラルネットワークを発見した
Authors: Yingzhen Yang, Ping Li,
Abstract要約: 本稿では、真次を識別し、ほぼ最適な回帰率を達成する新しい適応度選択アルゴリズムを提案する。我々の結果は、通常のニューラル・タンジェント・カーネル(NTK)限界を超えています。
参考スコア（独自算出の注目度）: 15.975065054204753
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the problem of learning a low-degree spherical polynomial of degree $k_0 = Θ(1) \ge 1$ defined on the unit sphere in $\RR^d$ by training an over-parameterized two-layer neural network with augmented feature in this paper. Our main result is the significantly improved sample complexity for learning such low-degree polynomials. We show that, for any regression risk $\eps \in (0, Θ(d^{-k_0})]$, an over-parameterized two-layer neural network trained by a novel Gradient Descent with Projection (GDP) requires a sample complexity of $n \asymp Θ( \log(4/δ) \cdot d^{k_0}/\eps)$ with probability $1-δ$ for $δ\in (0,1)$, in contrast with the representative sample complexity $Θ(d^{k_0} \max\set{\eps^{-2},\log d})$. Moreover, such sample complexity is nearly unimprovable since the trained network renders a nearly optimal rate of the nonparametric regression risk of the order $\log({4}/δ) \cdot Θ(d^{k_0}/{n})$ with probability at least $1-δ$. On the other hand, the minimax optimal rate for the regression risk with a kernel of rank $Θ(d^{k_0})$ is $Θ(d^{k_0}/{n})$, so that the rate of the nonparametric regression risk of the network trained by GDP is nearly minimax optimal. In the case that the ground truth degree $k_0$ is unknown, we present a novel and provable adaptive degree selection algorithm which identifies the true degree and achieves the same nearly optimal regression rate. To the best of our knowledge, this is the first time that a nearly optimal risk bound is obtained by training an over-parameterized neural network with a popular activation function (ReLU) and algorithmic guarantee for learning low-degree spherical polynomials. Due to the feature learning capability of GDP, our results are beyond the regular Neural Tangent Kernel (NTK) limit.
Abstract（参考訳）: 本稿では,次数$k_0 = s(1) \ge 1$ の低次球面多項式を$\RR^d$ で単位球上に定義する問題について,拡張機能付き過度パラメータ化された2層ニューラルネットワークを訓練することにより検討する。我々の主な成果は、そのような低次多項式を学習する際のサンプルの複雑さが大幅に向上したことである。任意の回帰リスクに対して、$\eps \in (0, )(d^{-k_0})]$は、新しいグラディエントDescent with Projection (GDP) によって訓練された過パラメータ化された2層ニューラルネットワークであり、確率が 1-δ$ for $δ\in (0,1)$ のサンプル複雑性を持つ$n \asymp (4/δ) \cdot d^{k_0}/\eps)$ のサンプル複雑性を必要とする。さらに、トレーニングされたネットワークは、少なくとも1-δ$の確率を持つ位数 $\log({4}/δ) \cdot >(d^{k_0}/{n})$ の非パラメトリック回帰リスクのほぼ最適率を示すので、そのようなサンプルの複雑さはほとんど改善不可能である。一方、リグレッションリスクに対する最小値の最適値は、階数$ (d^{k_0})$ のカーネルが $ (d^{k_0}/{n})$ であるので、GDP でトレーニングされたネットワークの非パラメトリック回帰リスクの比率は、ほぼ最小値である。基本真理次数$k_0$が未知の場合、真理次数を特定し、ほぼ最適な回帰率を達成する新しい適応次数選択アルゴリズムを提案する。我々の知る限りでは、一般的なアクティベーション関数(ReLU)を持つ過パラメータニューラルネットワークをトレーニングし、低次球面多項式を学習するためのアルゴリズム的保証をすることで、ほぼ最適なリスク境界が得られるのは、これが初めてである。 GDPの特徴学習能力のため、我々の結果は通常のニューラルタンジェントカーネル(NTK)限界を超えています。

論文の概要: Gradient Descent with Projection Finds Over-Parameterized Neural Networks for Learning Low-Degree Polynomials with Nearly Minimax Optimal Rate

関連論文リスト