Fugu-MT Paper Translation (Summary): Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit

Paper summary: Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit

arxiv url: http://arxiv.org/abs/2511.15120v1
Date: Wed, 19 Nov 2025 04:46:47 GMT
Status: Translation complete
Last updated in system: 2025-11-20 15:51:28.637861
Title: Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit
Authors: Bohan Zhang, Zihao Wang, Hengyu Fu, Jason D. Lee
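Abstract summary: We study gradient descent learning of a general Gaussian multi-index model $f(\boldsymbol{x})=g(\boldsymbol{U}\boldsymbol{x})$ with hidden subspace $\boldsymbol{U}\in \mathbb{R}^{r\times d}$. Under generic non-degenerate assumptions on the link function, we show that a standard two-layer neural network trained via layer-wise gradient descent can agnostically learn the target with $o_d(1)$ test error.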
Reference score (attention score computed in-house): 66.20349460098275
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In deep learning, a central issue is to understand how neural networks efficiently learn high-dimensional features. To this end, we explore the gradient descent learning of a general Gaussian Multi-index model $f(\boldsymbol{x})=g(\boldsymbol{U}\boldsymbol{x})$ with hidden subspace $\boldsymbol{U}\in \mathbb{R}^{r\times d}$, which is the canonical setup to study representation learning. We prove that under generic non-degenerate assumptions on the link function, a standard two-layer neural network trained via layer-wise gradient descent can agnostically learn the target with $o_d(1)$ test error using $\widetilde{\mathcal{O}}(d)$ samples and $\widetilde{\mathcal{O}}(d^2)$ time. The sample and time complexity both align with the information-theoretic limit up to leading order and are therefore optimal. During the first stage of gradient descent learning, the proof proceeds via showing that the inner weights can perform a power-iteration process. This process implicitly mimics a spectral start for the whole span of the hidden subspace and eventually eliminates finite-sample noise and recovers this span. It surprisingly indicates that optimal results can only be achieved if the first layer is trained for more than $\mathcal{O}(1)$ steps. This work demonstrates the ability of neural networks to effectively learn hierarchical functions with respect to both sample and time efficiency.
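As a rough illustration of the first-stage mechanism described in the abstract (a power-iteration process that recovers the span of the hidden subspace), the following is a minimal pure-Python sketch under stated assumptions. It is not the authors' algorithm and trains no neural network: it forms the classical second-order spectral matrix $\hat{M}=\frac{1}{n}\sum_{k}y_k(\boldsymbol{x}_k\boldsymbol{x}_k^\top-\boldsymbol{I})$ and runs explicit orthogonal (subspace) power iteration on it. The link $g(z)=z_1^2+0.5z_2^2$, the sizes $d=10$, $r=2$, $n=10{,}000$, the step count, and every function name are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical sketch: classical spectral estimator + explicit orthogonal
# power iteration. Not the paper's method (no neural network is trained).


def gram_schmidt(vectors):
    """Orthonormalize a list of vectors (lists of floats), in order."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis


def sample_multi_index(n, U, link, rng):
    """Draw n pairs (x, y) with x ~ N(0, I_d) and y = link(U x)."""
    d = len(U[0])
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        z = [sum(ui * xi for ui, xi in zip(u, x)) for u in U]
        data.append((x, link(z)))
    return data


def spectral_matrix(data):
    """Empirical M = (1/n) * sum_k y_k * (x_k x_k^T - I)."""
    n, d = len(data), len(data[0][0])
    M = [[0.0] * d for _ in range(d)]
    for x, y in data:
        for i in range(d):
            yxi = y * x[i]
            row = M[i]
            for j in range(d):
                row[j] += yxi * x[j]
            row[i] -= y  # the "- I" part of (x x^T - I)
    return [[m / n for m in row] for row in M]


def orthogonal_iteration(M, r, steps, rng):
    """Subspace power iteration: V <- orthonormalize(M V), from a random start."""
    d = len(M)
    V = gram_schmidt([[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(r)])
    for _ in range(steps):
        MV = [[sum(M[i][j] * v[j] for j in range(d)) for i in range(d)] for v in V]
        V = gram_schmidt(MV)
    return V


def subspace_overlap(V, U):
    """||V U^T||_F^2 for orthonormal rows: r if the spans match, 0 if orthogonal."""
    return sum(sum(a * b for a, b in zip(v, u)) ** 2 for v in V for u in U)


def quadratic_link(z):
    """Hypothetical link g(z) = z1^2 + 0.5 * z2^2."""
    return z[0] ** 2 + 0.5 * z[1] ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    d, r, n = 10, 2, 10_000
    # Hypothetical hidden subspace: r random orthonormal rows in R^d.
    U = gram_schmidt([[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(r)])
    data = sample_multi_index(n, U, quadratic_link, rng)
    V = orthogonal_iteration(spectral_matrix(data), r, steps=50, rng=rng)
    print(f"overlap = {subspace_overlap(V, U):.3f} (max {r})")
```

For this link the population matrix is $2\boldsymbol{u}_1\boldsymbol{u}_1^\top+\boldsymbol{u}_2\boldsymbol{u}_2^\top$, so both hidden directions are visible and the printed overlap should approach its maximum $r$ as $n/d$ grows. If a hidden direction has zero second-order Hermite component (for example, a purely odd link such as $g(z)=z_1^3$), the population matrix carries no signal along it and this simple estimator cannot recover it.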

List of related papers