Fugu-MT Paper Translation (Abstract): Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials

Paper summary: Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials

arxiv url: http://arxiv.org/abs/2206.03688v1
Date: Wed, 8 Jun 2022 06:06:51 GMT
Status: Translation complete
Last updated in system: 2022-06-09 13:21:00.226833
Title: Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials
Title (reference translation): Identifying the right directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials
Authors: Eshaan Nichani, Yu Bai, Jason D. Lee
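Abstract summary: We show that a wide two-layer neural network can jointly use the Neural Tangent Kernel (NTK) and QuadNTK to fit target functions. This yields an end-to-end convergence guarantee with provable sample complexity improvement over both the NTK and QuadNTK.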
Reference score (independently computed attention score): 52.11466135206223
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A recent goal in the theory of deep learning is to identify how neural networks can escape the "lazy training," or Neural Tangent Kernel (NTK), regime, where the network is coupled with its first-order Taylor expansion at initialization. While the NTK is minimax optimal for learning dense polynomials (Ghorbani et al., 2021), it cannot learn features, and hence has poor sample complexity for learning many classes of functions, including sparse polynomials. Recent works have thus aimed to identify settings where gradient-based algorithms provably generalize better than the NTK. One such example is the "QuadNTK" approach of Bai and Lee (2020), which analyzes the second-order term in the Taylor expansion. Bai and Lee (2020) show that the second-order term can learn sparse polynomials efficiently; however, it sacrifices the ability to learn general dense polynomials. In this paper, we analyze how gradient descent on a two-layer neural network can escape the NTK regime by utilizing a spectral characterization of the NTK (Montanari and Zhong, 2020) and building on the QuadNTK approach. We first expand upon the spectral analysis to identify "good" directions in parameter space in which we can move without harming generalization. Next, we show that a wide two-layer neural network can jointly use the NTK and QuadNTK to fit target functions consisting of a dense low-degree term and a sparse high-degree term -- something neither the NTK nor the QuadNTK can do on its own. Finally, we construct a regularizer which encourages our parameter vector to move in the "good" directions, and show that gradient descent on the regularized loss will converge to a global minimizer, which also has low test error. This yields an end-to-end convergence and generalization guarantee with provable sample complexity improvement over both the NTK and QuadNTK on their own.
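Abstract (reference translation): A recent goal in deep learning theory is to identify how neural networks can escape the "lazy training," or Neural Tangent Kernel (NTK), regime. While the NTK is optimal for learning dense polynomials (Ghorbani et al., 2021), it cannot learn features, and so its sample complexity is poor for many function classes, including sparse polynomials. Recent work has therefore aimed to identify settings in which gradient-based algorithms provably generalize better than the NTK. One such example is the "QuadNTK" approach of Bai and Lee (2020), which analyzes the second-order term of the Taylor expansion. Bai and Lee (2020) show that the second-order term can learn sparse polynomials efficiently, but at the cost of the ability to learn general dense polynomials. In this paper, we analyze how gradient descent on a two-layer neural network can escape the NTK regime, using a spectral characterization of the NTK (Montanari and Zhong, 2020) and building on the QuadNTK approach. We first extend the spectral analysis to identify "good" directions in parameter space along which we can move without harming generalization. Next, we show that a wide two-layer neural network can jointly use the NTK and QuadNTK to fit target functions consisting of a dense low-degree term and a sparse high-degree term. Finally, we construct a regularizer that encourages the parameter vector to move in the "good" directions, and show that gradient descent on the regularized loss converges to a global minimizer with low test error. This yields an end-to-end convergence and generalization guarantee, with provable sample complexity improvement over both the NTK and QuadNTK.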
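
The abstract describes the NTK regime as the first-order Taylor expansion of the network at initialization, and the QuadNTK as the second-order term of that expansion. The following is a minimal sketch of those two terms for a two-layer network, not the paper's construction. The tanh activation, the fixed random-sign second layer `a`, the 1/sqrt(m) scaling, and all function and variable names are assumptions made for illustration.

```python
import numpy as np

def two_layer(W, a, x):
    """f(x; W) = (1/sqrt(m)) * sum_r a_r * tanh(w_r . x)  (assumed form)."""
    m = W.shape[0]
    return a @ np.tanh(W @ x) / np.sqrt(m)

def taylor_terms(W0, a, delta, x):
    """First-order (NTK-style) and second-order (QuadNTK-style) terms
    of f(x; W0 + delta) - f(x; W0), expanded around W0."""
    m = W0.shape[0]
    t = np.tanh(W0 @ x)
    d1 = 1.0 - t**2        # tanh'(z)
    d2 = -2.0 * t * d1     # tanh''(z)
    u = delta @ x          # per-neuron change in pre-activation
    linear = a @ (d1 * u) / np.sqrt(m)
    quadratic = a @ (d2 * u**2) / (2.0 * np.sqrt(m))
    return linear, quadratic

rng = np.random.default_rng(0)
m, d = 512, 10                          # width and input dimension (illustrative)
W0 = rng.normal(size=(m, d))            # initialization
a = rng.choice([-1.0, 1.0], size=m)     # fixed second-layer signs
x = rng.normal(size=d)
x /= np.linalg.norm(x)                  # unit-norm input
delta = 0.05 * rng.normal(size=(m, d))  # small move away from initialization

exact = two_layer(W0 + delta, a, x) - two_layer(W0, a, x)
linear, quadratic = taylor_terms(W0, a, delta, x)
err_first = abs(exact - linear)               # error of the linearized model
err_second = abs(exact - linear - quadratic)  # error after adding the quadratic term
print("first-order error: ", err_first)
print("second-order error:", err_second)
```

In this sketch the linear term plays the role of the lazy-training model and the quadratic term is what a QuadNTK-style analysis studies. By Taylor's theorem the first error is bounded by a sum of squared pre-activation changes and the second by a sum of cubed ones, so for small `delta` the two-term expansion is typically the tighter approximation.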
