Fugu-MT Paper Translation (Summary): Deep regularization and direct training of the inner layers of Neural Networks with Kernel Flows

Paper summary: Deep regularization and direct training of the inner layers of Neural Networks with Kernel Flows

arxiv url: http://arxiv.org/abs/2002.08335v2
Date: Fri, 7 Aug 2020 03:47:26 GMT
Status: Translation complete
Last updated in system: 2022-12-30 13:19:16.890649
Title: Deep regularization and direct training of the inner layers of Neural Networks with Kernel Flows
Authors: Gene Ryan Yoo and Houman Owhadi
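Abstract summary: We propose a new regularization method for artificial neural networks (ANNs) based on Kernel Flows (KF). KF was introduced as a method for kernel selection in regression/kriging that minimizes the loss of accuracy incurred by halving the number of points in random batches of the dataset.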
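A minimal NumPy sketch of the KF kernel-selection criterion, assuming the formulation from the earlier Kernel Flows work (not stated on this page): for a batch $(X_f, Y_f)$ and a random half $(X_c, Y_c)$ of it, $\rho = 1 - \frac{Y_c^\top K(X_c,X_c)^{-1} Y_c}{Y_f^\top K(X_f,X_f)^{-1} Y_f}$, i.e. the relative loss of RKHS accuracy when half the interpolation points are dropped. The Gaussian kernel, the small nugget added for numerical stability, the function names `gaussian_kernel` and `kf_rho`, the toy data, and the bandwidth grid are all hypothetical choices for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    # k(x, x') = exp(-gamma * ||x - x'||_2^2), computed for all row pairs of A and B
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kf_rho(Xf, Yf, gamma, rng, nugget=1e-6):
    """rho = 1 - (Yc^T Kc^{-1} Yc) / (Yf^T Kf^{-1} Yf), with (Xc, Yc) a random half of the batch.

    The nugget (hypothetical) keeps the kernel matrices well conditioned; since Kc + nugget*I
    is a principal submatrix of Kf + nugget*I, rho still lies in [0, 1].
    """
    n = Xf.shape[0]
    idx = rng.choice(n, size=n // 2, replace=False)
    Xc, Yc = Xf[idx], Yf[idx]
    Kf = gaussian_kernel(Xf, Xf, gamma) + nugget * np.eye(n)
    Kc = gaussian_kernel(Xc, Xc, gamma) + nugget * np.eye(len(idx))
    num = Yc @ np.linalg.solve(Kc, Yc)  # squared RKHS norm of the half-data interpolant
    den = Yf @ np.linalg.solve(Kf, Yf)  # squared RKHS norm of the full-batch interpolant
    return float(1.0 - num / den)

# Toy 1-D regression data (hypothetical)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(64, 1))
Y = np.sin(2.0 * np.pi * X[:, 0])

# Average rho over random batches for each candidate bandwidth, then keep the smallest
gammas = [0.1, 1.0, 10.0, 100.0]
scores = {}
for g in gammas:
    vals = []
    for _ in range(20):
        b = rng.choice(len(X), size=32, replace=False)  # random batch X_f
        vals.append(kf_rho(X[b], Y[b], g, rng))
    scores[g] = float(np.mean(vals))
best_gamma = min(scores, key=scores.get)
```

A grid search over batch-averaged $\rho$ is only the simplest way to use the criterion; in the KF literature the kernel parameters are instead updated by stochastic gradient descent on $\rho$ over random batches.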
Reference score (site-computed attention score): 0.609170287691728
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce a new regularization method for Artificial Neural Networks (ANNs) based on Kernel Flows (KFs). KFs were introduced as a method for kernel selection in regression/kriging based on the minimization of the loss of accuracy incurred by halving the number of interpolation points in random batches of the dataset. Writing $f_\theta(x) = \big(f^{(n)}_{\theta_n}\circ f^{(n-1)}_{\theta_{n-1}} \circ \dots \circ f^{(1)}_{\theta_1}\big)(x)$ for the functional representation of the compositional structure of the ANN, the inner-layer outputs $h^{(i)}(x) = \big(f^{(i)}_{\theta_i}\circ f^{(i-1)}_{\theta_{i-1}} \circ \dots \circ f^{(1)}_{\theta_1}\big)(x)$ define a hierarchy of feature maps and kernels $k^{(i)}(x,x')=\exp(- \gamma_i \|h^{(i)}(x)-h^{(i)}(x')\|_2^2)$. When combined with a batch of the dataset, these kernels produce KF losses $e_2^{(i)}$ (the $L^2$ regression error incurred by using a random half of the batch to predict the other half) depending on the parameters of the inner layers $\theta_1,\ldots,\theta_i$ (and $\gamma_i$). The proposed method simply consists of aggregating a subset of these KF losses with a classical output loss. We test the proposed method on CNNs and WRNs without altering their structure or output classifier and report reduced test errors, decreased generalization gaps, and increased robustness to distribution shift without a significant increase in computational complexity. We suspect that these results might be explained by the fact that while conventional training only employs a linear functional (a generalized moment) of the empirical distribution defined by the dataset and can be prone to trapping in the Neural Tangent Kernel regime (under over-parameterizations), the proposed loss function (defined as a nonlinear functional of the empirical distribution) effectively trains the underlying kernel defined by the CNN beyond regressing the data with that kernel.
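The loss described in the abstract can be sketched as follows, under assumptions that are not from the paper: a toy two-layer ReLU feature map with random weights stands in for a CNN, labels are one-hot, $e_2^{(i)}$ is taken as the mean squared $L^2$ error of kernel interpolation (kriging) from a random half of the batch to the other half using $k^{(i)}$, and the weights `lambdas` and bandwidths `gammas` are arbitrary hypothetical values. NumPy here only evaluates the total loss; actually training $\theta_1,\ldots,\theta_i$ on it would require an autodiff framework.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    # k^{(i)}(x, x') = exp(-gamma_i * ||h^{(i)}(x) - h^{(i)}(x')||_2^2) on feature rows A, B
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kf_e2(H, Y, gamma, rng, nugget=1e-6):
    """Hypothetical KF loss on features H: krige one random half of the batch from the other."""
    n = H.shape[0]
    perm = rng.permutation(n)
    c, r = perm[: n // 2], perm[n // 2 :]  # c: points used to predict, r: points predicted
    Kcc = gaussian_kernel(H[c], H[c], gamma) + nugget * np.eye(len(c))
    Krc = gaussian_kernel(H[r], H[c], gamma)
    Y_pred = Krc @ np.linalg.solve(Kcc, Y[c])
    return float(np.mean(np.sum((Y[r] - Y_pred) ** 2, axis=1)))

def relu(z):
    return np.maximum(z, 0.0)

def cross_entropy(logits, labels):
    # Numerically stable softmax cross-entropy (the "classical output loss")
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-np.mean(log_p[np.arange(len(labels)), labels]))

rng = np.random.default_rng(0)
n, d, C = 64, 10, 3
X = rng.normal(size=(n, d))
labels = rng.integers(0, C, size=n)
Y = np.eye(C)[labels]  # one-hot targets for the kriging step

# Hypothetical toy network: h1 = f1(x), h2 = f2(h1), logits = output layer applied to h2
W1 = rng.normal(size=(d, 32)) / np.sqrt(d)
W2 = rng.normal(size=(32, 16)) / np.sqrt(32)
W3 = rng.normal(size=(16, C)) / np.sqrt(16)
h1 = relu(X @ W1)
h2 = relu(h1 @ W2)
logits = h2 @ W3

feats = {1: h1, 2: h2}
lambdas = {1: 0.1, 2: 0.1}                                  # hypothetical weights
gammas = {1: 1.0 / h1.shape[1], 2: 1.0 / h2.shape[1]}       # hypothetical bandwidths
kf_terms = {i: kf_e2(feats[i], Y, gammas[i], rng) for i in lambdas}

# Aggregate a subset of inner-layer KF losses with the output loss
total_loss = cross_entropy(logits, labels) + sum(lambdas[i] * kf_terms[i] for i in lambdas)
```

Because each $e_2^{(i)}$ depends on how batch points relate to one another through $k^{(i)}$, not on a per-example average, it is a nonlinear functional of the empirical distribution, which is the distinction the abstract draws with conventional output losses.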

Related papers

Enhanced Feature Learning via Regularisation: Integrating Neural Networks and Kernel Methods [0.0]
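We propose an efficient method for an estimator called the Brownian Kernel Neural Network (BKerNN). We show that the prediction risk of BKerNN converges to the minimal risk at explicit high-probability rates of $O(\min((d/n)^{1/2}, n^{-1/6}))$ (up to logarithmic factors).
Reference translation (metadata) (2024-07-24T13:46:50Z)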
Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes [29.466981306355066]
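We show that gradient descent with a fixed learning rate $\eta$ can only find local minima that represent smooth functions. We also prove a nearly optimal MSE bound of $\widetilde{O}(n^{-4/5})$ within the strict interior of the support of the $n$ data points.
Reference translation (metadata) (2024-06-10T22:57:27Z)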
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
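We study the problem of learning a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ with gradient descent. A two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by the information exponent.
Reference translation (metadata) (2024-06-03T17:56:58Z)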
Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
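Recent work has shown that reproducing kernel Hilbert spaces (RKHS) are not a suitable space for modeling functions with neural networks. This paper studies a suitable function space for over-parameterized two-layer neural networks with bounded norm.
Reference translation (metadata) (2024-04-29T15:04:07Z)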
SKI to go Faster: Accelerating Toeplitz Neural Networks via Asymmetric Kernels [69.47358238222586]
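Toeplitz Neural Networks (TNNs) are a recent sequence model with impressive results. We aim to reduce their $O(n \log n)$ computational complexity and their $O(n)$ calls to the relative positional encoder (RPE) multi-layer perceptron (MLP) and decay bias. For bidirectional models, this motivates a sparse plus low-rank Toeplitz matrix decomposition.
Reference translation (metadata) (2023-05-15T21:25:35Z)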
Generalization and Stability of Interpolating Neural Networks with Minimal Width [37.908159361149835]
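We study the generalization and optimization of shallow neural networks trained by gradient descent in the interpolating regime. We show that the training loss is minimized with $m=\Omega(\log^4(n))$ neurons. With $m=\Omega(\log^4(n))$ neurons and $T\approx n$ iterations, we bound the test loss by $\tilde{O}(1/n)$.
Reference translation (metadata) (2023-02-18T05:06:15Z)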
Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
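We propose a new algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel. Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU. Our method, called RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy on large-scale datasets.
Reference translation (metadata) (2022-10-21T15:56:13Z)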
Bounding the Width of Neural Networks via Coupled Initialization -- A Worst Case Analysis [121.9821494461427]
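We show how to significantly reduce the number of neurons required for two-layer ReLU networks. We also prove new lower bounds that improve on prior work and that, under certain assumptions, are the best possible.
Reference translation (metadata) (2022-06-26T06:51:31Z)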
Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks [29.03095282348978]
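This paper studies the limiting spectral distributions of two empirical kernel matrices associated with $f(X)$. We show that random feature regression induced by the empirical kernel achieves the same performance as the limiting kernel regression in the ultra-wide regime.
Reference translation (metadata) (2021-09-20T05:25:52Z)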
Beyond Lazy Training for Over-parameterized Tensor Decomposition [69.4699995828506]
We show that gradient descent on an over-parametrized objective can go beyond the lazy training regime and exploit certain low-rank structure in the data.
Reference translation (metadata) (2020-10-22T00:32:12Z)
The Interpolation Phase Transition in Neural Networks: Memorization and Generalization under Lazy Training [10.72393527290646]
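We study this phenomenon in the context of two-layer neural networks in the neural tangent (NT) regime. For $Nd \gg n$, the test error is well approximated by that of kernel ridge regression with respect to the infinite-width kernel. The latter is in turn well approximated by the error of polynomial ridge regression, where the regularization parameter is increased by a self-induced term related to the high-degree components of the activation function.
Reference translation (metadata) (2020-07-25T01:51:13Z)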
Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
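We consider the dynamics of gradient descent when learning a two-layer neural network. We show that an over-parameterized two-layer neural network trained by gradient descent can provably learn the ground truth with small loss using a polynomial number of samples.
Reference translation (metadata) (2020-07-09T07:09:28Z)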

The related papers list is generated automatically from the titles and abstracts of papers on this site.
