Fugu-MT Paper Translation (Summary): Phenomenology of Double Descent in Finite-Width Neural Networks

Paper Overview: Phenomenology of Double Descent in Finite-Width Neural Networks

arxiv url: http://arxiv.org/abs/2203.07337v1
Date: Mon, 14 Mar 2022 17:39:49 GMT
Status: Translation complete
Last updated in system: 2022-03-15 14:14:23.189675
Title: Phenomenology of Double Descent in Finite-Width Neural Networks
Title (reference translation): Double descent phenomenon in finite-width neural networks
Authors: Sidak Pal Singh, Aurelien Lucchi, Thomas Hofmann, Bernhard Schölkopf
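Abstract summary: Double descent describes the generalization behavior of models depending on the regime they belong to. We use influence functions to derive suitable expressions for the population loss and its lower bound. Building on this analysis, we examine how the loss function affects double descent.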
Reference score (site-computed attention score): 29.119232922018732
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: `Double descent' delineates the generalization behaviour of models depending on the regime they belong to: under- or over-parameterized. The current theoretical understanding behind the occurrence of this phenomenon is primarily based on linear and kernel regression models -- with informal parallels to neural networks via the Neural Tangent Kernel. Therefore such analyses do not adequately capture the mechanisms behind double descent in finite-width neural networks, as well as, disregard crucial components -- such as the choice of the loss function. We address these shortcomings by leveraging influence functions in order to derive suitable expressions of the population loss and its lower bound, while imposing minimal assumptions on the form of the parametric model. Our derived bounds bear an intimate connection with the spectrum of the Hessian at the optimum, and importantly, exhibit a double descent behaviour at the interpolation threshold. Building on our analysis, we further investigate how the loss function affects double descent -- and thus uncover interesting properties of neural networks and their Hessian spectra near the interpolation threshold.
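Below is a toy sketch of the phenomenon the abstract describes. It uses the linear-regression setting that the abstract says prior theory relies on. It is not the paper's influence-function analysis.

The sketch fits minimum-norm least squares on the first p of d Gaussian features and sweeps p through the interpolation threshold p = n. For the squared loss (1/(2n))||X_p b - y||^2 the Hessian is X_p^T X_p / n, so the sketch also tracks its smallest nonzero eigenvalue. All of the following are illustrative assumptions:

- the data model;
- the sizes n = 40 and d = 200;
- the noise level;
- the helper names `min_norm_fit` and `sweep`.

```python
import numpy as np


def min_norm_fit(X, y):
    """Least-squares fit; lstsq returns the minimum-norm solution when p > n."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def sweep(n=40, d=200, noise=0.5, ps=(10, 20, 40, 80, 200), trials=20, seed=0):
    """Average test risk and smallest nonzero Hessian eigenvalue for each model size p.

    Hypothetical toy setup (an assumption, not from the paper):
    y = x @ beta + noise, x ~ N(0, I_d), and the model sees only the
    first p of the d features, so p < d is a misspecified model.
    """
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d)
    beta /= np.linalg.norm(beta)  # unit-norm signal spread over all d features
    risk = {p: 0.0 for p in ps}
    lam_min = {p: 0.0 for p in ps}
    for _ in range(trials):
        X = rng.standard_normal((n, d))
        y = X @ beta + noise * rng.standard_normal(n)
        for p in ps:
            Xp = X[:, :p]
            b = np.zeros(d)
            b[:p] = min_norm_fit(Xp, y)
            # For isotropic Gaussian x, E[(y_new - x @ b)^2] = ||b - beta||^2 + noise^2.
            risk[p] += (np.sum((b - beta) ** 2) + noise**2) / trials
            # The Hessian of (1/(2n))||Xp b - y||^2 is Xp.T @ Xp / n; its nonzero
            # eigenvalues are the squared singular values of Xp divided by n.
            s = np.linalg.svd(Xp, compute_uv=False)
            lam_min[p] += (s[-1] ** 2 / n) / trials
    return risk, lam_min


if __name__ == "__main__":
    risk, lam_min = sweep()
    for p in sorted(risk):
        print(f"p={p:4d}  risk={risk[p]:8.3f}  min nonzero Hessian eig={lam_min[p]:.4f}")
```

Under these assumptions the averaged risk should peak at p = 40 = n, the interpolation threshold. At the same p, the smallest nonzero Hessian eigenvalue should dip toward zero. This loosely mirrors the abstract's link between double descent and the Hessian spectrum, but only for a linear model.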

Related Papers

The Spectral Bias of Shallow Neural Network Learning is Shaped by the Choice of Non-linearity [0.7499722271664144]
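We study how non-linear activation functions contribute to shaping the implicit bias of neural networks. We show that local dynamical attractors promote the formation of clusters of hyperplanes on which the input to a neuron's activation function is zero.
Paper reference translation (metadata) (2025-03-13T17:36:46Z)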
Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
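We present a convergence rate analysis of the mean field Langevin dynamics. The $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
Paper reference translation (metadata) (2022-01-25T17:13:56Z)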
Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
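We study the "double descent" phenomenon of the generalization error. Double descent may be attributable to different features being learned at different scales.
Paper reference translation (metadata) (2021-12-06T18:17:08Z)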
The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks [51.1848572349154]
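Neural network models that perfectly fit noisy data can still generalize well to unseen test data. We derive the excess risk of two-layer linear neural networks that interpolate the data when trained by gradient flow on the squared loss.
Paper reference translation (metadata) (2021-08-25T22:01:01Z)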
Linear approximability of two-layer neural networks: A comprehensive analysis based on spectral decay [4.042159113348107]
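We first consider the single-neuron case and show that linear approximability, quantified by the Kolmogorov width, is controlled by the eigenvalue decay of the conjugate kernel. We obtain similar results for two-layer neural networks.
Paper reference translation (metadata) (2021-08-10T23:30:29Z)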
Nonasymptotic theory for two-layer neural networks: Beyond the bias-variance trade-off [10.182922771556742]
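We develop a non-asymptotic generalization theory for two-layer neural networks with ReLU activation. We show that over-parameterized random feature models suffer from the curse of dimensionality and are therefore suboptimal.
Paper reference translation (metadata) (2021-06-09T03:52:18Z)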
Optimizing Mode Connectivity via Neuron Alignment [84.26606622400423]
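Empirically, local minima of loss functions can be connected by a learned curve in model space along which the loss remains nearly constant. We propose a more general framework for investigating the effect of symmetry on landscape connectivity by accounting for the weight permutations of the networks being connected.
Paper reference translation (metadata) (2020-09-05T02:25:23Z)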
The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization [34.235007566913396]
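Modern deep learning models employ considerably more parameters than required to fit the training data. An emerging paradigm for describing this unexpected behavior is the double descent curve. We provide a precise high-dimensional analysis of generalization with the Neural Tangent Kernel, which characterizes the behavior of wide neural networks trained with gradient descent.
Paper reference translation (metadata) (2020-08-15T20:55:40Z)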
Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
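We study estimation in a class of generalized structural equation models (SEMs). We formulate the linear operator equation as a min-max game, parameterize it with neural networks (NNs), and learn the network parameters using gradient descent. Our approach provides the first tractable estimation procedure for NN-based SEMs with provable convergence that does not require sample splitting.
Paper reference translation (metadata) (2020-07-02T17:55:47Z)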
Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss [0.0]
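Neural networks trained with gradient-based methods to minimize the logistic (cross-entropy) loss perform well on many supervised classification tasks. We analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations.
Paper reference translation (metadata) (2020-02-11T15:42:09Z)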
A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks [87.23360438947114]
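We show that noisy gradient descent with weight decay still exhibits "kernel-like" behavior, meaning that the training loss converges linearly up to a certain accuracy. We also establish a new generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
Paper reference translation (metadata) (2020-02-10T18:56:15Z)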

The related papers list is generated automatically from the titles and abstracts of papers on this site.

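The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility for any consequences arising from its use.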