Fugu-MT Paper Translation (Summary): Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle

arxiv url: http://arxiv.org/abs/2303.14151v1
Date: Fri, 24 Mar 2023 17:03:40 GMT
Status: Translation complete
Last updated (in system): 2023-03-27 13:33:42.566982
Title: Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle
Authors: Rylan Schaeffer, Mikail Khona, Zachary Robertson, Akhilan Boopathy, Kateryna Pistunova, Jason W. Rocks, Ila Rani Fiete, Oluwasanmi Koyejo
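Abstract summary: Double descent is a surprising phenomenon in machine learning: as the number of model parameters grows relative to the number of data points, test error drops as models grow into the highly overparameterized regime.
Reference score (attention score computed by this site): 12.00962791565144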
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Double descent is a surprising phenomenon in machine learning, in which as the number of model parameters grows relative to the number of data, test error drops as models grow ever larger into the highly overparameterized (data undersampled) regime. This drop in test error flies against classical learning theory on overfitting and has arguably underpinned the success of large models in machine learning. This non-monotonic behavior of test loss depends on the number of data, the dimensionality of the data and the number of model parameters. Here, we briefly describe double descent, then provide an explanation of why double descent occurs in an informal and approachable manner, requiring only familiarity with linear algebra and introductory probability. We provide visual intuition using polynomial regression, then mathematically analyze double descent with ordinary linear regression and identify three interpretable factors that, when simultaneously all present, together create double descent. We demonstrate that double descent occurs on real data when using ordinary linear regression, then demonstrate that double descent does not occur when any of the three factors are ablated. We use this understanding to shed light on recent observations in nonlinear models concerning superposition and double descent. Code is publicly available.
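The abstract says the paper analyzes double descent with ordinary linear regression. As a hedged illustration, not the authors' released code, the sketch below fits minimum-norm least squares (via `np.linalg.pinv`) to a synthetic linear-Gaussian task and reports the median test MSE as the number of training points `n_train` sweeps past the feature dimension `dim`. The data model, the noise level, and the name `median_test_mse` are our assumptions. Setting `noise_std=0` makes the linear model class exact, which we use only as a rough analogue of ablating one ingredient; how this maps onto the paper's three factors is our interpretation.

```python
import numpy as np


def median_test_mse(n_train, dim, noise_std, n_test=1000, n_repeats=50, seed=0):
    """Median test MSE of minimum-norm least squares on a synthetic linear task.

    Assumed data model (illustrative only): x ~ N(0, I_dim) and
    y = x @ w_true + eps with eps ~ N(0, noise_std**2).
    """
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_repeats):
        w_true = rng.standard_normal(dim) / np.sqrt(dim)
        X = rng.standard_normal((n_train, dim))
        y = X @ w_true + noise_std * rng.standard_normal(n_train)
        # pinv(X) @ y is the minimum-norm least-squares solution in both the
        # underparameterized (n_train > dim) and overparameterized (n_train < dim) regimes.
        w_hat = np.linalg.pinv(X) @ y
        X_test = rng.standard_normal((n_test, dim))
        y_test = X_test @ w_true + noise_std * rng.standard_normal(n_test)
        errors.append(np.mean((X_test @ w_hat - y_test) ** 2))
    # Median rather than mean: the error near n_train == dim is heavy-tailed.
    return float(np.median(errors))


if __name__ == "__main__":
    dim = 20
    for n in (5, 10, 15, 19, 20, 21, 25, 40, 80):
        noisy = median_test_mse(n, dim, noise_std=1.0)
        clean = median_test_mse(n, dim, noise_std=0.0)  # ablation: no label noise
        print(f"n_train={n:3d}  noisy={noisy:9.3f}  noiseless={clean:9.3f}")
```

In this setup the noisy curve is expected to spike near `n_train == dim`, where the training matrix is square and typically badly conditioned, while the noiseless curve falls toward zero without a spike.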

Related papers

eGAD! double descent is explained by Generalized Aliasing Decomposition [0.0]
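To explain the relationship between predictive performance and model complexity, this paper proposes a new decomposition called the Generalized Aliasing Decomposition (GAD). GAD decomposes prediction error into three parts: 1) model insufficiency, which dominates when the number of parameters is much smaller than the number of data points; 2) data insufficiency, which dominates when the number of parameters is much larger than the number of data points; and 3) generalized aliasing.
Paper translation (metadata) (2024-08-15T17:49:24Z)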
Towards understanding epoch-wise double descent in two-layer linear neural networks [11.210628847081097]
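We study epoch-wise double descent in two-layer linear neural networks. We identify factors of epoch-wise double descent that emerge with the extra model layer. This raises further questions about as-yet unidentified factors of epoch-wise double descent in truly deep models.
Paper translation (metadata) (2024-07-13T10:45:21Z)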
Understanding the Double Descent Phenomenon in Deep Learning [49.1574468325115]
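This tutorial sets up the classical statistical learning framework and introduces the double descent phenomenon. After working through several examples, Section 2 introduces inductive biases that appear to play a key role in double descent. Section 3 explores double descent with two linear models and offers other perspectives from recent related work.
Paper translation (metadata) (2024-03-15T16:51:24Z)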
A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning [68.76846801719095]
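We show exactly when and where double descent occurs, and show that its location is not inherently tied to the threshold p = n. This resolves the tension between double descent and statistical intuition.
Paper translation (metadata) (2023-10-29T12:05:39Z)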
Analysis of Interpolating Regression Models and the Double Descent Phenomenon [3.883460584034765]
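It is commonly assumed that models which interpolate noisy training data generalize poorly. Here, the best models obtained are overparameterized, and the test error exhibits double descent behavior as the model order increases. Based on the behavior of the smallest singular value of the regression matrix, we explain the location of the test-error peak and the shape of the double descent curve as functions of model order.
Paper translation (metadata) (2023-04-17T09:44:33Z)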
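The summary above ties the location of the test-error peak to the smallest singular value of the regression matrix. As a hedged illustration under our own assumption of i.i.d. Gaussian features (not that paper's polynomial or regression setup), the sketch below estimates the smallest nonzero singular value of an `n_train` × `dim` design matrix. The minimum-norm fit's variance contains terms proportional to 1/s², so a singular value collapsing toward zero near `n_train == dim` is one way to see why the peak sits there in this simple case. The function name `smallest_nonzero_singular_value` is hypothetical.

```python
import numpy as np


def smallest_nonzero_singular_value(n_train, dim, n_repeats=50, seed=0):
    """Median smallest nonzero singular value of an n_train x dim Gaussian matrix."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_repeats):
        X = rng.standard_normal((n_train, dim))
        # Returns min(n_train, dim) singular values in descending order; a Gaussian
        # matrix has full rank almost surely, so the last one is the smallest nonzero one.
        s = np.linalg.svd(X, compute_uv=False)
        values.append(s[-1])
    return float(np.median(values))


if __name__ == "__main__":
    dim = 20
    for n in (5, 10, 15, 19, 20, 21, 25, 40, 80):
        s_min = smallest_nonzero_singular_value(n, dim)
        # For large Gaussian matrices, s_min is roughly |sqrt(n) - sqrt(dim)|.
        print(f"n_train={n:3d}  smallest nonzero singular value={s_min:.3f}")
```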
What learning algorithm is in-context learning? Investigations with linear models [87.91612418166464]
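We investigate the hypothesis that transformer-based in-context learners implicitly implement standard learning algorithms. We show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression. We also give preliminary evidence that in-context learners share algorithmic features with these predictors.
Paper translation (metadata) (2022-11-28T18:59:51Z)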
Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
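We study the "double descent" phenomenon in generalization error. Double descent may be attributable to different features being learned at different scales.
Paper translation (metadata) (2021-12-06T18:17:08Z)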
Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
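Modern machine learning models often use a huge number of parameters and are typically optimized to zero training loss. We examine how this benign overfitting phenomenon can occur in two-layer neural networks. We show that a two-layer ReLU network interpolator can achieve a minimax-optimal learning rate.
Paper translation (metadata) (2021-06-06T19:08:53Z)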
Optimization Variance: Exploring Generalization Properties of DNNs [83.78477167211315]
The test error of deep neural networks (DNNs) often exhibits double descent. We propose a new measure, optimization variance (OV), to quantify the diversity of model updates.
Paper translation (metadata) (2021-06-03T09:34:17Z)
