Fugu-MT Paper Translation (Abstract): Comparing Classes of Estimators: When does Gradient Descent Beat Ridge Regression in Linear Models?

Paper overview: Comparing Classes of Estimators: When does Gradient Descent Beat Ridge Regression in Linear Models?

arxiv url: http://arxiv.org/abs/2108.11872v1
Date: Thu, 26 Aug 2021 16:01:37 GMT
Status: Translation complete
Last updated in system: 2021-08-27 17:09:23.074927
Title: Comparing Classes of Estimators: When does Gradient Descent Beat Ridge Regression in Linear Models?
Title (reference translation): Comparing classes of estimators: when does gradient descent beat ridge regression in linear models?
Authors: Dominic Richards, Edgar Dobriban, Patrick Rebeschini
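Abstract summary: We compare classes of estimators via the relative performance of the best method in the class. This allows us to rigorously quantify the tuning sensitivity of learning algorithms.
Reference score (attention score computed independently by this site): 46.01087792062936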
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Modern methods for learning from data depend on many tuning parameters, such as the stepsize for optimization methods, and the regularization strength for regularized learning methods. Since performance can depend strongly on these parameters, it is important to develop comparisons between \emph{classes of methods}, not just for particularly tuned ones. Here, we aim to compare classes of estimators via the relative performance of the \emph{best method in the class}. This allows us to rigorously quantify the tuning sensitivity of learning algorithms. As an illustration, we investigate the statistical estimation performance of ridge regression with a uniform grid of regularization parameters, and of gradient descent iterates with a fixed stepsize, in the standard linear model with a random isotropic ground truth parameter. (1) For orthogonal designs, we find the \emph{exact minimax optimal classes of estimators}, showing they are equal to gradient descent with a polynomially decaying learning rate. We find the exact suboptimalities of ridge regression and gradient descent with a fixed stepsize, showing that they decay as either $1/k$ or $1/k^2$ for specific ranges of $k$ estimators. (2) For general designs with a large number of non-zero eigenvalues, we find that gradient descent outperforms ridge regression when the eigenvalues decay slowly, as a power law with exponent less than unity. If instead the eigenvalues decay quickly, as a power law with exponent greater than unity or exponentially, we find that ridge regression outperforms gradient descent. Our results highlight the importance of tuning parameters. In particular, while optimally tuned ridge regression is the best estimator in our case, it can be outperformed by gradient descent when both are restricted to being tuned over a finite regularization grid.
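Abstract (reference translation): Modern methods for learning from data depend on many tuning parameters, such as the stepsize of optimization methods and the regularization strength of regularized learning methods. Because performance depends strongly on these parameters, it is important to develop comparisons between \emph{classes of methods}, not just between specifically tuned ones. Here, we compare classes of estimators using the relative performance of the \emph{best method in the class}. This allows us to rigorously quantify the tuning sensitivity of learning algorithms. We study the statistical estimation performance of ridge regression with a uniform grid of regularization parameters, and of gradient descent with a fixed stepsize, in the standard linear model with a random isotropic ground truth parameter. (1) For orthogonal designs, we show that the \emph{exact minimax optimal classes of estimators} are equal to gradient descent with a polynomially decaying learning rate. We also give the exact suboptimalities of ridge regression and of gradient descent with a fixed stepsize, showing that they decay as $1/k$ or $1/k^2$ over specific ranges. (2) For general designs with many non-zero eigenvalues, gradient descent outperforms ridge regression when the eigenvalues decay slowly, as a power law with exponent less than one. If instead the eigenvalues decay quickly, as a power law with exponent greater than one or exponentially, ridge regression outperforms gradient descent. These results highlight the importance of tuning parameters. In particular, although optimally tuned ridge regression is the best estimator in our setting, it can be outperformed by gradient descent when tuning is restricted to a finite regularization grid.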
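The comparison described in the abstract can be illustrated with a minimal numerical sketch. It is not the authors' code, and their exact definition of class performance may differ. It assumes the standard spectral view of both estimators: along an eigen-direction of X^T X / n with eigenvalue s, ridge with penalty lam shrinks the least-squares coordinate by s / (s + lam), and t gradient-descent steps from zero with stepsize `step` shrink it by 1 - (1 - step * s)^t. Under an isotropic prior with per-coordinate variance `signal_var`, the expected squared estimation error then has a closed form. All function names, the grid endpoint `lam_max`, the stepsize choice, and the default sizes below are hypothetical choices made for illustration.

```python
import numpy as np


def ridge_filter(s, lam):
    """Shrinkage applied by ridge (penalty lam) along eigenvalue s."""
    return s / (s + lam)


def gd_filter(s, step, t):
    """Shrinkage after t gradient-descent steps from zero, fixed stepsize."""
    return 1.0 - (1.0 - step * s) ** t


def spectral_risk(g, s, n, noise_var, signal_var):
    """Expected squared estimation error of a spectral filter g.

    Assumes an isotropic prior with per-coordinate variance signal_var
    and strictly positive eigenvalues s of X^T X / n.
    """
    bias_sq = (1.0 - g) ** 2 * signal_var
    variance = g ** 2 * noise_var / (n * s)
    return float(np.sum(bias_sq + variance))


def best_in_class_risks(decay, d=200, n=300, k=10, noise_var=1.0, lam_max=1.0):
    """Best risk within each class of k estimators (illustrative setup)."""
    s = np.arange(1, d + 1, dtype=float) ** (-decay)  # power-law spectrum
    signal_var = 1.0 / d                              # E||beta||^2 = 1
    step = 1.0 / s.max()                              # keeps GD stable
    lam_grid = lam_max * np.arange(1, k + 1) / k      # uniform ridge grid
    ridge = min(
        spectral_risk(ridge_filter(s, lam), s, n, noise_var, signal_var)
        for lam in lam_grid
    )
    gd = min(
        spectral_risk(gd_filter(s, step, t), s, n, noise_var, signal_var)
        for t in range(1, k + 1)
    )
    # Ridge with lam = noise_var / (n * signal_var) minimizes the risk
    # coordinate by coordinate, so it lower-bounds both classes.
    lam_star = noise_var / (n * signal_var)
    oracle = spectral_risk(ridge_filter(s, lam_star), s, n, noise_var, signal_var)
    return {"oracle_ridge": oracle, "ridge_grid": ridge, "gd_iterates": gd}


if __name__ == "__main__":
    for decay in (0.5, 2.0):
        print(decay, best_in_class_risks(decay))
```

Which class wins in this sketch depends on the hypothetical grid, stepsize, and problem sizes, so the printed numbers should not be read as a reproduction of the paper's results. The one guaranteed relation is that the oracle-tuned ridge risk is never larger than either class minimum.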

Related papers

Gradient-free stochastic optimization for additive models [56.42455605591779]
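This paper addresses zeroth-order optimization from noisy observations for objective functions that satisfy the Polyak-Lojasiewicz or strong convexity condition. The objective function is assumed to have an additive structure and to satisfy a higher-order smoothness property characterized by the Hölder family of functions.
Paper reference translation (metadata) (2025-03-03T23:39:08Z)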
Stagewise Boosting Distributional Regression [0.0]
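This paper proposes a stagewise boosting-type algorithm for distributional regression. It is extended with a novel regularization technique, correlation filtering, to provide additional stability. Besides the advantage of handling large datasets, the nature of the approximations may lead to better results.
Paper reference translation (metadata) (2024-05-28T15:40:39Z)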
ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
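We propose a fast (exponential-rate), ab initio (hyperparameter-free) gradient-based adaptation method. The main idea of the method is to adapt $\alpha$ through situational awareness. It can be applied to problems of any dimension n and scales only linearly.
Paper reference translation (metadata) (2023-09-12T14:36:13Z)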
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
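We propose a new family of unbiased estimators, called WTA-CRS, for matrix production with reduced variance. Our work provides theoretical and experimental evidence that, in the context of tuning transformers, the proposed estimators exhibit lower variance than existing ones.
Paper reference translation (metadata) (2023-05-24T15:52:08Z)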
Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization [50.83356836818667]
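Gradient Langevin dynamics is one of the most fundamental algorithms for solving non-convex optimization problems. This paper presents two variants of this kind, namely variance-reduced Langevin dynamics and recursive gradient Langevin dynamics.
Paper reference translation (metadata) (2022-03-30T11:39:00Z)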
Tom: Leveraging trend of the observed gradients for faster convergence [0.0]
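Tom is a new variant of Adam that takes into account the trend of the gradients over the loss landscape traversed by the neural network. Tom outperforms Adagrad, Adadelta, RMSProp, and Adam in accuracy and also converges faster.
Paper reference translation (metadata) (2021-09-07T20:19:40Z)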
High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails [55.561406656549686]
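We consider non-convex optimization with first-order algorithms whose gradient estimates may have heavy tails. We show that a combination of gradient clipping, momentum, and normalized gradient descent converges to critical points with high probability, at the best-known rates for smooth losses.
Paper reference translation (metadata) (2021-06-28T00:17:01Z)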
Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization [89.66571637204012]
AdaMomentum on vision, and achieves state-of-the-art results on other tasks including language processing.
Paper reference translation (metadata) (2021-06-22T03:13:23Z)
Decreasing scaling transition from adaptive gradient descent to stochastic gradient descent [1.7874193862154875]
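This paper proposes DSTAda, a decreasing scaling transition method from adaptive gradient descent to stochastic gradient descent. Experiments show that DSTAda is faster and more accurate, with improved stability and robustness.
Paper reference translation (metadata) (2021-06-12T11:28:58Z)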
Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering [53.523517926927894]
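Using per-sample Hessian-vector products and gradients, we construct a self-tuning quadratic. We prove that the model-based procedure converges in the noisy-gradient setting. This is an interesting step toward constructing self-tuning quadratics.
Paper reference translation (metadata) (2020-11-09T22:07:30Z)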
Reparametrizing gradient descent [0.0]
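This paper proposes an optimization algorithm called norm-adapted gradient descent. The algorithm can also be compared with quasi-Newton methods, but it seeks roots rather than stationary points.
Paper reference translation (metadata) (2020-10-09T20:22:29Z)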

The related-papers list is generated automatically from the titles and abstracts of the papers on this site.

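This page shows information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from its use.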