Fugu-MT 論文翻訳(概要): Gradient-Based Empirical Risk Minimization using Local Polynomial Regression

論文の概要: Gradient-Based Empirical Risk Minimization using Local Polynomial Regression

arxiv url: http://arxiv.org/abs/2011.02522v1
Date: Wed, 4 Nov 2020 20:10:31 GMT
ステータス: 翻訳完了
システム内更新日: 2022-09-29 22:06:08.257736
Title: Gradient-Based Empirical Risk Minimization using Local Polynomial Regression
Title（参考訳）: 局所多項式回帰を用いた勾配に基づく経験的リスク最小化
Authors: Ali Jadbabaie and Anuran Makur and Devavrat Shah
Abstract要約: この論文の主な目標は、勾配降下(GD)や勾配降下(SGD)といった異なるアルゴリズムを比較することである。損失関数がデータのスムーズな場合、各反復でオラクルを学習し、GDとSGDの両方のオラクル複雑度に打ち勝つことができることを示す。
参考スコア（独自算出の注目度）: 39.29885444997579
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we consider the problem of empirical risk minimization (ERM) of smooth, strongly convex loss functions using iterative gradient-based methods. A major goal of this literature has been to compare different algorithms, such as gradient descent (GD) or stochastic gradient descent (SGD), by analyzing their rates of convergence to $\epsilon$-approximate solutions. For example, the oracle complexity of GD is $O(n\log(\epsilon^{-1}))$, where $n$ is the number of training samples. When $n$ is large, this can be expensive in practice, and SGD is preferred due to its oracle complexity of $O(\epsilon^{-1})$. Such standard analyses only utilize the smoothness of the loss function in the parameter being optimized. In contrast, we demonstrate that when the loss function is smooth in the data, we can learn the oracle at every iteration and beat the oracle complexities of both GD and SGD in important regimes. Specifically, at every iteration, our proposed algorithm performs local polynomial regression to learn the gradient of the loss function, and then estimates the true gradient of the ERM objective function. We establish that the oracle complexity of our algorithm scales like $\tilde{O}((p \epsilon^{-1})^{d/(2\eta)})$ (neglecting sub-dominant factors), where $d$ and $p$ are the data and parameter space dimensions, respectively, and the gradient of the loss function belongs to a $\eta$-H\"{o}lder class with respect to the data. Our proof extends the analysis of local polynomial regression in non-parametric statistics to provide interpolation guarantees in multivariate settings, and also exploits tools from the inexact GD literature. Unlike GD and SGD, the complexity of our method depends on $d$ and $p$. However, when $d$ is small and the loss function exhibits modest smoothness in the data, our algorithm beats GD and SGD in oracle complexity for a very broad range of $p$ and $\epsilon$.
Abstract（参考訳）: 本稿では,反復勾配に基づく手法を用いて,滑らかで強い凸損失関数の経験的リスク最小化(erm)の問題を考える。この文献の主要な目標は、収束率を$\epsilon$-approximate 解に分析することによって、勾配降下 (gd) や確率勾配降下 (sgd) といった異なるアルゴリズムを比較することである。例えば、oracleのgdの複雑さは$o(n\log(\epsilon^{-1})$であり、ここでは$n$はトレーニングサンプルの数である。 n$ が大きければ、これは実際に高価になり、sgd が好まれるのは、oracle が $o(\epsilon^{-1})$ という複雑さのためである。このような標準解析は最適化されているパラメータの損失関数の滑らかさのみを利用する。対照的に、データの損失関数がスムーズな場合、各イテレーションでoracleを学習し、重要なシステムにおいてgdとsgdの両方のoracleの複雑さを上回ることを示しています。具体的には、各繰り返しにおいて、提案アルゴリズムは損失関数の勾配を学習するために局所多項式回帰を行い、ERM対象関数の真の勾配を推定する。このアルゴリズムのoracleの複雑さは、$\tilde{o}((p \epsilon^{-1})^{d/(2\eta)})$ (neglecting sub-dominant factors)のようにスケールする。ここで、$d$と$p$はそれぞれデータとパラメータ空間の次元であり、損失関数の勾配はデータに関して$\eta$-h\"{o}lderクラスに属する。我々は,非パラメトリック統計学における局所多項式回帰解析を拡張し,多変量設定における補間保証を提供するとともに,不正確なGD文献からツールを利用する。 GDやSGDとは異なり、我々の手法の複雑さは$d$と$p$に依存する。しかし、$d$が小さく、損失関数がデータの滑らかさを示すと、我々のアルゴリズムはオラクルの複雑さにおいて、非常に広い範囲の$p$と$\epsilon$でgdとsgdを上回ります。

関連論文リスト

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
単一インデックス対象関数 $f_*(boldsymbolx) = textstylesigma_*left(langleboldsymbolx,boldsymbolthetarangleright)$ の勾配勾配勾配学習問題について検討する。 SGDに基づくアルゴリズムにより最適化された2層ニューラルネットワークは、情報指数に支配されない複雑さで$f_*$を学習する。
論文参考訳（メタデータ） (2024-06-03T17:56:58Z)
MGDA Converges under Generalized Smoothness, Provably [27.87166415148172]
多目的最適化(MOO)はマルチタスク学習など様々な分野で注目を集めている。最近の研究は、理論解析を伴う効果的なアルゴリズムを提供しているが、それらは標準の$L$-smoothあるいは有界勾配仮定によって制限されている。一般化された$ell$-smooth損失関数のより一般的で現実的なクラスについて研究し、$ell$は勾配ノルムの一般非減少関数である。
論文参考訳（メタデータ） (2024-05-29T18:36:59Z)
Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
正方形損失に関して、標準的なガウス分布の下での$k$ReLU活性化の線形結合をPAC学習する問題をmathbbRd$で検討する。本研究の主な成果は,この学習課題に対して,サンプルおよび計算複雑性が$(dk/epsilon)O(k)$で,epsilon>0$が目標精度である。
論文参考訳（メタデータ） (2023-07-24T14:37:22Z)
ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
本稿では,BFG凸最適化(SCO: Reweighted Query (ReSQue) 推定ツールを提案する。我々はSCOの並列およびプライベート設定における最先端の複雑さを実現するアルゴリズムを開発した。
論文参考訳（メタデータ） (2023-01-01T18:51:29Z)
Gradient Descent for Low-Rank Functions [36.56489593549855]
例えば、深層ニューラルネットワークのトレーニングのような機械学習タスクでは、損失関数は入力のわずか数方向に大きく変化する。提案した emphLowRank Descent は $mathcalO(plog(1/epsilon))$gd と $mathcalOp/epsilon2)$p/epsilon2)$を識別して $epsilon 勾配関数を求める。
論文参考訳（メタデータ） (2022-06-16T15:58:05Z)
Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity [121.83085611327654]
積 $r:=p + q$, ここで$r$は$mu$-strong convex類似性である。エージェントの通信やローカルコールにマスターされた問題を解決する方法を提案する。提案手法は$mathcalO(sqrtL_q/mu)$法よりもはるかにシャープである。
論文参考訳（メタデータ） (2022-05-30T14:28:02Z)
On Acceleration of Gradient-Based Empirical Risk Minimization using Local Polynomial Regression [0.491574468325115]
最近提案された局所多項式補間法(LPIGD)による近似解経験的リスク問題(ERM)の高速化について検討した。我々は条件数$sigma$と強く凸で滑らかな損失関数にフォーカスする。 LPI-GDに基づく問題を高速化する2つの手法を提案し,その複雑さを$tildeOleft(sqrtsigma md log (1/varepsilon)$とする。
論文参考訳（メタデータ） (2022-04-16T02:39:45Z)
Federated Optimization of Smooth Loss Functions [35.944772011421264]
本研究では,連合学習フレームワークにおける経験的リスク最小化(ERM)について検討する。本稿では,FedLRGDアルゴリズムを提案する。提案手法は,不正確な勾配勾配勾配を用いてサーバのERM問題を解く。
論文参考訳（メタデータ） (2022-01-06T07:40:51Z)
Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization [83.80460802169999]
HSDMPGは、学習モデル上で過大なエラーの順序である$mathcalObig(1/sttnbig)$を達成可能であることを示す。損失係数について、HSDMPGは学習モデル上で過大なエラーの順序である$mathcalObig(1/sttnbig)$を達成できることを示す。
論文参考訳（メタデータ） (2020-09-18T02:18:44Z)
Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
学習者が生成モデル$Y = langle X,w* rangle + epsilon$から$n$のサンプルにアクセスできるような高次元頑健な線形回帰問題について検討する。 i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance, (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
論文参考訳（メタデータ） (2020-07-16T06:44:44Z)
Robust estimation via generalized quasi-gradients [28.292300073453877]
最近提案されたロバスト推定問題の多くが効率的に解ける理由を示す。我々は「一般化された準次数」の存在を識別する一般化された準勾配が存在することを示し、効率的なアルゴリズムを構築する。
論文参考訳（メタデータ） (2020-05-28T15:14:33Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。