Fugu-MT 論文翻訳(概要): Gradient Descent-Ascent Provably Converges to Strict Local Minmax Equilibria with a Finite Timescale Separation

論文の概要: Gradient Descent-Ascent Provably Converges to Strict Local Minmax Equilibria with a Finite Timescale Separation

arxiv url: http://arxiv.org/abs/2009.14820v1
Date: Wed, 30 Sep 2020 17:51:28 GMT
ステータス: 翻訳完了
システム内更新日: 2022-10-12 23:28:26.187910
Title: Gradient Descent-Ascent Provably Converges to Strict Local Minmax Equilibria with a Finite Timescale Separation
Title（参考訳）: 勾配降下・上昇は有限時間スケール分離を伴う厳密な局所minmax平衡に収束する
Authors: Tanner Fiez, Lillian Ratliff
Abstract要約: 有限時間スケールの分離パラメータ $tau$ は、非プレイヤ、非コンケーブゼロサムゲームにおいて勾配降下度に比例することを示す。タイムスケールコンピューティングがパフォーマンスに与える影響を実証的に示す。
参考スコア（独自算出の注目度）: 11.091975655053547
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the role that a finite timescale separation parameter $\tau$ has on gradient descent-ascent in two-player non-convex, non-concave zero-sum games where the learning rate of player 1 is denoted by $\gamma_1$ and the learning rate of player 2 is defined to be $\gamma_2=\tau\gamma_1$. Existing work analyzing the role of timescale separation in gradient descent-ascent has primarily focused on the edge cases of players sharing a learning rate ($\tau =1$) and the maximizing player approximately converging between each update of the minimizing player ($\tau \rightarrow \infty$). For the parameter choice of $\tau=1$, it is known that the learning dynamics are not guaranteed to converge to a game-theoretically meaningful equilibria in general. In contrast, Jin et al. (2020) showed that the stable critical points of gradient descent-ascent coincide with the set of strict local minmax equilibria as $\tau\rightarrow\infty$. In this work, we bridge the gap between past work by showing there exists a finite timescale separation parameter $\tau^{\ast}$ such that $x^{\ast}$ is a stable critical point of gradient descent-ascent for all $\tau \in (\tau^{\ast}, \infty)$ if and only if it is a strict local minmax equilibrium. Moreover, we provide an explicit construction for computing $\tau^{\ast}$ along with corresponding convergence rates and results under deterministic and stochastic gradient feedback. The convergence results we present are complemented by a non-convergence result: given a critical point $x^{\ast}$ that is not a strict local minmax equilibrium, then there exists a finite timescale separation $\tau_0$ such that $x^{\ast}$ is unstable for all $\tau\in (\tau_0, \infty)$. Finally, we empirically demonstrate on the CIFAR-10 and CelebA datasets the significant impact timescale separation has on training performance.
Abstract（参考訳）: プレイヤー1の学習率を$\gamma_1$とし、プレイヤー2の学習率を$\gamma_2=\tau\gamma_1$とする2段非凸非凹ゼロサムゲームにおいて、有限時間スケール分離パラメータ$\tau$が勾配勾配上昇に与える影響を考察する。勾配降下における時間スケール分離の役割を分析する既存の研究は、主に学習率(\tau =1$)を共有するプレイヤーのエッジケースと、最小化プレイヤーの更新(\tau \rightarrow \infty$)間でほぼ収束する最大化プレイヤーに焦点を当てている。パラメータ選択が$\tau=1$の場合、学習力学は一般にゲーム理論上意味のある平衡に収束することが保証されていないことが知られている。対照的に、jin et al. (2020) は勾配降下の安定な臨界点と厳密な局所minmax平衡のセットが$\tau\rightarrow\infty$と一致することを示した。この研究において、x^{\ast}$ がすべての$\tau \in (\tau^{\ast}, \infty)$ に対して安定な勾配降下の臨界点となるように、有限の時間スケール分離パラメータ $\tau^{\ast}$ が存在すること、そしてそれが厳密な局所minmax平衡であることと場合に限り、過去の仕事の間のギャップを橋渡しする。さらに、決定論的および確率的勾配フィードバックの下で、対応する収束率と結果と共に$\tau^{\ast}$を計算するための明示的な構成を提供する。厳密な局所ミンマックス平衡でない臨界点 $x^{\ast}$ が与えられたとき、すべての$\tau\in (\tau_0, \infty)$ に対して$x^{\ast}$ が不安定となるような有限時間スケール分離 $\tau_0$ が存在する。最後に,CIFAR-10とCelebAデータセットを用いて,トレーニング性能に対する時間スケール分離の影響を実証した。

関連論文リスト

Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition [71.33787410075577]
線形関数近似,未知遷移,および逆損失を用いた強化学習について検討した。我々は高い確率で$widetildeO(dsqrtHS3K + sqrtHSAK)$ regretを実現する新しいアルゴリズムを提案する。
論文参考訳（メタデータ） (2024-03-07T15:03:50Z)
Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits [4.811176167998627]
我々は、未知の分布から生じる無限に多くのバンドイットアームを用いて純粋探索を研究する。私たちのゴールは、平均的な報酬が1-delta$の1つの高品質なアームを、最高の$eta$-fraction of armsの1つとして$varepsilon$内で効率的に選択することにあります。
論文参考訳（メタデータ） (2023-06-03T04:00:47Z)
Local Convergence of Gradient Methods for Min-Max Games: Partial Curvature Generically Suffices [0.5439020425819]
2つのプレイヤーゼロサム微分可能なゲームに対する勾配法の局所的ナッシュ平衡の収束について検討する。偏曲のおかげで、円錐粒子法 -- 重みを最適化し、混合戦略をサポートする -- は、固定支持法よりも一般的に収束する。
論文参考訳（メタデータ） (2023-05-26T21:43:56Z)
Depth Dependence of $\mu$P Learning Rates in ReLU MLPs [72.14317069090407]
我々は、最大更新(mu$P)学習率の$n$と$L$に依存することを研究する。我々は、$L3/2.$のように、$L$の非自明な依存があることを発見した。
論文参考訳（メタデータ） (2023-05-13T01:10:49Z)
Beyond Uniform Smoothness: A Stopped Analysis of Adaptive SGD [38.221784575853796]
この研究は、勾配を用いて潜在的に一定の滑らかさを持つ非アトー関数の1次定常点を求める問題を考える。我々は、ノイズに一様境界を仮定することなく$mathcalO(fracmathrmpolylog(T)sigmatT)$収束率を証明できる技術を開発した。
論文参考訳（メタデータ） (2023-02-13T18:13:36Z)
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning [77.22019100456595]
通信周波数の異なる分散計算作業者のトレーニングアルゴリズムを示す。本研究では,より厳密な収束率を$mathcalO!!(sigma2-2_avg!)とする。また,不均一性の項は,作業者の平均遅延によっても影響されることを示した。
論文参考訳（メタデータ） (2022-06-16T17:10:57Z)
Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games [18.832436856339587]
本稿では,オフラインデータから2プレイヤーゼロサムマルコフゲームにおけるナッシュ均衡の学習に向けて前進する。ベルンシュタイン型低信頼境界を持つ悲観的モデルベースアルゴリズム(VI-LCB-Game)を提案する。
論文参考訳（メタデータ） (2022-06-08T17:58:06Z)
Preconditioned Gradient Descent for Overparameterized Nonconvex Burer--Monteiro Factorization with Global Optimality Certification [14.674494335647841]
非函数 $f(X)=phi(XXT)$ を$ntimes r$ factor matrix $X$ で最小化するために勾配降下を考える。本稿では,勾配降下の収束率を線形に戻すための安価なプレコンディショナーを提案する。
論文参考訳（メタデータ） (2022-06-07T14:26:49Z)
Nearly Horizon-Free Offline Reinforcement Learning [97.36751930393245]
S$状態、$A$アクション、計画的地平$H$で、エピソードな時間同質なMarkov決定プロセスに関するオフライン強化学習を再考する。経験的MDPを用いた評価と計画のための,約$H$自由なサンプル複雑性境界の最初の集合を得る。
論文参考訳（メタデータ） (2021-03-25T18:52:17Z)
Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes [91.38793800392108]
本稿では,マルコフ決定過程(MDP)の遷移確率核が線形混合モデルである線形関数近似による強化学習について検討する。上記の線形混合 MDP に対して$textUCRL-VTR+$ という線形関数近似を用いた計算効率の良い新しいアルゴリズムを提案する。我々の知る限り、これらは線形関数近似を持つRLのための計算効率が良く、ほぼ最小のアルゴリズムである。
論文参考訳（メタデータ） (2020-12-15T18:56:46Z)
Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness [151.67113334248464]
我々は、他の攻撃モデルに対してスムースな手法を拡張することは困難であることを示す。我々はCIFARに関する実験結果を示し,その理論を検証した。
論文参考訳（メタデータ） (2020-02-08T22:02:14Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。