Fugu-MT 論文翻訳(概要): A Nearly Optimal Single Loop Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness

論文の概要: A Nearly Optimal Single Loop Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness

arxiv url: http://arxiv.org/abs/2412.20017v1
Date: Sat, 28 Dec 2024 04:40:27 GMT
ステータス: 翻訳完了
システム内更新日: 2024-12-31 22:07:02.441552
Title: A Nearly Optimal Single Loop Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness
Title（参考訳）: 非有界な平滑性の下での確率的二値最適化のためのほぼ最適単一ループアルゴリズム
Authors: Xiaochuan Gong, Jie Hao, Mingrui Liu,
Abstract要約: 本稿では、上層関数が非定常で、潜在的に非有界な滑らかさを持ち、下層関数が凸であるような二層最適化の問題を考察する。既存のアルゴリズムはネストループに依存しており、これは重要なチューニング作業を必要とし、実用的ではない。
参考スコア（独自算出の注目度）: 15.656614304616006
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper studies the problem of stochastic bilevel optimization where the upper-level function is nonconvex with potentially unbounded smoothness and the lower-level function is strongly convex. This problem is motivated by meta-learning applied to sequential data, such as text classification using recurrent neural networks, where the smoothness constant of the upper-level loss function scales linearly with the gradient norm and can be potentially unbounded. Existing algorithm crucially relies on the nested loop design, which requires significant tuning efforts and is not practical. In this paper, we address this issue by proposing a Single Loop bIlevel oPtimizer (SLIP). The proposed algorithm first updates the lower-level variable by a few steps of stochastic gradient descent, and then simultaneously updates the upper-level variable by normalized stochastic gradient descent with momentum and the lower-level variable by stochastic gradient descent. Under standard assumptions, we show that our algorithm finds an $\epsilon$-stationary point within $\widetilde{O}(1/\epsilon^4)$\footnote{Here $\widetilde{O}(\cdot)$ compresses logarithmic factors of $1/\epsilon$ and $1/\delta$, where $\delta\in(0,1)$ denotes the failure probability.} oracle calls of stochastic gradient or Hessian-vector product, both in expectation and with high probability. This complexity result is nearly optimal up to logarithmic factors without mean-square smoothness of the stochastic gradient oracle. Our proof relies on (i) a refined characterization and control of the lower-level variable and (ii) establishing a novel connection between bilevel optimization and stochastic optimization under distributional drift. Our experiments on various tasks show that our algorithm significantly outperforms strong baselines in bilevel optimization.
Abstract（参考訳）: 本稿では,上層関数が非凸であり,潜在的に非有界な滑らかさを持ち,下層関数が強凸である確率的二レベル最適化の問題について検討する。この問題は、高次損失関数の滑らかさ定数が勾配ノルムと線形にスケールし、潜在的に非有界である可能性のある、反復ニューラルネットワークを用いたテキスト分類などのシーケンシャルデータに適用されたメタラーニングによって動機付けられる。既存のアルゴリズムは、かなりのチューニング作業を必要とするネストループ設計に依存しており、実用的ではない。本稿では,Single Loop bilevel oPtimizer (SLIP)を提案する。提案アルゴリズムは,まず数段階の確率勾配勾配により下層変数を更新し,同時に運動量による正規化確率勾配勾配と下層変数を確率勾配勾配により更新する。標準的な仮定では、我々のアルゴリズムは$\widetilde{O}(1/\epsilon^4)$\footnote{Here $\widetilde{O}(\cdot)$で$\epsilon$と$/\delta$の対数係数を圧縮し、$\delta\in(0,1)$は失敗確率を表す。オラクルは確率勾配やヘッセンベクトル積を期待と高い確率で呼び出す。この複雑さの結果は、確率勾配オラクルの平均2乗滑らかさを伴わない対数的因子にほぼ最適である。私たちの証明は一下位変数の精巧な特徴付け及び制御 (2)分布ドリフト下での双レベル最適化と確率最適化の新たな接続を確立する。種々のタスクに対する実験により,アルゴリズムは両レベル最適化において強いベースラインを著しく上回ることを示した。

論文の概要: A Nearly Optimal Single Loop Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness

関連論文リスト