Fugu-MT 論文翻訳(概要): On the Convergence of Multi-objective Optimization under Generalized Smoothness

論文の概要: On the Convergence of Multi-objective Optimization under Generalized Smoothness

arxiv url: http://arxiv.org/abs/2405.19440v2
Date: Wed, 12 Jun 2024 18:34:36 GMT
ステータス: 翻訳完了
システム内更新日: 2024-06-14 22:56:33.187645
Title: On the Convergence of Multi-objective Optimization under Generalized Smoothness
Title（参考訳）: 一般化滑らか性下における多目的最適化の収束性について
Authors: Qi Zhang, Peiyao Xiao, Kaiyi Ji, Shaofeng Zou,
Abstract要約: 我々はより一般的で現実的な$ell$-smooth損失関数のクラスを研究し、$ell$は一般の非減少関数ノルムである。我々は、$ell$-smooth Generalized Multi-MOO GradientGradと、その変種である Generalized Smooth Multi-MOO descentの2つの新しいアルゴリズムを開発した。私たちのアルゴリズムは、より厳密な$mathcalO(epsilon-2)$を各イテレーションで、より多くのサンプルを使って保証します。
参考スコア（独自算出の注目度）: 27.87166415148172
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multi-objective optimization (MOO) is receiving more attention in various fields such as multi-task learning. Recent works provide some effective algorithms with theoretical analysis but they are limited by the standard $L$-smooth or bounded-gradient assumptions, which are typically unsatisfactory for neural networks, such as recurrent neural networks (RNNs) and transformers. In this paper, we study a more general and realistic class of $\ell$-smooth loss functions, where $\ell$ is a general non-decreasing function of gradient norm. We develop two novel single-loop algorithms for $\ell$-smooth MOO problems, Generalized Smooth Multi-objective Gradient descent (GSMGrad) and its stochastic variant, Stochastic Generalized Smooth Multi-objective Gradient descent (SGSMGrad), which approximate the conflict-avoidant (CA) direction that maximizes the minimum improvement among objectives. We provide a comprehensive convergence analysis of both algorithms and show that they converge to an $\epsilon$-accurate Pareto stationary point with a guaranteed $\epsilon$-level average CA distance (i.e., the gap between the updating direction and the CA direction) over all iterations, where totally $\mathcal{O}(\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-4})$ samples are needed for deterministic and stochastic settings, respectively. Our algorithms can also guarantee a tighter $\epsilon$-level CA distance in each iteration using more samples. Moreover, we propose a practical variant of GSMGrad named GSMGrad-FA using only constant-level time and space, while achieving the same performance guarantee as GSMGrad. Our experiments validate our theory and demonstrate the effectiveness of the proposed methods.
Abstract（参考訳）: 多目的最適化(MOO)はマルチタスク学習など様々な分野で注目を集めている。最近の研究は、理論的な分析を伴う効果的なアルゴリズムを提供しているが、それらは標準の$L$-smoothや、リカレントニューラルネットワーク(RNN)やトランスフォーマーのようなニューラルネットワークには不満足な境界段階の仮定によって制限されている。本稿では、より一般的で現実的な$\ell$-smooth損失関数の研究を行い、$\ell$は勾配ノルムの一般非減少関数である。目的物間の最小改善を最大化する競合回避(CA)方向を近似した,$\ell$-smooth MOO問題,一般化されたSmooth Multi-objective Gradient descent (GSMGrad) とその確率的変種であるStochastic Generalized Smooth Multi-objective Gradient descent (SGSMGrad) の2つの新しいシングルループアルゴリズムを開発した。両アルゴリズムの総合収束解析を行い, 平均CA距離を保証した$\epsilon$-accurate Pareto定常点(すなわち, 更新方向とCA方向のギャップ)に全反復で収束することを示し, 完全$\mathcal{O}(\epsilon^{-2})$と$\mathcal{O}(\epsilon^{-4})$サンプルは決定論的および確率的設定にそれぞれ必要である。私たちのアルゴリズムは、より多くのサンプルを使用して、各イテレーションにおいてより厳密な$\epsilon$-level CA距離を保証することができます。また,GSMGradと同等の性能保証を達成しつつ,一定の時間と空間のみを用いてGSMGrad-FAという実用的なGSMGradの変種を提案する。提案手法の有効性を検証し,提案手法の有効性を検証した。

関連論文リスト

Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency [52.60557300927007]
離散部分モジュラー問題を連続的に最適化するために,$textbfMA-OSMA$アルゴリズムを提案する。また、一様分布を混合することによりKLの発散を効果的に活用する、プロジェクションフリーな$textbfMA-OSEA$アルゴリズムも導入する。我々のアルゴリズムは最先端OSGアルゴリズムによって提供される$(frac11+c)$-approximationを大幅に改善する。
論文参考訳（メタデータ） (2025-02-07T15:57:56Z)
Adam-like Algorithm with Smooth Clipping Attains Global Minima: Analysis Based on Ergodicity of Functional SDEs [0.0]
我々は,グローバル化された非-1損失関数を切断したAdam型アルゴリズムが正規化された非-1エラー形式を最小化することを示す。また、スムーズな群のエルゴード理論を適用して、逆温度と時間を学ぶためのアプローチを研究する。
論文参考訳（メタデータ） (2023-11-29T14:38:59Z)
An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue Optimization Problems [76.2042837251496]
相補的な合成条件に基づく2つの難解なミラー降下アルゴリズムを導入する。注目すべきは、どちらのアルゴリズムも、目的関数のリプシッツ定数や滑らかさに関する事前の知識なしで機能する。本稿では,大規模半確定プログラム上での手法の効率性とロバスト性を示す。
論文参考訳（メタデータ） (2023-06-30T08:34:29Z)
On Convergence of Incremental Gradient for Non-Convex Smooth Functions [63.51187646914962]
機械学習とネットワーク最適化では、ミスの数と優れたキャッシュを最小化するため、シャッフルSGDのようなアルゴリズムが人気である。本稿では任意のデータ順序付けによる収束特性SGDアルゴリズムについて述べる。
論文参考訳（メタデータ） (2023-05-30T17:47:27Z)
Multi-block-Single-probe Variance Reduced Estimator for Coupled Compositional Optimization [49.58290066287418]
構成問題の複雑さを軽減するために,MSVR (Multi-block-probe Variance Reduced) という新しい手法を提案する。本研究の結果は, 試料の複雑さの順序や強靭性への依存など, 様々な面で先行して改善された。
論文参考訳（メタデータ） (2022-07-18T12:03:26Z)
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning [77.22019100456595]
通信周波数の異なる分散計算作業者のトレーニングアルゴリズムを示す。本研究では,より厳密な収束率を$mathcalO!!(sigma2-2_avg!)とする。また,不均一性の項は,作業者の平均遅延によっても影響されることを示した。
論文参考訳（メタデータ） (2022-06-16T17:10:57Z)
An Improved Analysis of Gradient Tracking for Decentralized Machine Learning [34.144764431505486]
トレーニングデータが$n$エージェントに分散されるネットワーク上での分散機械学習を検討する。エージェントの共通の目標は、すべての局所損失関数の平均を最小化するモデルを見つけることである。ノイズのない場合、$p$を$mathcalO(p-1)$から$mathcalO(p-1)$に改善します。
論文参考訳（メタデータ） (2022-02-08T12:58:14Z)
DoCoM: Compressed Decentralized Optimization with Near-Optimal Sample Complexity [25.775517797956237]
本稿では,Douubly Compressed Momentum-assisted tracking algorithm $ttDoCoM$ for communicationを提案する。我々のアルゴリズムは、実際にいくつかの最先端のアルゴリズムより優れていることを示す。
論文参考訳（メタデータ） (2022-02-01T07:27:34Z)
On the Last Iterate Convergence of Momentum Methods [32.60494006038902]
我々は、任意の一定の運動量係数に対して、最後の反復が誤差に苦しむリプシッツおよび凸函数が存在することを初めて証明する。凸関数と滑らかな関数の設定では、新しいSGDMアルゴリズムが自動的に$O(fraclog TT)$のレートで収束することを示しています。
論文参考訳（メタデータ） (2021-02-13T21:16:16Z)
Gradient-Based Empirical Risk Minimization using Local Polynomial Regression [39.29885444997579]
この論文の主な目標は、勾配降下(GD)や勾配降下(SGD)といった異なるアルゴリズムを比較することである。損失関数がデータのスムーズな場合、各反復でオラクルを学習し、GDとSGDの両方のオラクル複雑度に打ち勝つことができることを示す。
論文参考訳（メタデータ） (2020-11-04T20:10:31Z)
Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
学習者が生成モデル$Y = langle X,w* rangle + epsilon$から$n$のサンプルにアクセスできるような高次元頑健な線形回帰問題について検討する。 i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance, (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
論文参考訳（メタデータ） (2020-07-16T06:44:44Z)
Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models [2.384873896423002]
非滑らかな成分を持つ汎用合成対象関数に対する勾配アルゴリズムのミニバッチ変種を開発し解析する。我々は、最小バッチサイズ$N$に対して、$mathcalO(frac1Nepsilon)$$epsilon-$subityが最適解に期待される二次距離で達成されるような、定数および変数のステップサイズ反復ポリシーの複雑さを提供する。
論文参考訳（メタデータ） (2020-03-30T10:43:56Z)
Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling [56.394284787780364]
本稿では、ポリシー勾配(PG)と時間差(TD)学習の2つの基本RLアルゴリズムに対して、最初の理論的収束解析を行う。一般の非線形関数近似の下では、PG-AMSGradは定常点の近傍に収束し、$mathcalO(log T/sqrtT)$である。線形関数近似の下では、一定段階のTD-AMSGradは$mathcalO(log T/sqrtT)の速度で大域的最適化の近傍に収束する。
論文参考訳（メタデータ） (2020-02-15T00:26:49Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。