Fugu-MT 論文翻訳(概要): MGDA Converges under Generalized Smoothness, Provably

論文の概要: MGDA Converges under Generalized Smoothness, Provably

arxiv url: http://arxiv.org/abs/2405.19440v4
Date: Wed, 02 Oct 2024 18:51:01 GMT
ステータス: 翻訳完了
システム内更新日: 2024-12-02 22:03:31.812104
Title: MGDA Converges under Generalized Smoothness, Provably
Title（参考訳）: MGDAが一般化したスムースネスで収束、おそらく
Authors: Qi Zhang, Peiyao Xiao, Shaofeng Zou, Kaiyi Ji,
Abstract要約: 多目的最適化(MOO)はマルチタスク学習など様々な分野で注目を集めている。最近の研究は、理論解析を伴う効果的なアルゴリズムを提供しているが、それらは標準の$L$-smoothあるいは有界勾配仮定によって制限されている。一般化された$ell$-smooth損失関数のより一般的で現実的なクラスについて研究し、$ell$は勾配ノルムの一般非減少関数である。
参考スコア（独自算出の注目度）: 27.87166415148172
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multi-objective optimization (MOO) is receiving more attention in various fields such as multi-task learning. Recent works provide some effective algorithms with theoretical analysis but they are limited by the standard $L$-smooth or bounded-gradient assumptions, which typically do not hold for neural networks, such as Long short-term memory (LSTM) models and Transformers. In this paper, we study a more general and realistic class of generalized $\ell$-smooth loss functions, where $\ell$ is a general non-decreasing function of gradient norm. We revisit and analyze the fundamental multiple gradient descent algorithm (MGDA) and its stochastic version with double sampling for solving the generalized $\ell$-smooth MOO problems, which approximate the conflict-avoidant (CA) direction that maximizes the minimum improvement among objectives. We provide a comprehensive convergence analysis of these algorithms and show that they converge to an $\epsilon$-accurate Pareto stationary point with a guaranteed $\epsilon$-level average CA distance (i.e., the gap between the updating direction and the CA direction) over all iterations, where totally $\mathcal{O}(\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-4})$ samples are needed for deterministic and stochastic settings, respectively. We prove that they can also guarantee a tighter $\epsilon$-level CA distance in each iteration using more samples. Moreover, we analyze an efficient variant of MGDA named MGDA-FA using only $\mathcal{O}(1)$ time and space, while achieving the same performance guarantee as MGDA.
Abstract（参考訳）: 多目的最適化(MOO)はマルチタスク学習など様々な分野で注目を集めている。最近の研究は、理論解析で有効なアルゴリズムを提供しているが、Long short-term memory(LSTM)モデルやTransformerのようなニューラルネットワークには適用されない標準の$L$-smoothやbounded-gradientの仮定によって制限されている。本稿では、より一般的で現実的な一般化された$\ell$-smooth損失関数の研究を行い、$\ell$は勾配ノルムの一般非減少関数である。我々は、目的物間の最小改善を最大化する競合回避方向(CA)を近似する一般化された$\ell$-smooth MOO問題の解法として、基本的な多重勾配勾配アルゴリズム(MGDA)とその確率版を二重サンプリングで再検討し、解析する。これらのアルゴリズムの総合収束解析を行い、全ての反復に対して、$\epsilon$-accurate Pareto 定常点(平均 CA 距離)$\epsilon$-level 平均 CA 距離(すなわち、更新方向と CA 方向のギャップ)を保証し、完全な$\mathcal{O}(\epsilon^{-2})$と$\mathcal{O}(\epsilon^{-4})$サンプルをそれぞれ決定論的および確率的設定のために必要であることを示す。より厳密な$\epsilon$レベルのCA距離を各イテレーションで、より多くのサンプルを使って保証することも証明しています。さらに,MGDA と同じ性能保証を達成しつつ,MGDA-FA という MGDA の効率的な変種を $\mathcal{O}(1)$ 時間と空間のみを用いて解析する。

関連論文リスト

Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency [52.60557300927007]
離散部分モジュラー問題を連続的に最適化するために,$textbfMA-OSMA$アルゴリズムを提案する。また、一様分布を混合することによりKLの発散を効果的に活用する、プロジェクションフリーな$textbfMA-OSEA$アルゴリズムも導入する。我々のアルゴリズムは最先端OSGアルゴリズムによって提供される$(frac11+c)$-approximationを大幅に改善する。
論文参考訳（メタデータ） (2025-02-07T15:57:56Z)
An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue Optimization Problems [76.2042837251496]
相補的な合成条件に基づく2つの難解なミラー降下アルゴリズムを導入する。注目すべきは、どちらのアルゴリズムも、目的関数のリプシッツ定数や滑らかさに関する事前の知識なしで機能する。本稿では,大規模半確定プログラム上での手法の効率性とロバスト性を示す。
論文参考訳（メタデータ） (2023-06-30T08:34:29Z)
On Convergence of Incremental Gradient for Non-Convex Smooth Functions [63.51187646914962]
機械学習とネットワーク最適化では、ミスの数と優れたキャッシュを最小化するため、シャッフルSGDのようなアルゴリズムが人気である。本稿では任意のデータ順序付けによる収束特性SGDアルゴリズムについて述べる。
論文参考訳（メタデータ） (2023-05-30T17:47:27Z)
Multi-block-Single-probe Variance Reduced Estimator for Coupled Compositional Optimization [49.58290066287418]
構成問題の複雑さを軽減するために,MSVR (Multi-block-probe Variance Reduced) という新しい手法を提案する。本研究の結果は, 試料の複雑さの順序や強靭性への依存など, 様々な面で先行して改善された。
論文参考訳（メタデータ） (2022-07-18T12:03:26Z)
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning [77.22019100456595]
通信周波数の異なる分散計算作業者のトレーニングアルゴリズムを示す。本研究では,より厳密な収束率を$mathcalO!!(sigma2-2_avg!)とする。また,不均一性の項は,作業者の平均遅延によっても影響されることを示した。
論文参考訳（メタデータ） (2022-06-16T17:10:57Z)
An Improved Analysis of Gradient Tracking for Decentralized Machine Learning [34.144764431505486]
トレーニングデータが$n$エージェントに分散されるネットワーク上での分散機械学習を検討する。エージェントの共通の目標は、すべての局所損失関数の平均を最小化するモデルを見つけることである。ノイズのない場合、$p$を$mathcalO(p-1)$から$mathcalO(p-1)$に改善します。
論文参考訳（メタデータ） (2022-02-08T12:58:14Z)
DoCoM: Compressed Decentralized Optimization with Near-Optimal Sample Complexity [25.775517797956237]
本稿では,Douubly Compressed Momentum-assisted tracking algorithm $ttDoCoM$ for communicationを提案する。我々のアルゴリズムは、実際にいくつかの最先端のアルゴリズムより優れていることを示す。
論文参考訳（メタデータ） (2022-02-01T07:27:34Z)
On the Last Iterate Convergence of Momentum Methods [32.60494006038902]
我々は、任意の一定の運動量係数に対して、最後の反復が誤差に苦しむリプシッツおよび凸函数が存在することを初めて証明する。凸関数と滑らかな関数の設定では、新しいSGDMアルゴリズムが自動的に$O(fraclog TT)$のレートで収束することを示しています。
論文参考訳（メタデータ） (2021-02-13T21:16:16Z)
Gradient-Based Empirical Risk Minimization using Local Polynomial Regression [39.29885444997579]
この論文の主な目標は、勾配降下(GD)や勾配降下(SGD)といった異なるアルゴリズムを比較することである。損失関数がデータのスムーズな場合、各反復でオラクルを学習し、GDとSGDの両方のオラクル複雑度に打ち勝つことができることを示す。
論文参考訳（メタデータ） (2020-11-04T20:10:31Z)
Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
学習者が生成モデル$Y = langle X,w* rangle + epsilon$から$n$のサンプルにアクセスできるような高次元頑健な線形回帰問題について検討する。 i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance, (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
論文参考訳（メタデータ） (2020-07-16T06:44:44Z)
Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models [2.384873896423002]
非滑らかな成分を持つ汎用合成対象関数に対する勾配アルゴリズムのミニバッチ変種を開発し解析する。我々は、最小バッチサイズ$N$に対して、$mathcalO(frac1Nepsilon)$$epsilon-$subityが最適解に期待される二次距離で達成されるような、定数および変数のステップサイズ反復ポリシーの複雑さを提供する。
論文参考訳（メタデータ） (2020-03-30T10:43:56Z)
Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling [56.394284787780364]
本稿では、ポリシー勾配(PG)と時間差(TD)学習の2つの基本RLアルゴリズムに対して、最初の理論的収束解析を行う。一般の非線形関数近似の下では、PG-AMSGradは定常点の近傍に収束し、$mathcalO(log T/sqrtT)$である。線形関数近似の下では、一定段階のTD-AMSGradは$mathcalO(log T/sqrtT)の速度で大域的最適化の近傍に収束する。
論文参考訳（メタデータ） (2020-02-15T00:26:49Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。