Fugu-MT Paper Translation (Abstract): Understanding Progressive Training Through the Framework of Randomized Coordinate Descent

Paper overview: Understanding Progressive Training Through the Framework of Randomized Coordinate Descent

arxiv url: http://arxiv.org/abs/2306.03626v1
Date: Tue, 6 Jun 2023 12:27:54 GMT
Status: Translation complete
Last updated in system: 2023-06-07 15:35:23.172992
Title: Understanding Progressive Training Through the Framework of Randomized Coordinate Descent
Title (reference translation): Understanding Progressive Training Through the Framework of Random Coordinate Descent
Authors: Rafał Szlendak, Elnur Gasanov, Peter Richtárik
Abstract summary: We propose a Randomized Progressive Training algorithm (RPT), a proxy for the well-known Progressive Training method (PT). To the best of the authors' knowledge, RPT is the first PT-type algorithm with rigorous theoretical guarantees for general smooth objective functions.
Reference score (independently computed attention score): 1.6758573326215689
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose a Randomized Progressive Training algorithm (RPT) -- a stochastic proxy for the well-known Progressive Training method (PT) (Karras et al., 2017). Originally designed to train GANs (Goodfellow et al., 2014), PT was proposed as a heuristic, with no convergence analysis even for the simplest objective functions. On the contrary, to the best of our knowledge, RPT is the first PT-type algorithm with rigorous and sound theoretical guarantees for general smooth objective functions. We cast our method into the established framework of Randomized Coordinate Descent (RCD) (Nesterov, 2012; Richtárik & Takáč, 2014), for which (as a by-product of our investigations) we also propose a novel, simple and general convergence analysis encapsulating strongly-convex, convex and nonconvex objectives. We then use this framework to establish a convergence theory for RPT. Finally, we validate the effectiveness of our method through extensive computational experiments.
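Abstract (reference translation): We propose a Randomized Progressive Training algorithm (RPT), a stochastic proxy for the well-known Progressive Training method (PT) (Karras et al., 2017). Originally designed to train GANs (Goodfellow et al., 2014), PT was proposed as a heuristic, with no convergence analysis even for the simplest objective functions. In contrast, to the best of our knowledge, RPT is the first PT-type algorithm with rigorous and sound theoretical guarantees for general smooth objective functions. We cast our method into the established framework of Randomized Coordinate Descent (RCD) (Nesterov, 2012; Richtárik & Takáč, 2014). We then use this framework to establish a convergence theory for RPT. Finally, we validate the effectiveness of the proposed method through computational experiments.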
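For readers unfamiliar with the RCD framework mentioned in the abstract, the following is a minimal sketch of plain randomized coordinate descent on a small quadratic. It is not the paper's RPT algorithm: the function name `rcd_quadratic`, the uniform coordinate sampling, the step size 1 / A[i][i], and the 3x3 test problem are all assumptions chosen for illustration.

```python
import random


def rcd_quadratic(A, b, num_steps=2000, seed=0):
    """Randomized coordinate descent on f(x) = 0.5 * x^T A x - b^T x.

    Assumes A is symmetric positive definite (illustrative setting only).
    """
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for _ in range(num_steps):
        # Sample one coordinate uniformly at random.
        i = rng.randrange(n)
        # Partial derivative of f with respect to x_i, i.e. (A x - b)_i.
        grad_i = sum(A[i][j] * x[j] for j in range(n)) - b[i]
        # Step size 1 / L_i, with L_i = A[i][i] the coordinate-wise
        # smoothness constant of the quadratic.
        x[i] -= grad_i / A[i][i]
    return x


# Hypothetical symmetric, strictly diagonally dominant (hence positive
# definite) test problem; not taken from the paper.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]

x = rcd_quadratic(A, b)
# Largest absolute entry of A x - b; it shrinks toward zero as x
# approaches the minimizer of f.
residual = max(
    abs(sum(A[i][j] * x[j] for j in range(3)) - b[i]) for i in range(3)
)
```

Each step touches a single coordinate, so the per-iteration cost is one row of A rather than a full gradient. That is the basic trade-off the RCD framework analyzes.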

Related papers

Sample and Computationally Efficient Continuous-Time Reinforcement Learning with General Function Approximation [28.63391989014238]
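Continuous-time reinforcement learning (CTRL) provides a principled framework for sequential decision-making in environments where interactions evolve continuously over time. We propose a model-based algorithm that achieves both sample and computational efficiency. We show that a near-optimal solution with suboptimality $\tilde{O}(\sqrt{d_{\mathcal{R}} + d_{\mathcal{F}}} N^{-1/2})$ can be found using $N$ measurements.
Paper reference translation (metadata) (2025-05-20T18:37:51Z)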
Policy Gradient for LQR with Domain Randomization [25.387541996071093]
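Domain randomization (DR) enables sim-to-real transfer by training controllers on a distribution of simulated environments. We provide the first convergence analysis of policy gradient (PG) methods for the domain-randomized linear quadratic regulator (LQR). We quantify the sample complexity associated with achieving a small gap between the sample-average and population-level objectives.
Paper reference translation (metadata) (2025-03-31T17:51:00Z)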
Understanding Inverse Reinforcement Learning under Overparameterization: Non-Asymptotic Analysis and Global Optimality [52.906438147288256]
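We show that our algorithm can identify the optimal reward and policy under certain neural network structures. This is the first IRL algorithm with a non-asymptotic convergence guarantee that provably achieves global optimality.
Paper reference translation (metadata) (2025-03-22T21:16:08Z)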
Kernel-Based Function Approximation for Average Reward Reinforcement Learning: An Optimist No-Regret Algorithm [11.024396385514864]
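We consider kernel-based function approximation for RL in the infinite-horizon average-reward setting. We propose an optimistic algorithm, similar to acquisition-function-based algorithms in the special case of bandits.
Paper reference translation (metadata) (2024-10-30T23:04:10Z)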
Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
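Policy gradient (PG) methods are successful approaches to continuous reinforcement learning (RL) problems. In common practice, the converged (hyper)policy is learned only in order to deploy its deterministic version. This paper shows how to tune the exploration level used for learning so as to optimize the trade-off between sample complexity and the performance of the deployed deterministic policy.
Paper reference translation (metadata) (2024-05-03T16:45:15Z)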
Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
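We propose a theoretical framework for behavior cloning of complex expert demonstrations using generative models. We show that it is possible to generate trajectories that match the per-time-step distribution of arbitrary expert trajectories.
Paper reference translation (metadata) (2023-07-27T04:27:26Z)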
Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation [20.43657369407846]
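We study robust reinforcement learning (RL), with the goal of determining a well-performing policy that is robust to model mismatch between the training simulator and the testing environment. We propose two novel uncertainty set formulations, one based on double sampling and the other on an integral probability metric. We demonstrate the robustness of the policies learned by the proposed RNAC approach in multiple MuJoCo environments and a real-world TurtleBot navigation task.
Paper reference translation (metadata) (2023-07-17T22:10:20Z)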
Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
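Preference-based reinforcement learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pairwise preference-based feedback over trajectories. We propose a theoretical reward-agnostic PbRL framework that acquires exploratory trajectories enabling accurate learning of the hidden reward function.
Paper reference translation (metadata) (2023-05-29T15:00:09Z)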
Stochastic Unrolled Federated Learning [85.6993263983062]
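We introduce Stochastic UnRolled Federated learning (SURF). The proposed method addresses two challenges of this extension: the need to feed whole datasets to the unrolled optimizers, and the decentralized nature of federated learning.
Paper reference translation (metadata) (2023-05-24T17:26:22Z)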
Single-Trajectory Distributionally Robust Reinforcement Learning [21.955807398493334]
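This work studies distributionally robust RL (DRRL). Existing DRRL algorithms are either model-based or fail to learn from a single sample trajectory. We design a fully model-free DRRL algorithm, called distributionally robust Q-learning with single trajectory (DRQ).
Paper reference translation (metadata) (2023-01-27T14:08:09Z)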
A Unified Convergence Theorem for Stochastic Optimization Methods [4.94128206910124]
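We provide a fundamental unified convergence theorem that can be used to derive convergence results for a series of stochastic optimization methods. As a direct application, we recover almost sure convergence results under general settings.
Paper reference translation (metadata) (2022-06-08T14:01:42Z)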
A Stochastic Bundle Method for Interpolating Networks [18.313879914379008]
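We propose a method for training deep neural networks that can drive the empirical loss to zero. At each iteration, the method constructs an approximation of the learning objective, known as a bundle, as a maximum of linear functions.
Paper reference translation (metadata) (2022-01-29T23:02:30Z)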
Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
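We show that iterating under the more standard notion of Bellman error, commonly used in least-squares value iteration-style algorithms, provides strong PAC guarantees for learning a near-optimal value function. We then show how this can be used to learn a near-optimal policy for any (linear) reward function.
Paper reference translation (metadata) (2020-08-18T04:34:21Z)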
A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms [67.67377846416106]
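We propose a distributional approach to the theoretical analysis of reinforcement learning algorithms with constant step sizes. We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules that are contractive in the space of distributions of functions.
Paper reference translation (metadata) (2020-03-27T05:13:29Z)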

The related papers list is generated automatically from the titles and abstracts of papers on this site.
