Fugu-MT Paper Translation (Summary): Near-continuous time Reinforcement Learning for continuous state-action spaces

Paper summary: Near-continuous time Reinforcement Learning for continuous state-action spaces

arxiv url: http://arxiv.org/abs/2309.02815v1
Date: Wed, 6 Sep 2023 08:01:17 GMT
Status: Translation complete
Last updated in system: 2023-09-07 16:14:42.259901
Title: Near-continuous time Reinforcement Learning for continuous state-action spaces
Authors: Lorenzo Croissant (CEREMADE), Marc Abeille, Bruno Bouchard (CEREMADE)
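Abstract summary: We consider the Reinforcement Learning problem of controlling an unknown dynamical system to maximise the long-term average reward along a single trajectory. Most of the literature considers system interactions that occur in discrete time and discrete state-action spaces. We show that the celebrated optimism protocol applies when the sub-tasks (learning and planning) can be performed effectively.
Reference score (attention score computed independently by this site): 3.5527561584422456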
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the Reinforcement Learning problem of controlling an unknown dynamical system to maximise the long-term average reward along a single trajectory. Most of the literature considers system interactions that occur in discrete time and discrete state-action spaces. Although this standpoint is suitable for games, it is often inadequate for mechanical or digital systems in which interactions occur at a high frequency, if not in continuous time, and whose state spaces are large if not inherently continuous. Perhaps the only exception is the Linear Quadratic framework for which results exist both in discrete and continuous time. However, its ability to handle continuous states comes with the drawback of a rigid dynamic and reward structure. This work aims to overcome these shortcomings by modelling interaction times with a Poisson clock of frequency $\varepsilon^{-1}$, which captures arbitrary time scales: from discrete ($\varepsilon=1$) to continuous time ($\varepsilon\downarrow0$). In addition, we consider a generic reward function and model the state dynamics according to a jump process with an arbitrary transition kernel on $\mathbb{R}^d$. We show that the celebrated optimism protocol applies when the sub-tasks (learning and planning) can be performed effectively. We tackle learning within the eluder dimension framework and propose an approximate planning method based on a diffusive limit approximation of the jump process. Overall, our algorithm enjoys a regret of order $\tilde{\mathcal{O}}(\varepsilon^{1/2} T+\sqrt{T})$. As the frequency of interactions blows up, the approximation error $\varepsilon^{1/2} T$ vanishes, showing that $\tilde{\mathcal{O}}(\sqrt{T})$ is attainable in near-continuous time.
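The interaction model described in the abstract, a Poisson clock of frequency $\varepsilon^{-1}$, can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it only draws interaction times with exponential waiting times of mean $\varepsilon$ and evaluates the two terms of the stated regret order, ignoring the constants and logarithmic factors hidden in $\tilde{\mathcal{O}}$. The function names `poisson_clock_times` and `regret_terms` are hypothetical and chosen for illustration only.

```python
import math
import random


def poisson_clock_times(eps, horizon, rng):
    """Interaction times of a Poisson clock of frequency 1/eps on [0, horizon].

    Waiting times between interactions are exponential with mean eps.
    """
    times = []
    t = rng.expovariate(1.0 / eps)  # first waiting time
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(1.0 / eps)
    return times


def regret_terms(eps, horizon):
    """Return (eps^(1/2) * T, sqrt(T)), the two terms of the stated regret order.

    Assumption: constants and log factors are dropped, so these are
    orders of magnitude, not actual bounds.
    """
    return math.sqrt(eps) * horizon, math.sqrt(horizon)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are reproducible
    horizon = 100.0
    for eps in (1.0, 0.1, 0.01):
        n = len(poisson_clock_times(eps, horizon, rng))
        approx, learn = regret_terms(eps, horizon)
        print(f"eps={eps}: {n} interactions, "
              f"eps^(1/2)*T={approx:.1f}, sqrt(T)={learn:.1f}")
```

As $\varepsilon$ decreases, the number of interactions on a fixed horizon grows roughly like $T/\varepsilon$, while the approximation term $\varepsilon^{1/2} T$ shrinks relative to $\sqrt{T}$, which matches the qualitative statement in the abstract.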

Related papers

Superspin Renormalization and Slow Relaxation in Random Spin Systems [0.0]
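We develop an excited-state real-space renormalization group (RSRG-X) formalism to describe the dynamics of conserved densities in randomly interacting spin-$\frac{1}{2}$ systems. Our formalism is suitable for systems with $\mathrm{U}(1)$ and $\mathbb{Z}$ symmetries, and we apply it to chains of randomly positioned spins with dipolar $XX+YY$ interactions.
Paper reference translation (metadata) (2025-02-13T18:59:03Z)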
Dynamically emergent correlations in bosons via quantum resetting [0.0]
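We study the nonequilibrium stationary state (NESS) induced by quantum resetting of a system of $N$ noninteracting bosons in a harmonic trap. We fully characterize the stationary state by analytically computing physical observables such as the average density, extreme value statistics, and order and gap statistics. This is a rare example of a strongly correlated quantum many-body NESS in which various observables can be computed exactly.
Paper reference translation (metadata) (2024-07-29T18:00:35Z)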
Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
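We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) with smooth Bellman operators. The key to our solution is a novel projection technique based on ideas from harmonic analysis. Our results bridge the gap between two popular but conflicting perspectives on continuous-space MDPs.
Paper reference translation (metadata) (2024-05-10T09:58:47Z)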
Integrable Digital Quantum Simulation: Generalized Gibbs Ensembles and Trotter Transitions [0.0]
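We study the quench from a spin-wave state in the XXZ Heisenberg spin chain. Exact calculations show that the generalized Gibbs ensemble depends analytically on the Trotter step. We show that the latter can be detected locally, as it is associated with the appearance of a nonzero staggered magnetization.
Paper reference translation (metadata) (2022-12-13T09:54:56Z)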
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning [77.22019100456595]
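We present a training algorithm for distributed compute workers with differing communication frequencies. We obtain a tighter convergence rate, expressed in terms of $\sigma^2$ and an averaged ($\mathrm{avg}$) quantity. We also show that the heterogeneity term is affected by the average delay of the workers.
Paper reference translation (metadata) (2022-06-16T17:10:57Z)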
Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence [69.65563161962245]
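We consider minimizing a smooth, strongly convex objective function with a Newton method. We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
Paper reference translation (metadata) (2022-04-20T07:14:21Z)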
Random quantum circuits transform local noise into global white noise [118.18170052022323]
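We study the distribution over measurement outcomes of noisy random quantum circuits in the low-fidelity regime. For sufficiently weak, unital local noise, correlations (measured by the linear cross-entropy benchmark) involving the output distribution $p_{\text{noisy}}$ of a generic noisy circuit instance decrease exponentially. If the noise is incoherent, the output distribution approaches the uniform distribution $p_{\text{unif}}$ at precisely the same rate.
Paper reference translation (metadata) (2021-11-29T19:26:28Z)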
The connection between time-local and time-nonlocal perturbation expansions [0.0]
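We show that the series for the kernel $\mathcal{K}$ can be converted directly into the corresponding series for the more complicated generator $\mathcal{G}$. We illustrate this with leading and next-to-leading order calculations of $\mathcal{K}$ and $\mathcal{G}$ for the single-impurity Anderson model.
Paper reference translation (metadata) (2021-07-19T15:05:29Z)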
Route to Extend the Lifetime of a Discrete Time Crystal in a Finite Spin Chain Without Disorder [0.0]
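Periodically driven systems are described by time-dependent Hamiltonians with a discrete time-translation symmetry. The spontaneous breaking of this symmetry leads to the emergence of a novel nonequilibrium phase of matter, the discrete time crystal (DTC).
Paper reference translation (metadata) (2021-04-12T04:45:09Z)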
The Connection between Discrete- and Continuous-Time Descriptions of Gaussian Continuous Processes [60.35125735474386]
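We show that discretizations yielding consistent estimators are invariant under coarse-graining. This result explains why combining differencing schemes for derivative reconstruction with local-in-time inference approaches does not work for time-series analysis of second- or higher-order differential equations.
Paper reference translation (metadata) (2021-01-16T17:11:02Z)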
Interpolated Collision Model Formalism [0.0]
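We discuss a new method for constructing a continuous-time master equation from the discrete-time dynamics given by any collision model. We show that approaches based on the continuum limit always yield unitary dynamics unless they are fine-tuned in some way.
Paper reference translation (metadata) (2020-09-22T11:50:14Z)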
Frequentist Regret Bounds for Randomized Least-Squares Value Iteration [94.47472987987805]
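We study the exploration-exploitation dilemma in finite-horizon reinforcement learning (RL). We introduce an optimistic variant of randomized least-squares value iteration (RLSVI). Under the assumption that the Markov decision process has low-rank transition dynamics, we show that the frequentist regret of RLSVI is bounded by $\widetilde{O}(d^2 H^2 \sqrt{T})$, where $d$ is the feature dimension, $H$ is the horizon, and $T$ is the total number of steps.
Paper reference translation (metadata) (2019-11-01T19:48:57Z)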

The related-papers list is generated automatically from the titles and abstracts of the papers on this site.
