Fugu-MT Paper Translation (Abstract): An Information-Theoretic Analysis of Bayesian Reinforcement Learning

Paper overview: An Information-Theoretic Analysis of Bayesian Reinforcement Learning

arxiv url: http://arxiv.org/abs/2207.08735v1
Date: Mon, 18 Jul 2022 16:28:01 GMT
Status: Translation complete
Last updated in system: 2022-07-19 16:08:57.142917
Title: An Information-Theoretic Analysis of Bayesian Reinforcement Learning
Authors: Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund
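Abstract summary: We specialize the definition of minimum Bayesian regret (MBR) to reinforcement learning problems modeled as Markov decision processes (MDPs) whose kernel parameters are unknown. We show that our bounds can recover from below the current information-theoretic bounds by Russo and Van Roy.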
Reference score (independently computed attention score): 44.025369660607645
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Building on the framework introduced by Xu and Raginsky [1] for supervised learning problems, we study the best achievable performance for model-based Bayesian reinforcement learning problems. With this purpose, we define minimum Bayesian regret (MBR) as the difference between the maximum expected cumulative reward obtainable either by learning from the collected data or by knowing the environment and its dynamics. We specialize this definition to reinforcement learning problems modeled as Markov decision processes (MDPs) whose kernel parameters are unknown to the agent and whose uncertainty is expressed by a prior distribution. One method for deriving upper bounds on the MBR is presented and specific bounds based on the relative entropy and the Wasserstein distance are given. We then focus on two particular cases of MDPs, the multi-armed bandit problem (MAB) and the online optimization with partial feedback problem. For the latter problem, we show that our bounds can recover from below the current information-theoretic bounds by Russo and Van Roy [2].
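To make the notion of Bayesian regret in the abstract concrete, the following is a minimal Python sketch, not the authors' method. It assumes a toy Bernoulli multi-armed bandit whose arm means are drawn from a uniform Beta(1, 1) prior, and it estimates by Monte Carlo the Bayesian regret of Thompson sampling, that is, the gap between an oracle that knows the environment and an agent that learns from collected data. Since the MBR is defined through the best achievable learner, the regret of any one algorithm can only upper-bound it. The function names, the prior, and the choice of Thompson sampling are all illustrative assumptions. The entropy-based bound is assumed here to take the form sqrt(|A| H(A*) T / 2) for rewards in [0, 1], the form usually attributed to Russo and Van Roy for Thompson sampling; it should be checked against [2] before being relied on.

```python
import math
import random


def thompson_bayesian_regret(num_arms, horizon, num_runs, seed=0):
    """Monte Carlo estimate of the Bayesian regret of Thompson sampling.

    Assumed toy setting (not from the paper): Bernoulli arms whose means
    are drawn independently from a uniform Beta(1, 1) prior.
    """
    rng = random.Random(seed)
    total_regret = 0.0
    for _ in range(num_runs):
        # Draw the unknown environment from the prior.
        means = [rng.random() for _ in range(num_arms)]
        best_mean = max(means)
        # Beta posterior parameters per arm, initialised at the prior.
        successes = [1] * num_arms
        failures = [1] * num_arms
        for _ in range(horizon):
            # Sample one plausible mean per arm, then act greedily on the samples.
            samples = [
                rng.betavariate(successes[a], failures[a])
                for a in range(num_arms)
            ]
            arm = max(range(num_arms), key=samples.__getitem__)
            reward = 1 if rng.random() < means[arm] else 0
            successes[arm] += reward
            failures[arm] += 1 - reward
            # Accumulate the gap to an oracle that knows the true means.
            total_regret += best_mean - means[arm]
    return total_regret / num_runs


def entropy_regret_bound(num_arms, horizon):
    """Assumed bound sqrt(|A| * H(A*) * T / 2).

    Under the symmetric prior above the optimal arm is uniform over the
    arms, so H(A*) = log|A| (in nats).
    """
    return math.sqrt(num_arms * math.log(num_arms) * horizon / 2.0)


if __name__ == "__main__":
    estimate = thompson_bayesian_regret(num_arms=5, horizon=200, num_runs=100)
    print("estimated Bayesian regret:", estimate)
    print("assumed entropy bound:", entropy_regret_bound(5, 200))
```

The regret is accumulated using the true arm means rather than the sampled rewards, which reduces the variance of the estimate without changing its expectation. Fixing `seed` makes a run reproducible.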

Related papers

On Pareto Optimality for the Multinomial Logistic Bandit [0.0]
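We provide a new online learning algorithm for the multinomial logit bandit (MNL-Bandit) problem. Despite the challenges posed by the MNL model, we develop a novel upper confidence bound (UCB) based method. We develop theoretical guarantees that characterize the tradeoff between regret and estimation error for the MNL-Bandit problem.
Reference translation of the paper (metadata) (2025-01-31T16:42:29Z)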
Learning Algorithms for Verification of Markov Decision Processes [20.5951492453299]
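We present a general framework for applying learning algorithms to the verification of Markov decision processes (MDPs). The framework focuses on probabilistic reachability, a core problem in verification.
Reference translation of the paper (metadata) (2024-03-14T08:54:19Z)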
STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning [111.75423966239092]
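We propose an exploration incentive in terms of the integral probability metric (IPM) between the current estimate of the transition model and the unknown optimal one. We develop a novel algorithm based on KSD: STEin information dirEcted exploration for model-based Reinforcement LearnING (STEERING).
Reference translation of the paper (metadata) (2023-01-28T00:49:28Z)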
Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes [93.61202366677526]
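We study offline reinforcement learning (RL) in the presence of unmeasured confounders. We propose a variety of policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy.
Reference translation of the paper (metadata) (2022-09-18T22:03:55Z)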
Reinforcement Learning with a Terminator [80.34572413850186]
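We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds. We use these to construct a provably efficient algorithm that accounts for termination, and we bound its regret.
Reference translation of the paper (metadata) (2022-05-30T18:40:28Z)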
Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
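We consider the problem of regret, which is central to the analysis and design of optimal learning algorithms. We present logarithmic, problem-specific lower bounds on regret that depend explicitly on the system parameters.
Reference translation of the paper (metadata) (2021-06-27T23:41:57Z)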
Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
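We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) that are solved via reinforcement learning. Two significant computational challenges arise in applying decision-focused learning to MDPs.
Reference translation of the paper (metadata) (2021-06-06T23:53:31Z)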
Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions [24.389388509299543]
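Implicit Maximum Likelihood Estimation (I-MLE) is a framework for end-to-end learning of models that combine discrete exponential family distributions with differentiable neural components. We show that I-MLE is competitive with, and often outperforms, existing approaches that rely on problem-specific relaxations.
Reference translation of the paper (metadata) (2021-06-03T12:42:21Z)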
Parameterized MDPs and Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework [2.741266294612776]
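We propose a framework for addressing sequential decision-making problems. Our framework features learning of optimal control policies that are robust to noisy data.
Reference translation of the paper (metadata) (2020-06-17T04:08:35Z)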

The related papers list is generated automatically from the titles and abstracts of papers on this site.
