Fugu-MT Paper Translation (Abstract): Online KL-Regularized Reinforcement Learning with Function Approximation under Misspecification

Paper abstract: Online KL-Regularized Reinforcement Learning with Function Approximation under Misspecification

arxiv url: http://arxiv.org/abs/2606.06053v1
Date: Thu, 04 Jun 2026 11:54:23 GMT
Status: Translation complete
Last updated in system: 2026-06-05 22:39:44.766099
Title: Online KL-Regularized Reinforcement Learning with Function Approximation under Misspecification
Title (reference translation): Online KL-Regularized Reinforcement Learning with Function Approximation under Misspecification
Authors: Haoyang Hong, Zichen Wang, Quanquan Gu, Huazheng Wang
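Abstract summary: This work studies KL-regularized contextual bandits and episodic reinforcement learning (RL) under general function approximation with model misspecification. Existing guarantees depend on realizability and therefore do not extend to misspecified models, where classical regret bounds can fail. The paper introduces KL misspecification formulations for contextual bandits and episodic RL and analyzes regression-based algorithms that use Gibbs policy updates.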
Reference score (independently computed attention score): 70.9534986000242
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study KL-regularized contextual bandits and episodic reinforcement learning (RL) under general function approximation with model misspecification. Existing guarantees rely on realizability and therefore do not extend to misspecified models, where classical regret bounds may fail. This work introduces KL misspecification formulations for contextual bandits and episodic RL and analyzes regression-based algorithms with Gibbs policy updates. High-probability KL-regret guarantees with explicit misspecification terms are established, recovering the standard realizable KL-regularized setting as a special case.
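Abstract (reference translation): This work studies KL-regularized contextual bandits and episodic reinforcement learning (RL) under general function approximation with model misspecification. Existing guarantees depend on realizability and therefore do not extend to misspecified models, where classical regret bounds can fail. The paper introduces KL misspecification formulations for contextual bandits and episodic RL and analyzes regression-based algorithms that use Gibbs policy updates. It establishes high-probability KL-regret guarantees with explicit misspecification terms and recovers the standard realizable KL-regularized setting as a special case.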
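
The abstract mentions Gibbs policy updates without spelling them out. As a rough illustration only, the sketch below shows the standard closed-form Gibbs policy for a single context with finitely many actions: the maximizer of expected reward minus (1/eta) times the KL divergence to a reference policy is proportional to ref_probs * exp(eta * est_rewards). The function names, the `eta` parameterization, and the toy numbers are assumptions made for this example and are not taken from the paper, whose algorithms and notation may differ.

```python
import numpy as np


def gibbs_policy(ref_probs, est_rewards, eta):
    """Gibbs policy: pi(a) proportional to ref_probs[a] * exp(eta * est_rewards[a]).

    Assumes every entry of ref_probs is strictly positive (hypothetical setup).
    """
    ref_probs = np.asarray(ref_probs, dtype=float)
    est_rewards = np.asarray(est_rewards, dtype=float)
    logits = np.log(ref_probs) + eta * est_rewards
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def kl_regularized_value(policy, ref_probs, rewards, eta):
    """Expected reward minus (1 / eta) * KL(policy || ref_probs)."""
    policy = np.asarray(policy, dtype=float)
    ref_probs = np.asarray(ref_probs, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    mask = policy > 0  # terms with policy[a] == 0 contribute nothing to the KL
    kl = np.sum(policy[mask] * np.log(policy[mask] / ref_probs[mask]))
    return float(policy @ rewards - kl / eta)


# Toy example: three actions, uniform reference policy, made-up reward estimates.
ref = np.array([1 / 3, 1 / 3, 1 / 3])
rewards = np.array([1.0, 0.0, 0.5])
eta = 2.0

pi = gibbs_policy(ref, rewards, eta)
greedy = np.array([1.0, 0.0, 0.0])

print("Gibbs policy:", pi)
print("Gibbs value:", kl_regularized_value(pi, ref, rewards, eta))
print("Reference value:", kl_regularized_value(ref, ref, rewards, eta))
print("Greedy value:", kl_regularized_value(greedy, ref, rewards, eta))
```

In this toy setting the Gibbs policy scores at least as high on the KL-regularized objective as both the reference policy and the greedy policy, which is the sense in which it is the regularized optimum. In a regression-based algorithm, `est_rewards` would come from a fitted reward or value function; that fitting step is omitted here.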

Related papers