Fugu-MT Paper Translation (Abstract): List Replicable Reinforcement Learning

Paper overview: List Replicable Reinforcement Learning

arxiv url: http://arxiv.org/abs/2512.00553v1
Date: Sat, 29 Nov 2025 16:47:43 GMT
Status: Translation complete
Last updated in system: 2025-12-02 19:46:34.292665
Title: List Replicable Reinforcement Learning
Title (reference translation): List Replicable Reinforcement Learning
Authors: Bohan Zhang, Michael Chen, A. Pavan, N. V. Vinodchandran, Lin F. Yang, Ruosong Wang
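Abstract summary: We study list replicability in the Probably Approximately Correct (PAC) RL framework. We introduce both weak and strong forms of list replicability. We show that our new planning strategy can be incorporated into practical RL frameworks to enhance their stability.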
Reference score (independently computed attention score): 23.401442101618215
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Replicability is a fundamental challenge in reinforcement learning (RL), as RL algorithms are empirically observed to be unstable and sensitive to variations in training conditions. To formally address this issue, we study \emph{list replicability} in the Probably Approximately Correct (PAC) RL framework, where an algorithm must return a near-optimal policy that lies in a \emph{small list} of policies across different runs, with high probability. The size of this list defines the \emph{list complexity}. We introduce both weak and strong forms of list replicability: the weak form ensures that the final learned policy belongs to a small list, while the strong form further requires that the entire sequence of executed policies remains constrained. These objectives are challenging, as existing RL algorithms exhibit exponential list complexity due to their instability. Our main theoretical contribution is a provably efficient tabular RL algorithm that guarantees list replicability by ensuring the list complexity remains polynomial in the number of states, actions, and the horizon length. We further extend our techniques to achieve strong list replicability, bounding the number of possible policy execution traces polynomially with high probability. Our theoretical result is made possible by key innovations including (i) a novel planning strategy that selects actions based on lexicographic order among near-optimal choices within a randomly chosen tolerance threshold, and (ii) a mechanism for testing state reachability in stochastic environments while preserving replicability. Finally, we demonstrate that our theoretical investigation sheds light on resolving the \emph{instability} issue of RL algorithms used in practice. In particular, we show that empirically, our new planning strategy can be incorporated into practical RL frameworks to enhance their stability.
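Abstract (reference translation): Replicability is a fundamental challenge in reinforcement learning (RL): RL algorithms are unstable and sensitive to variations in training conditions. To address this issue formally, we study \emph{list replicability} in the Probably Approximately Correct (PAC) RL framework, in which an algorithm must, with high probability, return a near-optimal policy contained in a \emph{small list} of policies across different runs. The size of this list defines the \emph{list complexity}. We introduce both weak and strong forms of list replicability: the weak form guarantees that the final learned policy belongs to a small list, while the strong form additionally requires that the entire sequence of executed policies remains constrained. These objectives are difficult because existing RL algorithms exhibit exponential list complexity due to their instability. Our main theoretical contribution is a provably efficient tabular RL algorithm that guarantees list replicability by ensuring the list complexity is polynomial in the number of states, the number of actions, and the horizon length. We further extend our techniques to achieve strong list replicability, bounding the number of possible policy execution traces polynomially with high probability. Our theoretical results are made possible by key innovations: (i) a novel planning strategy that selects actions by lexicographic order among near-optimal choices within a randomly chosen tolerance threshold, and (ii) a mechanism for testing state reachability in stochastic environments while preserving replicability. Finally, we demonstrate that our theoretical investigation sheds light on resolving the \emph{instability} issue of RL algorithms used in practice. In particular, we show empirically that our new planning strategy can be incorporated into practical RL frameworks to enhance their stability.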
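The planning idea named in item (i) of the abstract can be illustrated with a small sketch. This is not the authors' algorithm, only a minimal reading of the abstract's one-sentence description: among actions whose estimated value is within a tolerance of the best, pick the first one in a fixed order. The function names (`greedy_action`, `tolerant_lexicographic_action`, `sample_tolerance`), the interval bounds, and the example values are all hypothetical.

```python
import random


def greedy_action(q_values):
    """Plain argmax: returns the index of the largest estimated value."""
    return max(range(len(q_values)), key=q_values.__getitem__)


def tolerant_lexicographic_action(q_values, tolerance):
    """Return the lowest-index action whose estimate is within
    `tolerance` of the best estimate (hypothetical helper)."""
    best = max(q_values)
    for action, q in enumerate(q_values):
        if q >= best - tolerance:
            return action
    raise ValueError("q_values must be non-empty")


def sample_tolerance(low, high, rng):
    """Draw the tolerance at random from [low, high].
    The interval bounds are an assumption made for this sketch."""
    return rng.uniform(low, high)


# Two runs produce slightly different estimates of the same values.
q_run_a = [0.50, 0.91, 0.90]
q_run_b = [0.50, 0.90, 0.91]

# Argmax flips between the two runs because of the small noise.
print(greedy_action(q_run_a), greedy_action(q_run_b))

# With a tolerance larger than the noise, both runs agree.
tol = 0.05
print(tolerant_lexicographic_action(q_run_a, tol),
      tolerant_lexicographic_action(q_run_b, tol))

# A randomly drawn tolerance, as the abstract describes.
rng = random.Random(0)
random_tol = sample_tolerance(0.02, 0.08, rng)
print(tolerant_lexicographic_action(q_run_a, random_tol))
```

In this toy case the greedy choice differs between the two runs, while the tolerant rule returns the same action for both. The abstract does not say how the threshold is distributed or how it relates to the estimation error, so the uniform draw here is only a placeholder.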

Related papers